Bioinformatics Training and Education Program

May 2021

03 May, 11:00 am - 12:00 pm: Cloud scale biomedical data warehousing with Google BigQuery


Summary: Storing and querying massive datasets can be time-consuming and expensive without the right tools. Google BigQuery is one of several enterprise data warehouse technologies that solve this problem by running SQL queries on the processing power of distributed cloud infrastructure. Arbitrarily large structured and semi-structured datasets (think tables and JSON files) can be loaded into BigQuery and then queried and analyzed in real time, regardless of size. Data in BigQuery can also be shared, reused, and even joined to open public datasets. In this operational talk, I will give an overview of BigQuery technology and the niche it fills, show some examples of using BigQuery, and give a concise catalog of biologically interesting datasets that are publicly available in BigQuery. Attendees should leave with an understanding of what BigQuery is, how it might be useful to their work, and how to gain access to the technology and data resources described.
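To give a rough idea of what querying BigQuery with SQL can look like, here is a minimal Python sketch. It is not taken from the talk. It assumes the `google-cloud-bigquery` client library is installed and that Google Cloud credentials and a default project are already configured. The table used is Google's general-purpose public sample `bigquery-public-data.samples.shakespeare`, not one of the biological datasets covered in the talk, and the helper names `build_query` and `run_query` are hypothetical.

```python
# Illustrative sketch only: helper names are hypothetical, and the table is a
# general public sample rather than a dataset from the talk.
PUBLIC_TABLE = "bigquery-public-data.samples.shakespeare"


def build_query(table: str = PUBLIC_TABLE, limit: int = 5) -> str:
    """Build a standard-SQL query that totals word counts per work."""
    return (
        "SELECT corpus, SUM(word_count) AS total_words\n"
        f"FROM `{table}`\n"
        "GROUP BY corpus\n"
        "ORDER BY total_words DESC\n"
        f"LIMIT {int(limit)}"
    )


def run_query(sql: str) -> list:
    """Run the query in BigQuery and return the rows as dictionaries.

    Assumes credentials and a default project are configured; the import is
    done here so that build_query works without the client library installed.
    """
    from google.cloud import bigquery  # pip install google-cloud-bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Printing the SQL needs no cloud access; call run_query(build_query())
    # once credentials are set up to execute it.
    print(build_query())
```

The same pattern would apply to any table you have access to: swap in a different fully qualified table name and adjust the column names to match its schema.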


Dr. Sean Davis


