https://ncihub.org/resources/2404 Abstract:
In many genomics applications, large data sets are created and only lightly used before being shared with other researchers (ideally) or simply left to sit on hard drives. The Cancer Cloud project has enabled some of these very large data sets to be shared among qualified researchers in order to facilitate a greater understanding of oncogenesis. One issue that comes up repeatedly, however, is that simply using the data requires specialized skills outside the biological realm. A blend of computer science and biology is required to access the data and run computations on it appropriately as it grows too large to process by conventional means. This presentation covers an application on the ISB Cancer Cloud in which whole genome sequencing (WGS) was used to generate variant calls for downstream research. Because of the size of the whole genome sequences, running this on lab computers was cost prohibitive, so the work had to be done in the cloud. The size of the data also meant that custom processes had to be put in place to manage and queue the computations, as well as to parallelize them and reconstruct the results properly. This workflow has been made available as open source for adaptation to other pipelines, and the WGS variant data is being made available to qualified researchers in the Cancer Cloud.
Dr. John Torcivia, Director of AI Deployment, Clarifai, Inc.
Department of Biochemistry, George Washington University
Abstracts, Slides and Recordings from past meetings can be found here: https://ncihub.org/groups/cwig (New Link!)
For questions and subscriptions, please contact Durga Addepalli at firstname.lastname@example.org
(Friday) 3:00 pm - 4:00 pm