---
layout: post
title: "August 2016 Genomic Standards Committee Meeting Notes"
modified:
categories: blog
excerpt:
tags: []
image:
  feature:
date: 2016-08-01T20:43:38-05:00
---

# TERRA-REF Genomics Standards Committee Meeting

## **Participants**

David LeBauer, Christine Laney, Michael Gore, Carolyn Lawrence-Dill, Eric Lyons, Noah Fahlgren

REGRETS:
Todd Mockler, Max Burnette, David Lee, Geoff Morris, Craig Willis

## **Agenda**

Introductions

Objective: review the current status of the pipeline and plans for the first data release in November.

Overview (Noah)

Sequencing

- What has been done
  - 192 resequenced genomes (~20-30x coverage each) from Steve K.'s bioenergy association panel (BAP)
  - 192 additional samples sent to HudsonAlpha one week ago (20-30x coverage)
  - External funding
    - Illumina for ~1000 additional sequences
    - DOE CSP for de novo assembly
  - Data quality control and analysis to date have been done on the Danforth Center cluster
    - Trimmomatic => bwa => GATK => CNVnator
  - By November: users will upload raw sequencing data and metadata to the TERRA-REF pipeline using CoGe (see below)
- What is in the pipeline
  - Raw data and experimental metadata are added to Clowder
  - A Clowder extractor is triggered
  - Upload data to the CyVerse data store (TERRA-REF)
  - Launch the CoGe workflow using the API
  - Synchronize results back to Clowder/BETYdb
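
The coverage targets above follow the usual Lander-Waterman relation, mean coverage C = N × L / G, where N is the number of reads, L the read length, and G the genome size. The Python sketch below turns a target coverage into the amount of sequence it implies. The genome size (about 730 Mb, roughly the published sorghum reference) and the 150 bp read length are assumptions for illustration; neither number appears in these notes.

```python
# Assumptions for illustration; neither value comes from the meeting notes.
GENOME_SIZE_BP = 730_000_000  # ~730 Mb, roughly the published sorghum reference
READ_LENGTH_BP = 150          # assumed length of each sequenced read


def bases_for_coverage(coverage: float, genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Total bases N * L needed so that C = N * L / G equals the target coverage."""
    return coverage * genome_size_bp


def reads_for_coverage(coverage: float, read_length_bp: int = READ_LENGTH_BP,
                       genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Number of reads N needed for the target coverage at read length L."""
    return bases_for_coverage(coverage, genome_size_bp) / read_length_bp


def mean_coverage(n_reads: float, read_length_bp: int = READ_LENGTH_BP,
                  genome_size_bp: int = GENOME_SIZE_BP) -> float:
    """Lander-Waterman mean coverage C = N * L / G."""
    return n_reads * read_length_bp / genome_size_bp


if __name__ == "__main__":
    n_genomes = 192  # one batch, as in the notes
    for c in (20, 30):
        per_genome_gb = bases_for_coverage(c) / 1e9  # gigabases per genome
        batch_tb = per_genome_gb * n_genomes / 1e3   # terabases per batch
        print(f"{c}x: {per_genome_gb:.1f} Gb per genome, {batch_tb:.2f} Tb for {n_genomes} genomes")
```

With these assumed values, 20x works out to about 14.6 Gb (gigabases) of sequence per genome and 30x to about 21.9 Gb, or roughly 2.8 to 4.2 Tb across one batch of 192 genomes. Read length only matters when converting to a read count; swap in the actual genome size and read length before using these numbers for planning.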

- Clowder: a database that can hold data of any format. Importing data into Clowder automatically triggers an extractor that moves the data to the correct location for discovery and analysis.
- Data will be uploaded to the NCBI Sequence Read Archive (SRA).
  - Can we link from the SRA to CyVerse and Clowder easily and robustly?
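
The pipeline steps and the Clowder trigger described above can be sketched as a single handler. This is a minimal, hypothetical Python sketch, not the TERRA-REF extractor and not the pyclowder API: `Dataset`, `process_dataset`, and the three injected callables (`upload_to_datastore`, `launch_coge_workflow`, `sync_results`) are illustrative names, and the set of file suffixes treated as sequencing data is an assumption.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

# Assumption: files with these suffixes are treated as raw sequencing reads.
SEQUENCE_SUFFIXES = (".fastq", ".fastq.gz", ".fq", ".fq.gz")


@dataclass
class Dataset:
    """Hypothetical stand-in for a Clowder dataset: file names plus experimental metadata."""
    dataset_id: str
    files: List[str]
    metadata: Dict[str, str] = field(default_factory=dict)


def is_sequence_file(name: str) -> bool:
    """True if the file name looks like raw reads (case-insensitive suffix match)."""
    return name.lower().endswith(SEQUENCE_SUFFIXES)


def process_dataset(
    ds: Dataset,
    upload_to_datastore: Callable[[str], str],
    launch_coge_workflow: Callable[[List[str], Dict[str, str]], str],
    sync_results: Callable[[str, str], None],
) -> Optional[str]:
    """Run the upload -> CoGe -> sync steps for one dataset.

    Returns the workflow ID, or None when the dataset holds no sequencing files
    (the extractor is not "tripped" in that case).
    """
    reads = [f for f in ds.files if is_sequence_file(f)]
    if not reads:
        return None
    # Copy each raw read file to the CyVerse data store and keep its remote path.
    remote_paths = [upload_to_datastore(f) for f in reads]
    # Start the CoGe workflow on the uploaded reads, passing the experimental metadata along.
    workflow_id = launch_coge_workflow(remote_paths, ds.metadata)
    # Record the workflow against the dataset so results can be synchronized back.
    sync_results(ds.dataset_id, workflow_id)
    return workflow_id


if __name__ == "__main__":
    synced = []
    demo = Dataset("ds-001", ["s1_R1.fastq.gz", "s1_R2.fastq.gz", "notes.txt"])
    wf = process_dataset(
        demo,
        upload_to_datastore=lambda f: "datastore/" + f,            # fake upload
        launch_coge_workflow=lambda paths, md: f"wf-{len(paths)}",  # fake workflow ID
        sync_results=lambda ds_id, wf_id: synced.append((ds_id, wf_id)),
    )
    print(wf, synced)  # wf-2 [('ds-001', 'wf-2')]
```

Passing the CyVerse, CoGe, and Clowder/BETYdb calls in as plain functions keeps the control flow testable without any of those services; in a real deployment each lambda would be replaced by a client for the corresponding service.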

CoGe pipeline

- A sample analysis: [https://genomevolution.org/coge/NotebookView.pl?nid=1344](https://genomevolution.org/coge/NotebookView.pl?nid=1344)
- Draft implementation: [https://github.com/terraref/computing-pipeline/blob/f94a87f851b37ff74ded5b7b6b3b0c1e13107720/scripts/coge/coge\_upload.json](https://github.com/terraref/computing-pipeline/blob/f94a87f851b37ff74ded5b7b6b3b0c1e13107720/scripts/coge/coge_upload.json)

Downstream Analyses

- GOBII
- Other downstream tools?
  - SNP calling via CoGe
  - What is already available within CoGe
  - Putting proprietary GATK on CyVerse (Mike G will send more info)

Data Sharing

- When, where, and with what will we share data as of November?
- Currently using the CyVerse data store ([https://de.iplantcollaborative.org/de/](https://de.iplantcollaborative.org/de/))
  - [terraref/reference-data/19](https://github.com/terraref/reference-data/issues/19)
- Phytozome (a DOE database): is it appropriate for our data? Perhaps not for raw reads (Mike G)
  - Maybe we can submit variation information from the CoGe pipeline and update it as the reference genome is updated
  - Is Phytozome interested in hosting pangenome resources?
- NCBI SRA: raw data + experimental metadata
  - NEON has worked with SRA on data/metadata sharing; keep in touch with them
- Others?
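
Because SRA submissions pair raw data with experimental metadata, one practical step is checking that every sample's metadata is complete before upload. The Python sketch below is a hypothetical completeness check; the `REQUIRED_FIELDS` list is an illustrative assumption, since the attributes NCBI actually requires depend on the BioSample package chosen at submission time.

```python
from typing import Dict, List, Optional, Sequence

# Illustrative only: the real mandatory attributes depend on the NCBI BioSample
# package used for the submission.
REQUIRED_FIELDS: Sequence[str] = (
    "sample_name", "organism", "cultivar", "collection_date", "geo_loc_name",
)


def missing_fields(record: Dict[str, Optional[str]],
                   required: Sequence[str] = REQUIRED_FIELDS) -> List[str]:
    """Return the required fields that are absent, None, or blank in one sample record."""
    return [f for f in required if not (record.get(f) or "").strip()]


def submission_report(records: List[Dict[str, Optional[str]]]) -> Dict[str, List[str]]:
    """Map each incomplete sample (by sample_name, else by position) to its missing fields."""
    report: Dict[str, List[str]] = {}
    for i, rec in enumerate(records):
        gaps = missing_fields(rec)
        if gaps:
            report[rec.get("sample_name") or f"record {i}"] = gaps
    return report


if __name__ == "__main__":
    # Made-up placeholder records, not real TERRA-REF samples.
    samples = [
        {"sample_name": "S1", "organism": "example organism", "cultivar": "example cultivar",
         "collection_date": "2016-01-01", "geo_loc_name": "example location"},
        {"sample_name": "S2", "organism": "example organism", "cultivar": ""},
    ]
    print(submission_report(samples))  # {'S2': ['cultivar', 'collection_date', 'geo_loc_name']}
```

Keeping the field list as data rather than code makes it easy to swap in whichever BioSample package the project settles on.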

Other questions / ideas

- How to get from GenBank to related

NEON: providing metagenomic data, processed and made available to the public via MG-RAST; marker gene sequences will be hosted in SRA: not available within the NEON portal, but available from an external repository. Genomic Standards Consortium meeting next week; NEON is working on an environmental soil metadata package for MIxS: [http://gensc.org/mixs/submit-mixs-metadata/](http://gensc.org/mixs/submit-mixs-metadata/)

NEON has started using EML (Ecological Metadata Language) to document sensor and observational data (currently online at [http://data.neonscience.org](http://data.neonscience.org), but not pretty). NEON may begin doing this with soil samples as well.

Action items:

## **References**

- Genomics pipeline documentation: [https://github.com/terraref/documentation/blob/master/genomics\_pipeline.md](https://github.com/terraref/documentation/blob/master/genomics_pipeline.md)
- Genomics data formats: [terraref/reference-data/19](https://github.com/terraref/reference-data/issues/19)
- Pipeline implementation: [terraref/computing-pipeline/issues/37](https://github.com/terraref/computing-pipeline/issues/37)
- Using CoGe: [terraref/computing-pipeline/issues/41](https://github.com/terraref/computing-pipeline/issues/41)