May 2020 updates

Phylogenomics pilot
1. 133 samples – head-to-head test of Angiosperms353 baits and Waycott baits
2. 8 samples – combined Angiosperms353 + Waycott baits
3. Data generation will be completed at the end of May.

Phylogenomics EOI: Stage 1 – Australian Angiosperm Tree of Life (AAToL)
1. Collectively, the six accepted EOIs include participants from all major Australian herbaria (AD, BRI, CANB, CNS, DNA, HO, MEL, PERTH, NSW, UNE), making this a truly national project.
2. We are now ready to commence sampling for the project. So far, 1354 samples have been committed by participants.

Phylogenomics bioinformatics (prepared by Alexander Schmidt-Lebuhn)
1. Dealing with paralogy: the automated pipeline used by Alexander and Todd is currently being tested on another dataset.
2. Exploring Nextflow to package bioinformatics scripts as containerised workflows, making them easier to install on various platforms.
3. Development of a draft Recommendations document for data analysis in phylogenomics projects. It should be ready for circulation to the other working groups (WGs) after the next WG meeting. Scope: which methods and software to use, which intermediate and final datasets to make available to the community, etc. The aims are to:
a. Find agreement on preferred analysis of data for any consortium level publications, and
b. Provide recommendations to individual consortium members less familiar with the various options, while helping to identify potential training needs.
4. Exploration of how crucial intermediate files, in particular assembly results including all paralogs, can be made available to the community.
5. Discussion on what metadata should be available for each sample.

Reference genome – Acacia (prepared by Anna Syme)
1. Sequencing data have been obtained from nanopore (long reads), Illumina (short reads) and 10x (linked reads).
2. Nuclear genome assembly almost complete: various assemblers have been used followed by polishing and scaffolding. The resulting contigs are now being assessed for length, correctness, and other metrics.
3. Preparing for genome annotation by comparing annotation tools, particularly regarding options to mask repeats such as transposable elements.
4. Lab work is in progress to obtain additional RNA-seq data to improve genome annotation.
5. Organelle assembly is in progress for the chloroplast and mitochondrial genomes.

Reference genome – Waratah (prepared by Jason Bragg)
1. The previously chosen plant was lost during the bushfires. We are currently keeping an eye out for individuals that can be sampled for Hi-C and that would be appropriate for propagation and a voucher specimen.
2. A 10x Chromium assembly is complete, with a best N50 of approximately 1.3 Mb.
3. MinION long-read assembly update:
a. Flye finished, N50 ~2.5 Mb
b. NECAT finished, N50 ~10.7 Mb
c. BUSCO completeness is around 80% for both.
4. Two PromethION runs are now complete, and the Redbean assembler is running. We plan to test the Canu workflow on NCI; if this doesn't work out, we will start soon on a machine at RBGDT.

Reference genome – Areocleome and second Ref Genome EOI
The Areocleome project faced significant challenges. After discussions with the project lead, the Steering Committee decided to conclude the project.

GAP has learnt a lot from all three pilot projects and will use this experience to draft the second EOI. The second EOI is currently open, with a deadline of Mon, 6 July. Link to the EOI:

Conservation Genomics
The Conservation Genomics expression of interest (EOI) has been drafted and is currently open, with a deadline of Mon, 6 July.

David Cantrill GAP webinar
David Cantrill presented an overview of the GAP project at the BioCommons webinar. The recording is available here: