Bioinformatic workflows for target sequence capture projects

The GAP phylogenomics bioinformatics working group has combined newly developed and existing scripts into an integrated workflow for the assembly of target capture data.

The GAP phylogenomics Australian Angiosperm Tree of Life (AAToL) is reconstructed using analysis pipelines developed by the Phylogenomics Bioinformatics Working Group. The pipelines are packaged for easier deployment and greatly reduce the number of commands required to run them.

A manuscript outlining the analysis pipeline utilised to reconstruct the Australian Angiosperm Tree of Life is now available on the preprint server, bioRxiv:
https://doi.org/10.1101/2021.11.08.467817 

Links to these containerised analysis pipelines, which extract genetic sequences from raw sequence reads (HybPiper) and resolve paralogy (Yang and Smith workflow), are provided below. Links are also provided for an analysis workflow that detects hybrids in target capture data sets (HybPhaser).


Assembling raw sequence reads with HybPiper

HybPiper is a bioinformatic workflow that extracts coding sequences and introns for phylogenetics from high-throughput sequencing reads. The HybPiper publication can be found here: https://dx.doi.org/10.3732%2Fapps.1600016

The original HybPiper wiki tutorial can be accessed here:
https://github.com/mossmatters/HybPiper/wiki/Tutorial

HybPiper-RBGV simplifies running HybPiper by uniting all steps into a Nextflow pipeline that is run with a single command, and by providing a Singularity container with all required software and dependencies. For more information, go to the HybPiper-RBGV GitHub repository:
https://github.com/chrisjackson-pellicle/HybPiper-RBGV


Resolving paralogy using the Yang and Smith workflow

The Yang and Smith workflow uses four alternative tree-based orthology inference approaches to resolve paralogy in multi-gene datasets. It assumes that multiple, strongly divergent sequences from the same sample in a gene tree are the result of gene or genome duplications. Of the four approaches, one removes all genes showing any paralogy; one removes paralogs from genes; and the other two can extract multiple ortholog groups from each gene. Two publications outlining the method can be found here:
https://doi.org/10.1093/molbev/msu245
https://doi.org/10.1093/sysbio/syaa066
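
As a rough illustration of the simplest of the four approaches (removing every gene that shows any paralogy), the Python sketch below keeps only genes in which each sample contributes exactly one sequence. It is not code from the Yang and Smith workflow, which operates on gene trees; the function name, the input layout (a dictionary mapping gene names to lists of sample names, one entry per recovered sequence) and the example data are all assumptions made for illustration.

```python
from collections import Counter


def genes_without_paralogs(gene_to_samples):
    """Return the genes in which no sample contributes more than one sequence.

    gene_to_samples is a hypothetical input layout: each key is a gene name
    and each value lists the sample name of every sequence recovered for
    that gene, so a repeated sample name marks a putative paralog.
    """
    kept = []
    for gene, samples in gene_to_samples.items():
        counts = Counter(samples)
        # Keep the gene only if every sample appears exactly once.
        if all(n == 1 for n in counts.values()):
            kept.append(gene)
    return kept


# Made-up example data: gene002 has two sequences from sampleA.
example = {
    "gene001": ["sampleA", "sampleB", "sampleC"],
    "gene002": ["sampleA", "sampleA", "sampleB"],
}
print(genes_without_paralogs(example))
```

Counting sequences per sample is only a stand-in for the tree-based inference used in the published method, but it shows the trade-off of this approach: it is conservative and simple, at the cost of discarding whole genes.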

Yang-and-Smith-RBGV simplifies running the Yang and Smith paralogy resolution workflow by uniting all steps into a Nextflow pipeline that is run with a single command, and by providing a Singularity container with all required software and dependencies. For more information, go to the Yang-and-Smith-RBGV GitHub repository:
https://github.com/chrisjackson-pellicle/Yang-and-Smith-paralogy-resolution


Detection and phasing of hybrids in target capture data sets using HybPhaser

HybPhaser is a workflow to detect hybrids in target capture data sets and phase reads into parental lineages using a similarity and phylogenetic framework. The HybPhaser publication can be found here:
https://doi.org/10.1002/aps3.11441

For more information go to the HybPhaser GitHub repository:
https://github.com/LarsNauheimer/HybPhaser