Bioinformatics Example Repository
Group: Single Sample Analysis Pipelines
Creates single-UMI consensus reads and calls somatic variants.
Starts with an unmapped BAM file with the UMI in a tag (RX by default).
The reads are aligned with bwa-mem, then duplicate marked using the UMI tag, grouped by molecular ID to create a “grouped” BAM file, and consensus called to produce per-molecule consensus reads.
The consensus reads are aligned with bwa-mem and subsequently filtered.
The de-duplicated BAM and the consensus reads are both variant called with VarDictJava and are then run through a panel of metrics. This allows us to compare using the single UMIs simply for better duplicate marking against using them to improve read quality through consensus calling.
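The grouping step described above can be pictured with a small Python sketch. It is an illustration only, not the pipeline's implementation: it clusters raw UMIs that are within a given number of mismatches of each other and assigns each cluster a numeric molecular ID. The function names `hamming` and `group_umis` are hypothetical. The real grouping also takes mapping position into account, and its ‘Identity’, ‘Edit’, and ‘Adjacency’ strategies differ in their details.

```python
def hamming(a: str, b: str) -> int:
    """Count mismatching positions between two equal-length UMIs."""
    if len(a) != len(b):
        raise ValueError("UMIs must have equal length")
    return sum(x != y for x, y in zip(a, b))


def group_umis(umis, max_edits=1):
    """Map each distinct UMI to a molecular ID (hypothetical helper).

    UMIs within `max_edits` mismatches are linked, and linked UMIs share an ID
    (single-linkage clustering via union-find). This is a simplification and
    is not claimed to match any of the pipeline's actual strategies.
    """
    unique = sorted(set(umis))
    parent = {u: u for u in unique}

    def find(u):
        # Follow parent pointers to the cluster root, compressing the path.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            if hamming(a, b) <= max_edits:
                ra, rb = find(a), find(b)
                if ra != rb:
                    # Merge the two clusters under the smaller root.
                    parent[max(ra, rb)] = min(ra, rb)

    roots = sorted({find(u) for u in unique})
    ids = {root: n for n, root in enumerate(roots)}
    return {u: ids[find(u)] for u in unique}


if __name__ == "__main__":
    # The values here stand in for what would be read from the RX tag.
    print(group_umis(["ACGT", "ACGA", "TTTT", "ACGT"], max_edits=1))
```

With `max_edits=0` the sketch degenerates to exact matching, which is roughly what an identity-style assignment would do.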
Name | Flag | Type | Description | Required? | Max Values | Default Value(s) |
---|---|---|---|---|---|---|
input-bam | i | DirPath | Path to the unmapped BAM file. | Required | 1 | |
ref | r | PathToFasta | Path to the reference FASTA. | Required | 1 | |
intervals | l | PathToIntervals | Regions to analyze. | Required | 1 | |
truth-vcf | v | PathToVcf | Truth VCF for the sample being sequenced. | Optional | 1 | |
output | o | PathPrefix | Path prefix for output files. | Required | 1 | |
umi-tag | U | String | The tag containing the raw UMI. | Optional | 1 | RX |
molecular-id-tag | I | String | The tag to store the molecular identifier. | Optional | 1 | MI |
min-map-q | m | Int | Minimum mapping quality to include reads. | Optional | 1 | 10 |
strategy | s | String | The UMI assignment strategy; one of ‘Identity’, ‘Edit’, ‘Adjacency’. | Optional | 1 | Adjacency |
edits | e | Int | The allowable number of edits between UMIs. | Optional | 1 | 1 |
error-rate-pre-umi | 1 | Int | The Phred-scaled error rate for an error prior to the UMIs being integrated. | Optional | 1 | 45 |
error-rate-post-umi | 2 | Int | The Phred-scaled error rate for an error after the UMIs have been integrated. | Optional | 1 | 30 |
min-input-base-quality | q | Int | The minimum input base quality for bases to go into consensus. | Optional | 1 | 30 |
min-consensus-base-quality | Q | Int | Mask (make ‘N’) consensus bases with quality less than this threshold. | Optional | 1 | 40 |
min-reads | M | Int | The minimum number of reads to produce a consensus base. | Optional | 1 | 1 |
max-base-error-rate | | Double | The maximum consensus error rate per base. | Optional | 1 | 0.1 |
max-read-error-rate | | Double | The maximum consensus error rate per read. | Optional | 1 | 0.05 |
max-no-call-fraction | | Double | The maximum fraction of bases that are Ns in a consensus read. | Optional | 1 | 0.1 |
minimum-af | | Double | The minimum allele frequency to use for variant calling. | Optional | 1 | 0.0025 |
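
To make the quality-related defaults in the table concrete, here is a minimal Python sketch. It is not taken from the pipeline's code. It converts a Phred-scaled value to an error probability, masks consensus bases below `min-consensus-base-quality`, and checks a read against `max-no-call-fraction`. The function names are hypothetical, and treating the no-call threshold as inclusive is an assumption.

```python
def phred_to_prob(q: float) -> float:
    """Convert a Phred-scaled quality to an error probability: 10^(-Q/10)."""
    return 10 ** (-q / 10)


def mask_low_quality(bases: str, quals, min_qual: int = 40) -> str:
    """Replace bases whose quality is below `min_qual` with 'N'."""
    return "".join(b if q >= min_qual else "N" for b, q in zip(bases, quals))


def passes_no_call_filter(bases: str, max_no_call_fraction: float = 0.1) -> bool:
    """Keep a read only if its fraction of 'N' bases is within the limit.

    Assumption: the comparison is inclusive (<=); the actual tool may differ.
    """
    if not bases:
        return False
    return bases.count("N") / len(bases) <= max_no_call_fraction


if __name__ == "__main__":
    # Table defaults: error-rate-pre-umi = 45, error-rate-post-umi = 30.
    print(phred_to_prob(45), phred_to_prob(30))
    masked = mask_low_quality("ACGT", [40, 39, 45, 10])
    print(masked, passes_no_call_filter(masked))
```

Under this conversion, the default `error-rate-post-umi` of 30 corresponds to an error probability of about 1 in 1,000.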