qmito
qmito produces mitochondrial sequencing summary reports from BAM files.
Usage
java -jar qmito.jar [options]
Options
--help
--version
-m Mode, [metric, stat]
--input, -i Input BAM file
--output, -o Output file
--log Log file
--thread Opt, number of execution threads (default=2)
qmito only works on single BAMs so if you have tumour/normal pairs, or
families or other collections of related BAM files that you may wish to
analyse jointly, you should run qmito in --mode metric
independently on each of the BAMs and then use --mode stat to compare
the metric files.
Design
- read in a single BAM
- use qbamfilter to only keep read pairs we want:
and( or(RNAME == 'chrM', RNEXT == 'chrM'), flag_NotPrimaryAlignment==false, flag_ReadFailsVendorQuality==false) - note that DUPs and unmapped reads are OK which means the filter should pull out pairs where one or both reads mapped to the mitochondrial sequence
- write a report with the same params as Felicity's
qpileup(approx 40 params) but we want to write it to a TSV file, not the HDF5 format used byqpileup.
example --filter string:
and( or(RNAME == 'chrM', RNEXT == 'chrM'),
flag_NotPrimaryAlignment==false,
flag_ReadFailsVendorQuality==false,
flag_SupplementaryRead)
and( or(RNAME == 'chrM', RNEXT == 'chrM'),
flag_NotPrimaryAlignment==false,
flag_ReadFailsVendorQuality==false,
flag_SupplementaryRead)
mode
metric
This mode reads multiple BAM files
Options:
-h, --help
-q, --query
-i, --input
-o, --output
-r, --referenceFile
--log
--loglevel
--lowreadcount
--nonrefthreshold
stat
Stat mode reads 2 metric mode report files and does a chi-square of the A/C/G/T base counts of test vs control. Forward and reverevse strand counts are done separately.
-h, --help
-t, --test
-c, --contronal
-o, --output
--log
--loglevel
Examples
java -jar -Xms5g -Xmx5g qmito-0.1pre.jar \
-i $tumourBAM \
-o ${tumourBAM}.qmito.tsv \
--log ${tumourBAM}.qmito.log
Output is a tab-separated plain-text file with unix line endings. The format is that each position in the mitochondrial genome is a row and the columns are the various parameters collected at each position.
#position A_forward C_forward G_forward T_forward …
1 7 95 0 0
2 93 0 0 0
3 0 0 107 2
4 0 8 96 1
...
From class PileupDataRecord it looks like these might be the items that are collected and stored in metric mode for forward and reverse strands.
public static String headings;
private final Integer position;
private int referenceNo = 0;
private int nonReferenceNo = 0;
private final int highNonReference = 0;
private int lowReadCount = 0;
private int baseA = 0;
private int baseC = 0;
private int baseG = 0;
private int baseT = 0;
private int baseN = 0;
private long aQual = 0;
private long cQual = 0;
private long gQual = 0;
private long tQual = 0;
private long nQual = 0;
private long mapQual = 0;
private int startAll = 0;
private int startNondup = 0;
private int stopAll = 0;
private int dupCount = 0;
private int mateUnmapped = 0;
private int cigarI = 0;
private int cigarD = 0;
private int cigarDStart = 0;
private int cigarS = 0;
private int cigarSStart = 0;
private int cigarH = 0;
private int cigarHStart = 0;
private int cigarN = 0;
private int cigarNStart = 0;
private String reference;
boolean isReverse = false;