qsignature Generate mode
To define a set of SNVs common to all major sequencing and array platforms, we selected all single-base dbSNP-derived SNPs included on the (2015) OMNI-1Mquad genotyping array (~1.4 million SNPs). These SNVs are shared with other members of the Illumina OMNI array family and are also covered by whole genome data and by some regions of exome and targeted gene panel data. qsignature determines the nucleotide frequencies at each of these SNV positions, using spot intensities for genotyping microarrays and base pileups for BAM files.
Genotyping array intensities are transformed into relative nucleotide counts using the following formulas:
T = ⌊C⋅e^LRR⌋
A = ⌊BAF⋅T⌋
R = T - A
where:
T = total count
A = alternate allele count
R = reference allele count
C = pseudocount, 20
LRR = logR ratio
BAF = B-allele frequency
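The transformation can be sketched in a few lines of Python. This is an illustrative sketch only, not the qsignature implementation; the function name `array_counts` and the constant `PSEUDOCOUNT` are hypothetical names introduced here, and the base of the exponent follows the formula exactly as written.

```python
import math

PSEUDOCOUNT = 20  # C in the formulas (hypothetical constant name)


def array_counts(lrr, baf, pseudocount=PSEUDOCOUNT):
    """Return (total, alt, ref) counts from a logR ratio and a B-allele frequency."""
    total = math.floor(pseudocount * math.exp(lrr))  # T = floor(C * e^LRR)
    alt = math.floor(baf * total)                    # A = floor(BAF * T)
    ref = total - alt                                # R = T - A
    return total, alt, ref


# A heterozygous-looking probe at normal intensity: LRR = 0, BAF = 0.5
print(array_counts(0.0, 0.5))  # (20, 10, 10)
```

With LRR = 0 the total equals the pseudocount, so the BAF alone decides how the 20 counts are split between the alternate and reference alleles.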
To calculate nucleotide frequencies from BAM reads, we perform a pileup at each of the selected SNV positions and report the total count of each nucleotide. Only bases with a base quality of at least 10 are counted, and only from reads that have a mapping quality of at least 10, have passed the vendor quality check, are the primary alignment, and are not marked as duplicates.
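The read and base filters can be expressed as simple predicates over the SAM flag, mapping quality, and base quality. The sketch below is an assumption-laden illustration, not qsignature's code: the function names are hypothetical, the flag bits are the standard ones from the SAM specification, and "primary alignment" is interpreted here as "secondary flag (0x100) not set". How supplementary alignments (0x800) are treated is not stated in this document, so the sketch leaves them untouched.

```python
# Standard SAM flag bits (from the SAM specification).
FLAG_SECONDARY = 0x100   # not the primary alignment
FLAG_QC_FAIL = 0x200     # failed the vendor quality check
FLAG_DUPLICATE = 0x400   # PCR or optical duplicate


def read_passes(flag, mapping_quality, min_mapping_quality=10):
    """True if a read may contribute to the pileup (hypothetical helper)."""
    if mapping_quality < min_mapping_quality:
        return False
    # Reject the read if any of the three excluding flag bits is set.
    return not (flag & (FLAG_SECONDARY | FLAG_QC_FAIL | FLAG_DUPLICATE))


def base_passes(base_quality, min_base_quality=10):
    """True if a single base call may be counted (hypothetical helper)."""
    return base_quality >= min_base_quality


print(read_passes(flag=99, mapping_quality=60))     # properly paired primary read
print(read_passes(flag=0x400, mapping_quality=60))  # duplicate read
```

The default thresholds of 10 mirror the values quoted above.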
VCF generation takes about 20 minutes on a single core to report nucleotide counts from 500 million reads and less than a minute to estimate counts from a genotype array. This step needs to be performed only once per file.
Usage
java -cp qsignature.jar org.qcmg.sig.Generate \
-log $BAM.qsig.log \
-snpPositions qsignature_positions.txt \
-input $BAM \
-illuminaArraysDesign Illumina_arrays_design.txt
Options
- -snpPositions REQUIRED - positions file - an hg19-based, tab-delimited text file listing the positions that qsignature will report on. For BAM files, a pileup is performed; for SNP array files, the logR ratio and B-allele frequency are used to determine the ref/alt counts
- -input REQUIRED - data file - BAM file or SNP array text file (Genome Studio)
- -log REQUIRED - output file containing logging information
- -illuminaArraysDesign OPTIONAL (REQUIRED if running against a SNP array text file) - Illumina arrays design text file - contains information on how to treat entries in the SNP array files
- -minMappingQuality OPTIONAL - minimum mapping quality (defaults to 10)
- -minBaseQuality OPTIONAL - minimum base quality (defaults to 10)
- -validation OPTIONAL - validation stringency to use when reading BAM files (defaults to STRICT, unless mapped by bwa, in which case SILENT)
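When Generate is run over many files, it can help to assemble the command line programmatically. The following is a minimal sketch that uses only the jar, class name, and options shown in this section; the helper `build_generate_command` and its parameter names are hypothetical, and the default log name simply mirrors the `$BAM.qsig.log` pattern from the usage example.

```python
def build_generate_command(input_path, snp_positions, log_path=None,
                           array_design=None, min_mapping_quality=None,
                           min_base_quality=None, validation=None,
                           jar="qsignature.jar"):
    """Build the argument list for a qsignature Generate run (hypothetical helper)."""
    cmd = [
        "java", "-cp", jar, "org.qcmg.sig.Generate",
        "-log", log_path or f"{input_path}.qsig.log",
        "-snpPositions", snp_positions,
        "-input", input_path,
    ]
    # Optional arguments are appended only when a value is supplied.
    optional = [
        ("-illuminaArraysDesign", array_design),
        ("-minMappingQuality", min_mapping_quality),
        ("-minBaseQuality", min_base_quality),
        ("-validation", validation),
    ]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


print(" ".join(build_generate_command("sample.bam", "qsignature_positions.txt")))
```

The returned list can be passed directly to `subprocess.run`, which avoids shell quoting problems with unusual file names.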
Outputs
VCF file with coverage at the positions of interest: real (observed) counts for BAM input, or calculated counts for SNP array input.
Example
##fileformat=VCFv4.2
##datetime=2021-03-09T10:52:34.073
##program=SignatureGeneratorBespoke
##version=58-1f6355ea
##java_version=1.8.0_152
##run_by_os=Linux
##run_by_user=cromwelltst
##snp_positions=qsignature_positions.txt
##gene_positions=null
##reference=null
##positions_md5sum=d18c99f481afbe04294d11deeb418890
##positions_count=1456203
##filter_base_quality=10
##filter_mapping_quality=10
##illumina_array_design=null
##cmd_line=SignatureGeneratorBespoke --snpPositions qsignature_positions.txt -input 55c7fbcf-439d-4058-9f55-6bd2d55127f5.bam -log 55c7fbcf-439d-4058-9f55-6bd2d55127f5.bam.qsig.vcf.log --output 55c7fbcf-439d-4058-9f55-6bd2d55127f5.bam.qsig.vcf.gz --validation SILENT
##INFO=<ID=QAF,Number=.,Type=String,Description="Lists the counts of As-Cs-Gs-Ts for each read group, along with the total">
##input=55c7fbcf-439d-4058-9f55-6bd2d55127f5.bam
##rg0=null
##rg1=13fbd3ce-9667-49b4-8936-1b7a02648bf0
##rg2=cb006bf1-0636-4c3d-9877-6319db04fa3f
#CHROM POS ID REF ALT QUAL FILTER INFO
chr1 696838 . A . . . QAF=t:8-0-0-0,rg1:6-0-0-0,rg2:2-0-0-0
chr1 725060 . A . . . QAF=t:5-0-0-0,rg1:4-0-0-0,rg2:1-0-0-0
chr1 725737 . T . . . QAF=t:0-0-0-2,rg1:0-0-0-1,rg2:0-0-0-1
chr1 725908 . A . . . QAF=t:5-0-0-0,rg1:1-0-0-0,rg2:4-0-0-0
chr1 726060 . T . . . QAF=t:0-0-0-1,rg2:0-0-0-1
chr1 726224 . A . . . QAF=t:3-0-0-0,rg1:2-0-0-0,rg2:1-0-0-0
chr1 727037 . A . . . QAF=t:9-0-0-0,rg1:7-0-0-0,rg2:2-0-0-0
chr1 823451 . T . . . QAF=t:1-0-0-91,rg1:1-0-0-44,rg2:0-0-0-47
chr1 882803 . A . . . QAF=t:0-0-31-0,rg1:0-0-14-0,rg2:0-0-17-0
chr1 883899 . T . . . QAF=t:0-0-2-80,rg1:0-0-1-44,rg2:0-0-1-36
chr1 1223621 . G . . . QAF=t:0-3-62-0,rg1:0-2-33-0,rg2:0-1-29-0
chr1 1223728 . C . . . QAF=t:0-30-0-0,rg1:0-14-0-0,rg2:0-16-0-0
chr1 1223837 . C . . . QAF=t:0-39-0-0,rg1:0-19-0-0,rg2:0-20-0-0
chr1 1223956 . T . . . QAF=t:0-0-1-42,rg1:0-0-0-25,rg2:0-0-1-17
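The QAF value in the INFO column can be unpacked with a small parser. This sketch assumes, based on the header description and the records above, that each entry has the form `group:A-C-G-T`, that `t` is the total across read groups, and that `rgN` refers to the read group listed in the corresponding `##rgN` header line. The function name `parse_qaf` is hypothetical.

```python
def parse_qaf(info):
    """Parse the QAF entry of a VCF INFO string into {group: {base: count}}."""
    for field in info.split(";"):
        if field.startswith("QAF="):
            counts = {}
            for entry in field[len("QAF="):].split(","):
                group, values = entry.split(":")
                # Counts are listed in A-C-G-T order.
                counts[group] = dict(zip("ACGT", map(int, values.split("-"))))
            return counts
    return {}  # no QAF entry present


qaf = parse_qaf("QAF=t:0-3-62-0,rg1:0-2-33-0,rg2:0-1-29-0")
print(qaf["t"])  # {'A': 0, 'C': 3, 'G': 62, 'T': 0}
```

For the chr1:1223621 record, the per-read-group counts sum to the total: 2 + 1 = 3 for C and 33 + 29 = 62 for G.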