qsnp .ini file

.ini file sections

The qsnp ini file is split into the following sections: ids, parameters, rules, inputFiles and outputFiles.

ids

Property name	Description
donor	Patient for which qsnp is being run
normalSample	sample id pertaining to the control sample
tumourSample	sample id pertaining to the test sample
analysisId	unique id for this analysis (usually a type 4 uuid)
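
A minimal sketch of generating a value for analysisId, assuming a random (version 4) UUID is wanted. The helper name new_analysis_id is hypothetical. The example ini files in this document write the id with underscores instead of hyphens, so the sketch does the same; whether qsnp requires that form is not stated here.

```python
import uuid


def new_analysis_id() -> str:
    """Return a random (version 4) UUID with hyphens replaced by underscores.

    The underscore form mirrors the example ini files in this document;
    it is an assumption, not a documented qsnp requirement.
    """
    return str(uuid.uuid4()).replace("-", "_")


if __name__ == "__main__":
    print(new_analysis_id())
```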

parameters

Property name	Required	Description
runMode	yes	indicates the mode in which qsnp should run. Possible values are [standard, vcf, mutect]
annotateMode	no	If set to dcc, partially completed dcc files are output as well. If left blank, only the VCF file is output.
filter	no	Only used by standard mode, when it filters reads from the incoming bam files. If this property is blank, no filtering is performed other than to remove duplicates.
noOfRecordsFailingFilter	no	Only used by standard mode. If this number of records is reached without a single one passing the filter, qsnp fails. Defaults to 1000000.
numberNovelStarts	no	Specifies the minimum number of novel starts required so that the NNS flag is not applied. Defaults to 4.
numberMutantReads	no	Specifies the minimum number of mutant reads required so that the MR flag is not applied. Defaults to 5.
validation	no	Specifies the validation stringency to be used when parsing the bam files. Possible values are SILENT, STRICT, LENIENT. Defaults to STRICT, unless there is an entry in the header file indicating that bwa was used to map the bam, in which case defaults to SILENT.
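
The table above can be sketched as a small loader that reads the [parameters] section, fills in the documented defaults and checks runMode. This is an illustration using Python's standard configparser module, not qsnp's own parsing code; the function name read_parameters is hypothetical. The validation default is left out because, as described above, it depends on the bam header.

```python
import configparser

# Defaults taken from the parameters table above.
DEFAULTS = {
    "noOfRecordsFailingFilter": "1000000",
    "numberNovelStarts": "4",
    "numberMutantReads": "5",
}
RUN_MODES = {"standard", "vcf", "mutect"}


def read_parameters(ini_text):
    """Return the [parameters] section as a dict, with defaults applied."""
    parser = configparser.ConfigParser(interpolation=None)
    # configparser lower-cases keys by default; keep camelCase names as written.
    parser.optionxform = str
    parser.read_string(ini_text)
    if not parser.has_section("parameters"):
        raise ValueError("missing [parameters] section")
    params = {**DEFAULTS, **parser["parameters"]}
    if params.get("runMode") not in RUN_MODES:
        raise ValueError("runMode must be one of %s" % sorted(RUN_MODES))
    # The three defaulted properties are numeric.
    for name in DEFAULTS:
        params[name] = int(params[name])
    return params


EXAMPLE = """
[parameters]
runMode = standard
filter = and (Flag_DuplicateRead==false , CIGAR_M>34 , MD_mismatch <= 3 , option_SM > 10)
annotateMode = dcc
"""

if __name__ == "__main__":
    for key, value in read_parameters(EXAMPLE).items():
        print(key, "=", value)
```

Only the first "=" on a line separates the property name from its value, so the "==" inside the filter expression is preserved.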

rules

The rules that stipulate whether a position has enough variants to be considered a position of interest are defined in this section. There are separate rules for the normal (control) bam and for the tumour (test) bam. The rules are in the format: minimum coverage, maximum coverage, number of variants needed.

Example:

normal1=0,20,3
normal2=21,50,4
normal3=51,,10
tumour1=0,20,3
tumour2=21,50,4
tumour3=51,,5

Here we are specifying that for the normal bam, if we have coverage between 0 and 20, then we need at least 3 alts. If we have coverage between 21 and 50, we need 4 alts, and if we have 51 and over, we need 10% of reads to contain the alt. The same rules apply for reads from the tumour bam apart from the case where the coverage is 51 or over, in which case we are after 5% of reads containing the alt before the position will be considered.

rules rules

As many rules as is appropriate may be specified for each of normal and tumour, as long as each rule begins with either "normal" or "tumour" and is unique.
If more than one rule applies to a position with a certain coverage, then qsnp will exit with an error message.
If some coverage values do not have a valid rule defined, then qsnp will emit a warning and ignore all positions whose coverage falls in that range.
If the maximum coverage value is not specified, it is assumed to be Integer.MAX_VALUE, and the number of variants is then interpreted as a percentage of reads rather than an absolute count.
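
The rule format and the points above can be sketched as follows. This is an illustration, not qsnp's implementation: the names Rule, parse_rule and variants_needed are hypothetical, coverage bounds are assumed to be inclusive (as the 0-20 / 21-50 example suggests), and rounding the percentage up to a whole read is an assumption.

```python
from typing import List, NamedTuple, Optional

INT_MAX = 2**31 - 1  # value of Java's Integer.MAX_VALUE


class Rule(NamedTuple):
    min_cov: int
    max_cov: int
    value: int
    is_percentage: bool


def parse_rule(text: str) -> Rule:
    """Parse 'min,max,value'; an empty max makes value a percentage."""
    lo, hi, value = (part.strip() for part in text.split(","))
    if hi == "":
        return Rule(int(lo), INT_MAX, int(value), True)
    return Rule(int(lo), int(hi), int(value), False)


def variants_needed(rules: List[Rule], coverage: int) -> Optional[int]:
    """Return the number of alt reads needed at this coverage.

    Returns None when no rule covers the value (the position is ignored)
    and raises ValueError when more than one rule applies.
    """
    matching = [r for r in rules if r.min_cov <= coverage <= r.max_cov]
    if len(matching) > 1:
        raise ValueError("more than one rule applies to coverage %d" % coverage)
    if not matching:
        return None
    rule = matching[0]
    if rule.is_percentage:
        # Assumption: round the percentage up to a whole number of reads.
        return (coverage * rule.value + 99) // 100
    return rule.value


normal = [parse_rule(s) for s in ("0,20,3", "21,50,4", "51,,10")]
tumour = [parse_rule(s) for s in ("0,20,3", "21,50,4", "51,,5")]

if __name__ == "__main__":
    for cov in (10, 30, 100):
        print(cov, variants_needed(normal, cov), variants_needed(tumour, cov))
```

With the example rules, a position with coverage 100 would need 10 alt reads in the normal bam and 5 in the tumour bam under these assumptions.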

inputFiles

Property name	Mode	Required	Description
dbSNP	all	no	This file is available in compressed format as 00-All.vcf.gz from the dbSNP FTP site. You will need to download it and uncompress it to a directory where qSNP can see it.
germlineDB	all	no	This is a QCMG-specific VCF file which contains germline SNPs called in other samples. This is used to look for evidence that a somatic SNP appears as a germline SNP in another patient
chrConv	all	yes (if annotateMode is dcc)	This is another QCMG-specific file used to resolve BAM files that have different names for the same sequences. For example the Ensemblv55 sequence "HSCHR1_RANDOM_CTG5" is called "GL000191.1" in the QCMG GRCh37 standard genome, "chr25" by diBayes, and "93" in the ICGC DCC_v0.4. This concordance file is primarily used during creation of the DCC file where an integer identifier is used rather than the versioned RefSeq identifiers used in the QCMG reference genome and BAMs.
ref	standard	yes	This is the reference used to align the bam files
normalBam	all	yes	Control bam file
tumourBam	all	yes	Test bam file
vcfNormal	vcf & mutect	yes	GATK UnifiedGenotyper/HaplotypeCaller output from control bam
vcfTumour	vcf & mutect	yes	GATK UnifiedGenotyper/HaplotypeCaller output from test bam
illuminaNormal	all	no	SNP microarray data from Illumina's GenomeStudio software in text format (TSV). The header from this file is shown in the example immediately below this table so you can check that your SNP data is in an appropriate format. All potential SNPs are checked against this table to look for concordance and if a qSNP called variant is also found in the SNP array, then a code (48?) is added to the validation column of the DCC output file. Note that the Illumina genotypes may not match the qSNP genotypes because of the Illumina TOP/BOT convention so just be aware that a mismatch between BAM and array does not necessarily mean that the genotypes are different. (/SNP_array/.txt)
illuminaTumour	all	no	As for illuminaNormal but for the test sample SNP microarray.
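
The Mode and Required columns above can be turned into a simple pre-flight check. The sketch below is an illustration of the table only, not something qsnp provides; the names REQUIRED_INPUTS and missing_inputs are hypothetical.

```python
# Required [inputFiles] properties per runMode, as listed in the table above.
REQUIRED_INPUTS = {
    "standard": {"ref", "normalBam", "tumourBam"},
    "vcf": {"normalBam", "tumourBam", "vcfNormal", "vcfTumour"},
    "mutect": {"normalBam", "tumourBam", "vcfNormal", "vcfTumour"},
}


def missing_inputs(run_mode, annotate_mode, input_files):
    """Return a sorted list of required properties that are absent or blank."""
    required = set(REQUIRED_INPUTS[run_mode])
    if annotate_mode == "dcc":
        # chrConv is only required when dcc files are being produced.
        required.add("chrConv")
    return sorted(name for name in required if not input_files.get(name))


if __name__ == "__main__":
    inputs = {
        "normalBam": "/ABCD_1234/control.bam",
        "tumourBam": "/ABCD_1234/test.bam",
    }
    print(missing_inputs("vcf", "dcc", inputs))
```

Run against the two bam entries shown, the check reports chrConv, vcfNormal and vcfTumour as missing for vcf mode with annotateMode set to dcc.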

outputFiles

qsnp outputs a single VCF file containing both somatic and germline mutations. It can also optionally output two files (one for somatic and one for germline mutations) in a partial dcc format (used by the ICGC).

Property name	Mode	Required	Description
vcf	all	yes	full path to the output vcf file
dccSomatic	all	no	full path to the output somatic dcc file
dccGermline	all	no	full path to the output germline dcc file

Example ini files

Standard mode

[inputFiles]
dbSNP = /dbSNP/135/00-All.vcf
germlineDB = /qsnp/icgc_germline_qsnp.vcf
chrConv = /qsnp/chromosome_conversions.txt
ref = /genomes/GRCh37_ICGC_standard_v2/GRCh37_ICGC_standard_v2.fa
normalBam = /ABCD_1234/control.bam
tumourBam = /ABCD_1234/test.bam
illuminaNormal = /ABCD_1234/SNP_array/control_snp_array.txt
illuminaTumour = /ABCD_1234/SNP_array/test_snp_array.txt

[parameters]
runMode = standard
filter = and (Flag_DuplicateRead==false , CIGAR_M>34 , MD_mismatch <= 3 , option_SM > 10)
annotateMode = dcc

[ids]
donor = ABCD_1234
normalSample = QWERTY-XXYY-20130816-028
tumourSample = QWERTY-XXYY-20131107-114
analysisId = c57c66e4_dfad_47ea_a71b_ea37a004e042

[outputFiles]
vcf = /ABCD_1234/variants/qSNP/c57c66e4_dfad_47ea_a71b_ea37a004e042/ABCD_1234.vcf
dccSomatic = /ABCD_1234/variants/qSNP/c57c66e4_dfad_47ea_a71b_ea37a004e042/ABCD_1234.SomaticSNV.dcc1
dccGermline = /ABCD_1234/variants/qSNP/c57c66e4_dfad_47ea_a71b_ea37a004e042/ABCD_1234.GermlineSNV.dcc1

[rules]
normal1=0,20,3
normal2=21,50,4
normal3=51,,10
tumour1=0,20,3
tumour2=21,50,4
tumour3=51,,5

VCF mode

[inputFiles]
vcfNormal = /ABCD_1234/variants/GATK/9ab5efed_a5eb_4158_9b89_e156913450fc/ABCD_1234.Control.vcf
vcfTumour = /ABCD_1234/variants/GATK/9ab5efed_a5eb_4158_9b89_e156913450fc/ABCD_1234.Test.vcf
dbSNP = /dbSNP/135/00-All_chr.vcf
germlineDB = /qsnp/icgc_germline_qsnp.vcf
chrConv = /qsnp/chromosome_conversions.txt
ref = /genomes/GRCh37_ICGC_standard_v2/GRCh37_ICGC_standard_v2.fa
normalBam = /ABCD_1234/control.bam
tumourBam = /ABCD_1234/test.bam
illuminaNormal = /ABCD_1234/SNP_array/control_snp_array.txt
illuminaTumour = /ABCD_1234/SNP_array/test_snp_array.txt

[parameters]
runMode = vcf
annotateMode = dcc
minimumBaseQuality=10
pileupOrder=NT

[ids]
donor = ABCD_1234
normalSample = QWERTY-XXYY-20130816-028
tumourSample = QWERTY-XXYY-20131107-114
analysisId = 9ab5efed_a5eb_4158_9b89_e156913450fc

[outputFiles]
vcf = /ABCD_1234/variants/GATK/9ab5efed_a5eb_4158_9b89_e156913450fc/ABCD_1234.vcf
dccSomatic = /ABCD_1234/variants/GATK/9ab5efed_a5eb_4158_9b89_e156913450fc/ABCD_1234.SomaticSNV.dcc1
dccGermline = /ABCD_1234/variants/GATK/9ab5efed_a5eb_4158_9b89_e156913450fc/ABCD_1234.GermlineSNV.dcc1

[rules]
normal1=0,20,3
normal2=21,50,4
normal3=51,,10
tumour1=0,20,3
tumour2=21,50,4
tumour3=51,,5