qbasepileup snp mode

java -jar qbasepileup.jar -m snp ...

In this mode, qbasepileup reads one or more BAM files, a reference genome, and a file containing positions of SNPs. It finds the reference genome base at the SNP position as well as the bases found at that position in all reads aligned to that region. Coverage per nucleotide is reported and the total coverage at that position is reported. By default, duplicates and unmapped reads are excluded.

Options

--help, -h     Shows usage and help text
--version, -v  Shows current version number.
--log          Required. Path to log file.
--loglevel     Optional. Logging level, e.g. INFO,DEBUG. Default INFO.
-i             Optional: snp or indel. Default snp.
-i             Required if no -b option. Path to BAM file.
-b             Required if no -i option. Path to tab delimited file with 
                   list of BAMs. File should contain 3 columns: Integer 
                   identifier, Donor, Path to BAM file.
-s             Required. Path to containing file of snp positions.
                   Formats: dcc1, maf, tab, dccq, vcf
-r             Required. Path to reference genome fasta file.
-o             Required. Output file name.
-of            Optional. Output file format: rows or columns (default rows). 
                   Results for the output file listing each bam by row or
                   column.
-f             Optional. Format of SNPs file. Default dcc1.
-p             Optional. Run a pileup profile. Default standard.
--filter       Optional. qbamfilter query to use
--t            Optional. Thread number. Total thread number = number
                   supplied + 2. Default 1 (total threads 4).
--strand       Optional. Separate coverage by strand. (y or n, default y).
--mq           Optional. Minimum mapping quality score for accepting a read.
                   Default is any mapping quality ie no filtering.
--bq           Optional. Minimum base quality score for accepting a read.
                   Default is any base quality ie no filtering.
--intron       Optional. Include reads with indels. (y or n, default y).
--ind          Optional. Include reads mapping across introns. (y or n,
                   default y).
--novelstarts  Optional. Report number of novel starts for each base type,
                   rather than number of reads. (y or n, default n).

`-f`

Format of the SNP file specified with -s. This option is optional and the default is dcc1. Current supported formats are:

dcc1 - an ICGC data submission format
maf - Mutation Annotation Format
dccq - an extension of the dcc1 format but with extra fields
vcf - Variant Call Format
tab - Tab-delimited

If you don't have a VCF or MAF file, the tab format is often the easiest to construct. It is a tab delimited plain text file with 4 columns and no header. The columns are: id, chromosome, start_position, end_position

`--profile`

This option provides predefined sets of values for 5 other options that are frequently used together: --strand, --indel, --intron, -bq, and -mq. There are 4 currently defined progiles - standard, dna, rna, torrent.

All profiles will ignore unmapped reads and reads where the DuplicateRead flag is true.

Profile	Filter		Metrics
Profile	Mapping quality	Base quality	Strand Specific	Include introns	Include indels	Novelstarts
standard	any	any	y	y	y	n
dna	10	10	y	n	y	n
rna	10	7	n	y	y	y
torrent	1	0	n	y	y	n

`--strand`, `--indel`, `--intron`, `-bq`, `-mq`

`--novelstarts`

novelstarts is a count of how many reads with different start sites cover a particular position. For example, if 10 reads covered a position but they all had alignments starting at the same position then novelstarts=1. If 10 reads covered a position and 4 started at the same position and the outher 6 all started at different positions then novelstarts=7. novelstarts helps spot cases where there might be some question about whether the reads are duplicates, even if they are paired ans the pairs seem to be different.

Examples

Defaults with BAM file list

qbasepileup -b bam_list.txt -s snps.dcc1 -r reference.fa \
    -o output.pileup.txt --log log_file.log

Default file format is dcc1
Standard profile:
- Print strand specific info
- Include any base or mapping quality
- Include reads in introns
- Include reads with indels

Defaults with single input BAM file

qbasepileup -i input.bam -s snps.dcc1 -r reference.fa \
    -o output.pileup.txt --log log_file.log

Default file format is dcc1
Standard profile:
- Print strand specific info
- Include any base or mapping quality
- Include reads in introns
- Include reads with indels

Use of `--filter` qbamfilter query

qbasepileup -b bam_list.txt -s snps.dcc1 -r reference.fa \
    -o output.pileup.txt --log log_file.log --filter "option_SM > 30"

Default file format is dcc1
Standard profile.
Qbamfilter query of option_SM > 30

SNP file format is MAF

qbasepileup -b bam_list.txt -s snps.maf -r reference.fa \
    -o output.pileup.txt --log log_file.log -f maf

Torrent profile

qbasepileup -b bam_list.txt -s snps.maf -r reference.fa \
    -o output.pileup.txt --log log_file.log -f maf -p torrent

File format is MAF
Torrent profile:
- Do not print strand specific info
- Min base quality: 0
- Min mapping quality: 1
- Include reads in introns
- Include reads with indels

Run with user defined filtering options

qbasepileup -i input.bam -s snps.dcc1 -r reference.fa \
    -o output.pileup.txt --log log_file.log \
    --strand n --indel n --intron n -bq 5 -mq 10

No strand specific base coverage information
Do not include reads with indels
Do not include reads in introns
Read must have a minimum mapping quality of 10
Read must have a minimum base quality of 5.

Novel starts

qbasepileup -b bam_list.txt -s snps.dcc1 -r reference.fa \
    -o output.pileup.txt --log log_file.log --novelstarts

File format is dcc1
Print strand specific info
Include any base or mapping quality
Include reads in introns
Include reads with indels
Count only novel starts