qbasepileup snp mode
java -jar qbasepileup.jar -m snp ...
In this mode, qbasepileup reads one or more BAM files, a reference genome, and a file containing positions of SNPs. It finds the reference genome base at the SNP position as well as the bases found at that position in all reads aligned to that region. Coverage per nucleotide is reported and the total coverage at that position is reported. By default, duplicates and unmapped reads are excluded.
Options
--help, -h Shows usage and help text
--version, -v Shows current version number.
--log Required. Path to log file.
--loglevel Optional. Logging level, e.g. INFO,DEBUG. Default INFO.
-i Optional: snp or indel. Default snp.
-i Required if no -b option. Path to BAM file.
-b Required if no -i option. Path to tab delimited file with
list of BAMs. File should contain 3 columns: Integer
identifier, Donor, Path to BAM file.
-s Required. Path to containing file of snp positions.
Formats: dcc1, maf, tab, dccq, vcf
-r Required. Path to reference genome fasta file.
-o Required. Output file name.
-of Optional. Output file format: rows or columns (default rows).
Results for the output file listing each bam by row or
column.
-f Optional. Format of SNPs file. Default dcc1.
-p Optional. Run a pileup profile. Default standard.
--filter Optional. qbamfilter query to use
--t Optional. Thread number. Total thread number = number
supplied + 2. Default 1 (total threads 4).
--strand Optional. Separate coverage by strand. (y or n, default y).
--mq Optional. Minimum mapping quality score for accepting a read.
Default is any mapping quality ie no filtering.
--bq Optional. Minimum base quality score for accepting a read.
Default is any base quality ie no filtering.
--intron Optional. Include reads with indels. (y or n, default y).
--ind Optional. Include reads mapping across introns. (y or n,
default y).
--novelstarts Optional. Report number of novel starts for each base type,
rather than number of reads. (y or n, default n).
-f
Format of the SNP file specified with -s. This option is optional and
the default is dcc1. Current supported formats are:
- dcc1 - an ICGC data submission format
- maf - Mutation Annotation Format
- dccq - an extension of the dcc1 format but with extra fields
- vcf - Variant Call Format
- tab - Tab-delimited
If you don't have a VCF or MAF file, the tab format is often the easiest to construct. It is a tab delimited plain text file with 4 columns and no header. The columns are: id, chromosome, start_position, end_position
--profile
This option provides predefined sets of values for 5 other
options that are frequently used together: --strand, --indel,
--intron, -bq, and -mq.
There are 4 currently defined progiles - standard, dna, rna,
torrent.
All profiles will ignore unmapped reads and reads where the DuplicateRead flag is true.
| Profile | Filter | Metrics | ||||
|---|---|---|---|---|---|---|
| Mapping quality | Base quality | Strand Specific | Include introns | Include indels | Novelstarts | |
| standard | any | any | y | y | y | n |
| dna | 10 | 10 | y | n | y | n |
| rna | 10 | 7 | n | y | y | y |
| torrent | 1 | 0 | n | y | y | n |
--strand, --indel, --intron, -bq, -mq
--novelstarts
novelstarts is a count of how many reads with different start sites cover a particular position. For example, if 10 reads covered a position but they all had alignments starting at the same position then novelstarts=1. If 10 reads covered a position and 4 started at the same position and the outher 6 all started at different positions then novelstarts=7. novelstarts helps spot cases where there might be some question about whether the reads are duplicates, even if they are paired ans the pairs seem to be different.
Examples
Defaults with BAM file list
qbasepileup -b bam_list.txt -s snps.dcc1 -r reference.fa \
-o output.pileup.txt --log log_file.log
- Default file format is dcc1
- Standard profile:
- Print strand specific info
- Include any base or mapping quality
- Include reads in introns
- Include reads with indels
Defaults with single input BAM file
qbasepileup -i input.bam -s snps.dcc1 -r reference.fa \
-o output.pileup.txt --log log_file.log
- Default file format is dcc1
- Standard profile:
- Print strand specific info
- Include any base or mapping quality
- Include reads in introns
- Include reads with indels
Use of --filter qbamfilter query
qbasepileup -b bam_list.txt -s snps.dcc1 -r reference.fa \
-o output.pileup.txt --log log_file.log --filter "option_SM > 30"
- Default file format is dcc1
- Standard profile.
- Qbamfilter query of option_SM > 30
SNP file format is MAF
qbasepileup -b bam_list.txt -s snps.maf -r reference.fa \
-o output.pileup.txt --log log_file.log -f maf
Torrent profile
qbasepileup -b bam_list.txt -s snps.maf -r reference.fa \
-o output.pileup.txt --log log_file.log -f maf -p torrent
- File format is MAF
- Torrent profile:
- Do not print strand specific info
- Min base quality: 0
- Min mapping quality: 1
- Include reads in introns
- Include reads with indels
Run with user defined filtering options
qbasepileup -i input.bam -s snps.dcc1 -r reference.fa \
-o output.pileup.txt --log log_file.log \
--strand n --indel n --intron n -bq 5 -mq 10
- No strand specific base coverage information
- Do not include reads with indels
- Do not include reads in introns
- Read must have a minimum mapping quality of 10
- Read must have a minimum base quality of 5.
Novel starts
qbasepileup -b bam_list.txt -s snps.dcc1 -r reference.fa \
-o output.pileup.txt --log log_file.log --novelstarts
- File format is dcc1
- Print strand specific info
- Include any base or mapping quality
- Include reads in introns
- Include reads with indels
- Count only novel starts