qbasepileup

Introduction

qbasepileup performs base pileup on positions of interest in a BAM file and reports base coverage and various other metrics for the reads at those positions.

Installation

qbasepileup requires Java 7 and, ideally, a multi-core machine (5 threads are run concurrently) with at least 20GB of RAM.

  • Download the qbasepileup tar file.
  • Untar the tar file into a directory of your choice.

You should see jar files for qbasepileup and its dependencies:

$ tar xjvf qbasepileup.tar.bz2
x antlr-3.2.jar
x ini4j-0.5.2-SNAPSHOT.jar
x jopt-simple-3.2.jar
x picard-1.110.jar
x qbamfilter-1.0pre.jar
x qcommon-0.1pre.jar
x qio-0.1pre.jar
x qpicard-0.1pre.jar
x qpileup-0.1pre.jar
x sam-1.110.jar
x jhdfobj.jar
x jhdf5obj.jar
x jhdf5.jar
x jhdf.jar

Usage

A general invocation of qbasepileup looks like:

java -jar qbasepileup.jar [OPTIONS]

A typical invocation might look like:

java -jar qbasepileup.jar -m mode \
    -i file.bam -r ref.fa -s positions.txt -o pileup.txt --log file.log

Options

--help, -h      Show help message.
--version       Print version.
-b              Path to tab delimited file with list of bams.
--bq            Minimum base quality score for accepting a read.
--dup           Include duplicates
-f              Format of SNPs file [dcc1,maf,db,dccq,vcf,torrent,RNA,DNA], Def=dcc1
--filter        Query string for qbamfilter
--gatk          Adjust insertion position to conform to GATK format
--hdf           HDF file to read list of bam files from
--hp            window around indel to check for homopolymers  [default:10]
-i              Path to bam file.
--ig            Path to germline indel file
--in            Path to normal bam file
--ind           Include reads with indels. (y or n, default y).
--intron        Include reads mapping across introns. (y or n, default y).
--is            Path to somatic indel file
--it            Path to tumour bam file
--log           Req, Log file.
--loglevel      Logging level required, e.g. INFO, DEBUG. Default INFO.
-m              Mode: snp, indel, coverage or compoundsnp.
--maxcov        Report reads that are less than the minimum coverage option. Integer
--mincov        Report reads that are less than the
--mq            Minimum mapping quality score for accepting a read.
-n              Bases around indel to check for other indels. Def=3.
--novelstarts   Report novelstarts rather than read count [Y,N], Def=Y.
-o              Output file.
--of            Output file format [rows,columns].
--og            Output file for germline indels.
--os            Output file for somatic indels.
-p              Pileup profile type [dcc1,maf,db,dccq,vcf,torrent,RNA,DNA] Def=dcc1.
--pd            <pindel_deletions> Path to normal bam file
--pindel        adjust insertion position to conform to pindel format
-r              Path to reference genome fasta file.
-s              Path to tab delimited file containing snps. Formats: dcc1,dccq,vcf,maf,txt
--sc            <soft_clip_window> number of bases around indel to check for softclipped bases [default:13]
--strand        Separate coverage by strand. (y or n, default y)
--strelka       adjust insertion position to conform to strelka format (same as pindel)
-t              Thread number. Total = number supplied + 2.

-n

This integer is the number of bases around an indel to check for nearby indels. Default=3.

--pd

Path to normal bam file

Modes

snp

Reads one or more BAM files, a reference genome, and a file containing positions of SNPs. It finds the reference genome base at each SNP position as well as the bases found at that position in all reads aligned to that region. Both the coverage per nucleotide and the total coverage at that position are reported. By default, duplicates and unmapped reads are excluded.
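
The per-nucleotide counting described for snp mode can be illustrated with a minimal Python sketch. This is not qbasepileup's implementation. Reads are represented as hypothetical (start, sequence) tuples with ungapped alignments and 1-based start coordinates, and the function name pileup_at is an assumption made purely for illustration.

```python
from collections import Counter


def pileup_at(reads, position):
    """Count the bases that ungapped reads place at a 1-based reference position.

    `reads` is an iterable of (start, sequence) tuples, where `start` is the
    1-based reference coordinate of the first base of the read. Real BAM
    records also carry CIGAR strings, flags and qualities, all of which this
    toy model ignores.
    """
    counts = Counter()
    for start, sequence in reads:
        offset = position - start
        # Only reads that actually overlap the position contribute a base.
        if 0 <= offset < len(sequence):
            counts[sequence[offset]] += 1
    return counts


# Hypothetical reads; the last one does not overlap position 103.
reads = [(100, "ACGTA"), (102, "GATCC"), (103, "TAGGA"), (200, "AAAAA")]
counts = pileup_at(reads, 103)
total = sum(counts.values())
print(dict(counts), total)
```

Here the per-nucleotide counts correspond to the coverage per nucleotide and their sum to the total coverage at the position.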

compoundsnp

Reads one or more BAM files, a reference genome, and a file containing positions of compound SNPs (SNPs that sit next to each other). It finds the reference genome bases at the compound SNP positions as well as the bases found at those positions in all reads aligned to that region. Both the coverage per nucleotide and the total coverage at those positions are reported.
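
As a rough illustration of piling up adjacent positions together, the Python sketch below counts the base strings that each read shows across a run of neighbouring positions. It is a toy model, not qbasepileup's code: reads are hypothetical ungapped (start, sequence) tuples, the name compound_pileup is invented for this example, and counting only reads that cover every position of the compound SNP is an assumption.

```python
from collections import Counter


def compound_pileup(reads, first_position, length):
    """Count the base strings that ungapped reads show across adjacent positions.

    `first_position` is the 1-based reference coordinate of the first SNP and
    `length` is the number of adjacent positions in the compound SNP.
    """
    counts = Counter()
    for start, sequence in reads:
        offset = first_position - start
        # Assumption: a read counts only if it covers the whole compound SNP.
        if offset >= 0 and offset + length <= len(sequence):
            counts[sequence[offset:offset + length]] += 1
    return counts


# Hypothetical reads around a two-base compound SNP at positions 102-103.
reads = [(100, "ACGTA"), (101, "CGTAC"), (102, "CCTAG"), (104, "AGGTT")]
counts = compound_pileup(reads, 102, 2)
print(dict(counts))
```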

By default, the --filter qbamfilter query string is:

and( Flag_DuplicateRead==false, CIGAR_M>34, MD_mismatch <= 3, option_SM > 10)

For a more detailed description of qbamfilter and how it works to filter reads in and out of a particular analysis, see qbamfilter.
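
To make the default query string more concrete, the following Python sketch applies the same four conditions to toy read records. It does not use qbamfilter, and the reading of each key is an assumption: CIGAR_M is taken as the total length of M operations in the CIGAR string, MD_mismatch as the number of mismatched bases in the MD tag, and option_SM as the value of the optional SM tag. The dictionary layout and function names are hypothetical.

```python
import re


def cigar_matched_bases(cigar):
    """Sum the lengths of all M operations in a CIGAR string."""
    operations = re.findall(r"(\d+)([MIDNSHP=X])", cigar)
    return sum(int(length) for length, op in operations if op == "M")


def md_mismatches(md):
    """Count mismatched bases in an MD tag, ignoring deleted bases (^...)."""
    without_deletions = re.sub(r"\^[A-Z]+", "", md)
    return len(re.findall(r"[A-Z]", without_deletions))


def passes_default_filter(read):
    """Mirror and(Flag_DuplicateRead==false, CIGAR_M>34, MD_mismatch<=3, option_SM>10)."""
    return (
        not read["duplicate"]
        and cigar_matched_bases(read["cigar"]) > 34
        and md_mismatches(read["md"]) <= 3
        and read["sm"] > 10
    )


# Hypothetical records: one passes, the others each fail a single condition.
good = {"duplicate": False, "cigar": "5S45M", "md": "20A24", "sm": 40}
duplicate = {"duplicate": True, "cigar": "5S45M", "md": "20A24", "sm": 40}
short = {"duplicate": False, "cigar": "20S30M", "md": "30", "sm": 40}
noisy = {"duplicate": False, "cigar": "50M", "md": "5A5C5G5T5^AC21", "sm": 40}
low_sm = {"duplicate": False, "cigar": "50M", "md": "50", "sm": 5}

for name, read in [("good", good), ("duplicate", duplicate), ("short", short),
                   ("noisy", noisy), ("low_sm", low_sm)]:
    print(name, passes_default_filter(read))
```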

indel

Reads tumour and normal BAM files, a reference genome, and somatic and/or germline files containing positions of indels. It piles up the reads around each indel and counts a number of metrics. Metrics include:

  • total reads
  • number of reads that span the indel
  • number of reads with the indel
  • number of novel starts with the indel
  • number of reads with nearby soft clipping
  • number of reads with nearby indels
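
A few of these metrics can be sketched in Python on simplified read records. This is an illustration under stated assumptions, not the tool's algorithm: each read is a hypothetical dictionary with a reference span and a list of indel positions, a novel start is taken to mean a distinct read start position, and the nearby window defaults to 3 to match the documented default of -n.

```python
def indel_metrics(reads, indel_pos, nearby_window=3):
    """Tally simplified per-indel metrics from toy read records.

    Each read is a dict with `start` and `end` (1-based, inclusive reference
    span) and `indels`, a list of reference positions at which that read has
    an insertion or deletion.
    """
    spanning = [r for r in reads if r["start"] <= indel_pos <= r["end"]]
    with_indel = [r for r in spanning if indel_pos in r["indels"]]
    # Assumption: a "nearby" indel is a different indel within the window.
    nearby = [
        r for r in spanning
        if any(p != indel_pos and abs(p - indel_pos) <= nearby_window
               for p in r["indels"])
    ]
    return {
        "total_reads": len(reads),
        "spanning_reads": len(spanning),
        "reads_with_indel": len(with_indel),
        # Assumption: novel starts are distinct start positions.
        "novel_starts_with_indel": len({r["start"] for r in with_indel}),
        "reads_with_nearby_indels": len(nearby),
    }


# Hypothetical reads around an indel at position 120.
reads = [
    {"start": 90, "end": 140, "indels": [120]},
    {"start": 90, "end": 140, "indels": [120]},       # same start as above
    {"start": 95, "end": 145, "indels": [120, 122]},  # has a nearby indel
    {"start": 100, "end": 150, "indels": []},
    {"start": 130, "end": 180, "indels": []},         # does not span 120
]
metrics = indel_metrics(reads, 120)
print(metrics)
```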

coverage

Reads one or more BAM files and a file containing reference ranges, and piles up the reads to count the number of reads covering each position in each range.
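
Counting reads per position within a range can be sketched as follows. This is a minimal Python illustration rather than qbasepileup's implementation: reads are hypothetical (start, end) tuples giving 1-based inclusive reference spans, and the function name coverage_in_range is invented for this example.

```python
def coverage_in_range(reads, range_start, range_end):
    """Return {position: read count} for every position in an inclusive range."""
    coverage = {pos: 0 for pos in range(range_start, range_end + 1)}
    for start, end in reads:
        # Clip each read to the range so only overlapping positions are counted.
        first = max(start, range_start)
        last = min(end, range_end)
        for pos in range(first, last + 1):
            coverage[pos] += 1
    return coverage


# Hypothetical read spans and a reference range of 100-105.
reads = [(98, 101), (100, 104), (103, 110)]
coverage = coverage_in_range(reads, 100, 105)
for position, depth in coverage.items():
    print(position, depth)
```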