plotReads – Visualizing BAM Alignments

The plotReads program is an update of Murim’s p08PlotReads software, which can handle any BAM or CRAM file, can report on a single position, multiple positions or a VCF file, can handle any reference, and can generate either image or text alignments.  The command-line format for plotReads is the following:

plotReads [options] posOrVcf bamFile

where “posOrVcf” is (1) a single “chr:position” string,  (2) a file containing “chr:position” lines, (3) a VCF file or (4) the string “-” telling plotReads to read standard input for “chr:position” lines or a VCF file.

The default output for each position is a PNG file displaying the alignments (see below for an example), written to the file “prefix_chr_position_plotreads.png”, where “prefix” is the name of the bam file, with “.bam” stripped off, and “chr” and “position” are the chromosome and position of the alignment.

The supported options are the following:

  • -t – Output text format instead of image files for each position (see below for the format).   [Default:  Image format]
  • -d – In text format mode, output difference character alignments instead of the DNA characters (see below for this format)
  • -i prefix – In image format mode, change the prefix for each image file.  [Default:  BAM filename]
  • -r fastafile – Set the reference to be used.  Must be a fasta file indexed by samtools.  [Default:  Inferred from BAM file MD tags.]
  • -w # – Change the colwidth of the aligned display.  The alignment is shown using this width before and after the given position.  [Default:  50 for images, 60 for text]
  • -rw # – Change the “readwidth” of the aligned display.  Only reads that are aligned within this number of bases from the given position are displayed in the alignment.  [Default: 1]

Image Format Output

When images are output, the images display an alignment using colored blocks for each of the bases, where A is red, C is blue, G is green, T is yellow and N is grey.  The width of the colored block is proportional to the quality score of that base (full blocks are shown for quality scores near 40 and only vertical slivers are shown for quality scores nearer to 0). Gaps are shown as a horizontal dash.  The strandedness of each read’s alignment is shown in the leftmost column (‘>’ is a forward strand alignment, ‘<‘ is a reverse strand alignment).  An example is the following (click to view larger image):

Text Format Output

When text is output, the default text output is a standard display of the alignment, with the DNA characters of the alignments and dashes for gaps.  Each base is shown in upper- or lower-case depending on if the base’s quality score is higher than 13 or 13 or less, respectively.

At the beginning of each read’s alignment, the read’s mapping quality score and strand is displayed (‘>’ is a forward strand alignment and ‘<‘ is a reverse strand alignment), so that you can determine if this is a low mapping quality region or is biased for forward or reverse strand alignments.

An example of this output is the following (click to view larger image):

If the -d option is given, in addition to the -t option for text format, the alignment is shown using “difference” characters similar to the tview display from samtools.  Read bases that match the reference are displayed as ‘.’ or ‘,’ depending on whether they are forward or reverse strand reads).  Read bases that differ are shown as the different base, where bases with quality score of 13 or less are shown in lowercase.  Gaps are displayed using asterisks (to give it a visual diference from the dots and commas that form most of the alignment).

An example of this output is the following (click to view larger image):