Better memory usage

We’ve made two changes to DISCOVAR de novo:

1. Memory usage during BAM file reading is now lower.

2. We’ve added a memory check that estimates how much memory is actually available and then throttles memory usage to that level. On some systems, particularly those managed by a scheduler such as SGE, the memory available to a job is less than the physical memory of the machine. Please let us know if you observe aberrant behavior. The feature can be turned off by setting MEMORY_CHECK=False on the DiscovarExp command line.
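To make the idea behind the memory check concrete, here is a minimal Python sketch. It is not the DISCOVAR de novo code. It assumes that the usable budget is the smaller of the machine's physical memory and the per-process address-space limit (RLIMIT_AS), which is one way a scheduler can restrict a job. The function names effective_budget and detect_budget are hypothetical.

```python
import os
import resource


def effective_budget(physical_bytes, soft_limit):
    """Return the smaller of physical memory and the per-process limit.

    A soft limit of RLIM_INFINITY (or any negative value) means that no
    limit is set, so physical memory is used unchanged.
    """
    if soft_limit == resource.RLIM_INFINITY or soft_limit < 0:
        return physical_bytes
    return min(physical_bytes, soft_limit)


def detect_budget():
    """Estimate the memory budget of the current process on a Unix system."""
    # Physical memory = page size * number of physical pages.
    physical = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    # Address-space limit for this process, if one has been imposed.
    soft, _hard = resource.getrlimit(resource.RLIMIT_AS)
    return effective_budget(physical, soft)


if __name__ == "__main__":
    print("approximate budget: %.1f GB" % (detect_budget() / 1e9))
```

A real check would likely also account for memory already in use by other processes, which this sketch ignores.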

Reference support added

We’ve added support for use of a reference sequence with DISCOVAR de novo. If you provide a reference sequence, your assembly will still be created de novo and then aligned to the reference sequence. When you view the assembly using NhoodInfo, the view will be marked to show reference coordinates.

To use this new feature, supply the command-line option REFHEAD=g, where you have files g.fasta and g.names. The file g.names should be a text file with the same number of lines as g.fasta has records, each line being a name for the corresponding record (e.g. chr3). These names are displayed by NhoodInfo, so keep them reasonably short to avoid crowding the display.
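If you want to check a g.fasta and g.names pair before starting an assembly, something like the following Python sketch may help. It is not part of DISCOVAR de novo. It assumes that every FASTA record starts with a line beginning with ">" and that every line of the names file counts as a name. The function names are hypothetical.

```python
def count_fasta_records(fasta_path):
    """Count records by counting header lines, which start with '>'."""
    with open(fasta_path) as f:
        return sum(1 for line in f if line.startswith(">"))


def read_names(names_path):
    """Return one name per line of the names file."""
    with open(names_path) as f:
        return [line.strip() for line in f]


def check_refhead(head):
    """Check that head.names has one line per record in head.fasta.

    Returns the list of names, or raises ValueError on a mismatch.
    """
    n_records = count_fasta_records(head + ".fasta")
    names = read_names(head + ".names")
    if len(names) != n_records:
        raise ValueError(
            "%s.fasta has %d records but %s.names has %d lines"
            % (head, n_records, head, len(names))
        )
    return names
```

For REFHEAD=g you would call check_refhead("g").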

Fractional read support added

We’ve added support for use of part of a read set. For example,

READS="frac:0.6 :: x.bam"

will cause 60% of the read pairs in x.bam to be chosen at random and assembled. This is useful for understanding the effect of lower coverage, and in cases where one has too much data to assemble on a given machine. The frac option can be combined with the sample option, e.g.

READS="sample:T :: t.bam + sample:N,frac:0.5 :: n.bam"

to use all the reads in t.bam but only half of the reads in n.bam.
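As an illustration of what choosing a fraction of the read pairs at random means, here is a minimal Python sketch. It is not the DISCOVAR de novo implementation. It assumes that each pair is kept independently with probability frac, so the number kept is only approximately that fraction of the total, and that the two reads of a pair are always kept or dropped together. The function name sample_pairs and the seed parameter are hypothetical.

```python
import random


def sample_pairs(pairs, frac, seed=None):
    """Keep each read pair independently with probability frac.

    pairs: iterable of (read1, read2) tuples; a pair is never split.
    frac:  value between 0 and 1, as in frac:0.6.
    seed:  optional integer, to make the selection reproducible.
    """
    if not 0.0 <= frac <= 1.0:
        raise ValueError("frac must be between 0 and 1")
    rng = random.Random(seed)
    return [pair for pair in pairs if rng.random() < frac]


if __name__ == "__main__":
    # Toy data standing in for the read pairs in n.bam.
    pairs = [("read%d/1" % i, "read%d/2" % i) for i in range(10000)]
    kept = sample_pairs(pairs, 0.5, seed=1)
    print("kept %d of %d pairs" % (len(kept), len(pairs)))
```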