Showing tool doc from version 4.1.0.0 | The latest version is 4.1.1.0

BaitDesigner (Picard)

Designs oligonucleotide baits for hybrid selection reactions.

This tool is used to design custom bait sets for hybrid selection experiments. BaitDesigner takes three inputs: an interval list (TARGETS) indicating the sequences of interest, e.g. exons with their respective coordinates; a reference sequence; and a unique identifier string (DESIGN_NAME).
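For orientation, a Picard interval list is a SAM-style header (lines starting with @) followed by tab-separated records of contig, start, end, strand and name, with 1-based inclusive coordinates. The sketch below reads such text into simple records; the Interval type, the parse_interval_list helper and the example coordinates are illustrative assumptions, not part of the tool.

```python
from typing import List, NamedTuple


class Interval(NamedTuple):
    contig: str
    start: int   # 1-based, inclusive
    end: int     # 1-based, inclusive
    strand: str  # "+" or "-"
    name: str


def parse_interval_list(text: str) -> List[Interval]:
    """Parse interval_list text, skipping the SAM-style header lines."""
    intervals = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("@"):
            continue  # blank line or header line
        contig, start, end, strand, name = line.split("\t")[:5]
        intervals.append(Interval(contig, int(start), int(end), strand, name))
    return intervals


# Hypothetical example content; real files carry one @SQ line per reference contig.
example = (
    "@HD\tVN:1.6\n"
    "@SQ\tSN:chr1\tLN:248956422\n"
    "chr1\t1000\t1199\t+\texon_1\n"
    "chr1\t5000\t5399\t-\texon_2\n"
)
targets = parse_interval_list(example)
```

Because coordinates are inclusive, the length of a target is end - start + 1 (200 bases for exon_1 above).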

The tool will output interval_list files of both bait and target sequences as well as the actual bait sequences in FastA format. At least two baits are output for each target sequence, with greater numbers for larger intervals. Although the default values for both bait size (120 bases) and offset (80 bases) are suitable for most applications, these values can be customized. The offset is the distance between the start of one bait and the start of the next bait on a contiguous stretch of target DNA sequence.
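The relationship between bait size and offset can be made concrete with a little arithmetic. This is a minimal sketch, not the tool's code; tiling_summary is a hypothetical helper, and the mean depth applies only to the interior of a target long enough to hold several baits.

```python
def tiling_summary(bait_size: int = 120, bait_offset: int = 80) -> dict:
    """Overlap between neighbouring baits and average coverage per target base."""
    if bait_size <= 0 or bait_offset <= 0:
        raise ValueError("bait_size and bait_offset must be positive")
    return {
        # Bases shared by two consecutive baits (0 if the offset exceeds the bait size).
        "overlap": max(0, bait_size - bait_offset),
        # Average number of baits covering each interior base of a long target.
        "mean_depth": bait_size / bait_offset,
    }


summary = tiling_summary()
print(summary)
```

With the defaults, consecutive baits share 40 bases and each interior base is covered by 1.5 baits on average. Lowering the offset increases both values and the number of baits needed.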

The tool will also output a pooled set of 55,000 (default) oligonucleotides representing all of the baits redundantly. This redundancy achieves a uniform concentration of oligonucleotides for synthesis by a vendor as well as equal numbers of each bait to prevent bias during the hybrid selection reaction.
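The redundancy described above can be sketched as follows. This is an illustration under stated assumptions rather than the tool's implementation: it assumes all baits fit in a single pool, that every bait receives the same whole number of copies, and that successive rounds alternate between forward and reverse-complement copies (as the FILL_POOLS description suggests). The fill_pool and reverse_complement helpers are hypothetical names.

```python
from typing import List

_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def fill_pool(baits: List[str], pool_size: int = 55000) -> List[str]:
    """Fill a pool with equal numbers of copies of every bait."""
    if not baits or len(baits) > pool_size:
        raise ValueError("this sketch assumes 1..pool_size baits")
    copies = pool_size // len(baits)  # same number of copies for each bait
    pool = []
    for round_index in range(copies):
        use_rc = round_index % 2 == 1  # alternate forward / reverse complement
        for bait in baits:
            pool.append(reverse_complement(bait) if use_rc else bait)
    return pool


# Toy example: two short "baits" and a pool with room for five oligos.
pool = fill_pool(["AACG", "TTGA"], pool_size=5)
```

In the toy example each bait gets two copies (one forward, one reverse complement) and one slot of the pool is left unused, so the copy numbers stay equal.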

Usage example:

java -jar picard.jar BaitDesigner \
TARGETS=targets.interval_list \
DESIGN_NAME=new_baits \
R=reference_sequence.fasta
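When bait size and offset need to be customized, the same invocation can be assembled programmatically. The following is a minimal sketch that only builds the command line; it does not run the tool. It uses the KEY=VALUE syntax of the usage example together with the TARGETS, BAIT_SIZE and BAIT_OFFSET names from the argument table. The helper name bait_designer_command, the file names and the location of picard.jar are assumptions.

```python
import shlex
from typing import List


def bait_designer_command(
    targets: str,
    design_name: str,
    reference: str,
    bait_size: int = 120,
    bait_offset: int = 80,
    jar: str = "picard.jar",  # assumed location of the Picard jar
) -> List[str]:
    """Build (but do not run) a BaitDesigner command line."""
    return [
        "java", "-jar", jar, "BaitDesigner",
        f"TARGETS={targets}",
        f"DESIGN_NAME={design_name}",
        f"R={reference}",
        f"BAIT_SIZE={bait_size}",
        f"BAIT_OFFSET={bait_offset}",
    ]


cmd = bait_designer_command(
    "targets.interval_list", "new_baits", "reference_sequence.fasta",
    bait_size=100, bait_offset=50,
)
print(shlex.join(cmd))  # shell-quoted form, suitable for logging
```

The list form can be passed to subprocess.run(cmd, check=True) once the paths point at real files.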

Category Reference


Overview

Designs baits for hybrid selection!

BaitDesigner (Picard) specific arguments

This table summarizes the command-line arguments that are specific to this tool. For more details on each argument, see the list further down below the table or click on an argument name to jump directly to that entry in the list.

Argument name(s) | Default value | Summary
Required Arguments
--DESIGN_NAME | null | The name of the bait design
--REFERENCE_SEQUENCE, -R | null | Reference sequence file.
--TARGETS, -T | null | The file with design parameters and targets
Optional Tool Arguments
--arguments_file | [] | read one or more arguments files and add them to the command line
--BAIT_OFFSET | 80 | The desired offset between the start of one bait and the start of another bait for the same target.
--BAIT_SIZE | 120 | The length of each individual bait to design
--DESIGN_ON_TARGET_STRAND | false | If true, design baits on the strand of the target feature; if false, always design on the + strand of the genome.
--DESIGN_STRATEGY | FixedOffset | The design strategy to use to lay out baits across each target
--FILL_POOLS | true | If true, fill up the pools with alternating forward and reverse-complement copies of all baits. Equal copies of all baits will always be maintained
--help, -h | false | display the help message
--LEFT_PRIMER | ATCGCACCAGCGTGT | The left amplification primer to prepend to all baits for synthesis
--MERGE_NEARBY_TARGETS | true | If true, merge targets that are 'close enough' that designing against a merged target would be more efficient.
--MINIMUM_BAITS_PER_TARGET | 2 | The minimum number of baits to design per target.
--OUTPUT_AGILENT_FILES | true | If true, also output .design.txt files per pool with one line per bait sequence
--OUTPUT_DIRECTORY, -O | null | The output directory. If not provided, the DESIGN_NAME will be used as the output directory
--PADDING | 0 | Pad the input targets by this amount when designing baits. Padding is applied on both sides in this amount.
--POOL_SIZE | 55000 | The size of pools or arrays for synthesis. If no pool files are desired, can be set to 0.
--REPEAT_TOLERANCE | 50 | Baits that have more than REPEAT_TOLERANCE soft- or hard-masked bases will not be allowed
--RIGHT_PRIMER | CACTGCGGCTCCTCA | The right amplification primer to append to all baits for synthesis
--version | false | display the version number for this tool
Optional Common Arguments
--COMPRESSION_LEVEL | 5 | Compression level for all compressed files created (e.g. BAM and VCF).
--CREATE_INDEX | false | Whether to create a BAM index when writing a coordinate-sorted BAM file.
--CREATE_MD5_FILE | false | Whether to create an MD5 digest for any BAM or FASTQ files created.
--GA4GH_CLIENT_SECRETS | client_secrets.json | Google Genomics API client_secrets.json file path.
--MAX_RECORDS_IN_RAM | 500000 | When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.
--QUIET | false | Whether to suppress job-summary info on System.err.
--TMP_DIR | [] | One or more directories with space available to be used by this program for temporary storage of working files
--USE_JDK_DEFLATER, -use_jdk_deflater | false | Use the JDK Deflater instead of the Intel Deflater for writing compressed output
--USE_JDK_INFLATER, -use_jdk_inflater | false | Use the JDK Inflater instead of the Intel Inflater for reading compressed input
--VALIDATION_STRINGENCY | STRICT | Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.
--VERBOSITY | INFO | Control verbosity of logging.
Advanced Arguments
--showHidden | false | display hidden arguments

Argument details

Arguments in this list are specific to this tool. Keep in mind that other arguments are available that are shared with other tools (e.g. command-line GATK arguments); see Optional Common Arguments in the table above.


--arguments_file / NA

read one or more arguments files and add them to the command line

List[File]  []


--BAIT_OFFSET / NA

The desired offset between the start of one bait and the start of another bait for the same target.

int  80  [ [ -∞  ∞ ] ]


--BAIT_SIZE / NA

The length of each individual bait to design

int  120  [ [ -∞  ∞ ] ]


--COMPRESSION_LEVEL / NA

Compression level for all compressed files created (e.g. BAM and VCF).

int  5  [ [ -∞  ∞ ] ]


--CREATE_INDEX / NA

Whether to create a BAM index when writing a coordinate-sorted BAM file.

Boolean  false


--CREATE_MD5_FILE / NA

Whether to create an MD5 digest for any BAM or FASTQ files created.

boolean  false


--DESIGN_NAME / NA

The name of the bait design

R String  null


--DESIGN_ON_TARGET_STRAND / NA

If true, design baits on the strand of the target feature; if false, always design on the + strand of the genome.

boolean  false


--DESIGN_STRATEGY / NA

The design strategy to use to lay out baits across each target

The --DESIGN_STRATEGY argument is an enumerated type (DesignStrategy), which can have one of the following values:

CenteredConstrained
Implementation that "constrains" baits to be within the target region when possible.
FixedOffset
Design that places baits at fixed offsets over targets, allowing them to hang off the ends as dictated by the target size and offset.
Simple
Ultra simple bait design algorithm that just lays down baits starting at the target start position until either the bait start runs off the end of the target or the bait would run off the sequence.

DesignStrategy  FixedOffset
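The Simple strategy is described precisely enough to sketch. The code below is a minimal illustration of that description, not Picard's implementation: baits start at the target start and advance by the offset until the next bait start would lie past the target end, or until a bait would extend beyond the end of the reference sequence. The function name simple_baits, the contig_length parameter and the example coordinates are assumptions; coordinates are 1-based and inclusive.

```python
from typing import List, Tuple


def simple_baits(
    target_start: int,
    target_end: int,
    contig_length: int,
    bait_size: int = 120,
    bait_offset: int = 80,
) -> List[Tuple[int, int]]:
    """Lay down baits from the target start, one every bait_offset bases."""
    baits = []
    start = target_start
    while start <= target_end:          # stop once the bait start leaves the target
        end = start + bait_size - 1
        if end > contig_length:         # stop if the bait would run off the sequence
            break
        baits.append((start, end))
        start += bait_offset
    return baits


# A 200-base target in the middle of a 10,000-base contig.
baits = simple_baits(1000, 1199, contig_length=10000)
```

For the example target, baits start at 1000, 1080 and 1160, so the last bait hangs 80 bases past the target end. A target that begins fewer than 120 bases before the end of the contig yields no baits under this sketch.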


--FILL_POOLS / NA

If true, fill up the pools with alternating forward and reverse-complement copies of all baits. Equal copies of all baits will always be maintained.

boolean  true


--GA4GH_CLIENT_SECRETS / NA

Google Genomics API client_secrets.json file path.

String  client_secrets.json


--help / -h

display the help message

boolean  false


--LEFT_PRIMER / NA

The left amplification primer to prepend to all baits for synthesis

String  ATCGCACCAGCGTGT


--MAX_RECORDS_IN_RAM / NA

When writing files that need to be sorted, this will specify the number of records stored in RAM before spilling to disk. Increasing this number reduces the number of file handles needed to sort the file, and increases the amount of RAM needed.

Integer  500000  [ [ -∞  ∞ ] ]


--MERGE_NEARBY_TARGETS / NA

If true, merge targets that are 'close enough' that designing against a merged target would be more efficient.

boolean  true
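The documentation does not define 'close enough', so the following sketch uses a hypothetical max_gap threshold purely to illustrate what merging nearby targets means. The merge_nearby helper and the threshold value are assumptions and do not reflect the criterion the tool actually applies. Intervals are 1-based, inclusive, and assumed to lie on the same contig.

```python
from typing import List, Tuple


def merge_nearby(targets: List[Tuple[int, int]], max_gap: int) -> List[Tuple[int, int]]:
    """Merge intervals separated by at most max_gap uncovered bases."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(targets):
        # Number of bases between the previous interval and this one
        # (negative or zero when they overlap or abut).
        if merged and start - merged[-1][1] - 1 <= max_gap:
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical targets: the first two are 19 bases apart, the third is far away.
merged = merge_nearby([(100, 200), (220, 300), (1000, 1100)], max_gap=40)
```

Merging the first two targets lets one run of overlapping baits cover both, instead of designing two separate bait sets that would largely cover the same bases.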


--MINIMUM_BAITS_PER_TARGET / NA

The minimum number of baits to design per target.

int  2  [ [ -∞  ∞ ] ]


--OUTPUT_AGILENT_FILES / NA

If true, also output .design.txt files per pool with one line per bait sequence

boolean  true


--OUTPUT_DIRECTORY / -O

The output directory. If not provided, the DESIGN_NAME will be used as the output directory

File  null


--PADDING / NA

Pad the input targets by this amount when designing baits. Padding is applied on both sides in this amount.

int  0  [ [ -∞  ∞ ] ]
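Padding on both sides can be illustrated with a short sketch. Clamping the padded interval to the contig boundaries is an assumption made here for safety; the documentation does not say how the tool treats padding near the ends of a sequence. The pad_interval helper is a hypothetical name, and coordinates are 1-based and inclusive.

```python
from typing import Tuple


def pad_interval(start: int, end: int, padding: int, contig_length: int) -> Tuple[int, int]:
    """Extend an interval by `padding` bases on each side, staying within the contig."""
    if padding < 0:
        raise ValueError("padding must be non-negative")
    padded_start = max(1, start - padding)           # do not go before base 1
    padded_end = min(contig_length, end + padding)   # do not go past the contig end
    return padded_start, padded_end


padded = pad_interval(1000, 1199, padding=50, contig_length=10000)
```

A 200-base target padded by 50 becomes a 300-base design region, which in turn changes how many baits are needed to tile it.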


--POOL_SIZE / NA

The size of pools or arrays for synthesis. If no pool files are desired, can be set to 0.

int  55000  [ [ -∞  ∞ ] ]


--QUIET / NA

Whether to suppress job-summary info on System.err.

Boolean  false


--REFERENCE_SEQUENCE / -R

Reference sequence file.

R File  null


--REPEAT_TOLERANCE / NA

Baits that have more than REPEAT_TOLERANCE soft- or hard-masked bases will not be allowed

int  50  [ [ -∞  ∞ ] ]
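As an illustration of this filter, the sketch below counts masked bases under the usual reference conventions: soft-masked bases are written in lowercase and hard-masked bases as N. These conventions, and the helper names masked_base_count and passes_repeat_tolerance, are assumptions of the sketch rather than a description of the tool's code.

```python
def masked_base_count(bait: str) -> int:
    """Count soft-masked (lowercase) and hard-masked (N) bases in a bait sequence."""
    return sum(1 for base in bait if base.islower() or base == "N")


def passes_repeat_tolerance(bait: str, repeat_tolerance: int = 50) -> bool:
    """A bait is rejected only if it has MORE than repeat_tolerance masked bases."""
    return masked_base_count(bait) <= repeat_tolerance


unmasked = "ACGT" * 30                     # 120 bases, none masked
half_masked = "acgt" * 15 + "ACGT" * 15    # 60 soft-masked bases
borderline = "N" * 50 + "A" * 70           # exactly 50 hard-masked bases
```

Under the default tolerance of 50, the unmasked and borderline baits are kept, while the bait with 60 soft-masked bases is rejected.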


--RIGHT_PRIMER / NA

The right amplification primer to append to all baits for synthesis

String  CACTGCGGCTCCTCA
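How the two primers combine with a bait for synthesis can be sketched as below. The sketch assumes the left primer is placed before the bait and the right primer after it, as their names suggest; the synthesis_oligo helper and the placeholder bait are hypothetical.

```python
LEFT_PRIMER = "ATCGCACCAGCGTGT"   # default from the argument table
RIGHT_PRIMER = "CACTGCGGCTCCTCA"  # default from the argument table


def synthesis_oligo(bait: str, left: str = LEFT_PRIMER, right: str = RIGHT_PRIMER) -> str:
    """Flank a bait sequence with the amplification primers."""
    return left + bait + right


bait = "ACGT" * 30                 # placeholder 120-base bait
oligo = synthesis_oligo(bait)
```

With the default 15-base primers and a 120-base bait, each synthesized oligonucleotide is 150 bases long.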


--showHidden / -showHidden

display hidden arguments

boolean  false


--TARGETS / -T

The file with design parameters and targets

R File  null


--TMP_DIR / NA

One or more directories with space available to be used by this program for temporary storage of working files

List[File]  []


--USE_JDK_DEFLATER / -use_jdk_deflater

Use the JDK Deflater instead of the Intel Deflater for writing compressed output

Boolean  false


--USE_JDK_INFLATER / -use_jdk_inflater

Use the JDK Inflater instead of the Intel Inflater for reading compressed input

Boolean  false


--VALIDATION_STRINGENCY / NA

Validation stringency for all SAM files read by this program. Setting stringency to SILENT can improve performance when processing a BAM file in which variable-length data (read, qualities, tags) do not otherwise need to be decoded.

The --VALIDATION_STRINGENCY argument is an enumerated type (ValidationStringency), which can have one of the following values:

STRICT
LENIENT
SILENT

ValidationStringency  STRICT


--VERBOSITY / NA

Control verbosity of logging.

The --VERBOSITY argument is an enumerated type (LogLevel), which can have one of the following values:

ERROR
WARNING
INFO
DEBUG

LogLevel  INFO


--version / NA

display the version number for this tool

boolean  false


GATK version 4.1.0.0 built at Wed, 30 Jan 2019 10:21:04 +0530.