Cookbook: Running Genome STRiP on the Amazon cloud
This document describes one way to install and configure Genome STRiP to run on the Amazon cloud.
Step 1: Sign up for an Amazon EC2 account (see http://aws.amazon.com/ec2/)
Step 2: Install the StarCluster Amazon cluster management software on your local network (see http://star.mit.edu/cluster/)
Step 3: Create a StarCluster config file
Generate a new Amazon EC2 keypair. For example, to generate a keypair called gs-keypair, run
starcluster createkey gs-keypair -o ~/.ssh/rsa-gs-keypair
Create a StarCluster directory (by default, ~/.starcluster) and download the Genome STRiP sample StarCluster config file by running
wget ftp://ftp.broadinstitute.org/pub/svtoolkit/aws/config -P ~/.starcluster
Then edit the required information as described in Step 4.
Alternatively, create a default StarCluster config file from scratch (see http://star.mit.edu/cluster/docs/latest/manual/configuration.html). By default, StarCluster writes this file to ~/.starcluster/config.
The Genome STRiP plugin for StarCluster dynamically installs a specific version of Genome STRiP when a cluster is launched. Download and install the plugin by running
wget ftp://ftp.broadinstitute.org/pub/svtoolkit/aws/SVToolkitInstaller.py -P ~/.starcluster/plugins/
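Before moving on to Step 4, it can help to confirm that the three files from this step are where the later steps expect them. The following is a minimal sketch, not part of StarCluster or Genome STRiP: the function name gs_check_step3 is hypothetical, and the default paths are simply the ones used in the examples above.

```shell
#!/bin/sh
# Sketch: report whether the files created in Step 3 exist.
# gs_check_step3 is a hypothetical helper name chosen for this example.
# $1: StarCluster directory (default: ~/.starcluster)
# $2: private key file written by "starcluster createkey" (default: ~/.ssh/rsa-gs-keypair)
gs_check_step3() {
    sc_dir="${1:-$HOME/.starcluster}"
    key_file="${2:-$HOME/.ssh/rsa-gs-keypair}"
    rc=0
    for f in "$key_file" "$sc_dir/config" "$sc_dir/plugins/SVToolkitInstaller.py"; do
        if [ -f "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f"
            rc=1
        fi
    done
    # Non-zero exit status if any file was missing.
    return "$rc"
}
```

Calling gs_check_step3 with no arguments checks the default locations; pass a different directory or key path if you chose other names.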
Step 4: Configure StarCluster
Modify the [aws info] section of the config file to fill in the AWS credentials for your EC2 account.
Add the keypair section for the EC2 keypair you have created:
[key gs-keypair]
KEY_LOCATION=~/.ssh/rsa-gs-keypair
Add a section describing the SVToolkit plugin:
[plugin gs-plugin]
SETUP_CLASS = SVToolkitInstaller.SetupClass
email = your@email
SVVersion=
Set email to the e-mail address that you provided when you registered on the Genome STRiP web site.
If you don't specify SVVersion, the plugin will download and install the latest available SVToolkit version.
Add a section describing the Genome STRiP cluster:
[cluster gs-cluster]
KEYNAME = gs-keypair
PLUGINS = pkginstaller,gs-plugin
Save the changes to the StarCluster config file.
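A quick way to catch a forgotten section before launching a cluster is to search the saved config file for the section headers named in this step. The sketch below is only a plain-text check under stated assumptions: gs_check_config is a hypothetical helper name, the section names are the example names used above, and the check does not parse the file or validate any values.

```shell
#!/bin/sh
# Sketch: check that the section headers from Step 4 appear in the config file.
# gs_check_config is a hypothetical helper name chosen for this example.
# $1: path to the StarCluster config file (default: ~/.starcluster/config)
gs_check_config() {
    cfg="${1:-$HOME/.starcluster/config}"
    rc=0
    for section in "[aws info]" "[key gs-keypair]" "[plugin gs-plugin]" "[cluster gs-cluster]"; do
        # -F matches the header as a literal string, -q suppresses output.
        # Note: a commented-out header would also match; this is not a parser.
        if grep -qF -- "$section" "$cfg"; then
            echo "ok:      $section"
        else
            echo "missing: $section"
            rc=1
        fi
    done
    # Non-zero exit status if any header was not found.
    return "$rc"
}
```

If you used different key, plugin, or cluster names, adjust the list of headers accordingly.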
Step 5: Launch a 1-node Amazon test cluster
Launch a cluster consisting of only the master node by running:
starcluster start gs-cluster -s 1
After this command completes, you can log in to the master node by running:
starcluster sshmaster gs-cluster
The SVToolkit version you selected should be installed in /home/svtoolkit/.
Step 6: Check the Genome STRiP installation
You can verify the Genome STRiP installation using
java -jar ${SV_DIR}/lib/SVToolkit.jar
which will print version information.
You can also run the installtest on the cluster to further validate the installation.
See SVToolkit Recipes for more details on how to run various pipelines.
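The version check in this step relies on the SV_DIR environment variable pointing at the installed SVToolkit. If the java command fails, a small pre-flight check can show whether the variable or the jar file is the problem. This is a sketch only: gs_check_install is a hypothetical helper name, and it assumes SV_DIR is expected to be set in your login environment on the master node, as the command above implies.

```shell
#!/bin/sh
# Sketch: verify that SV_DIR is set and that the SVToolkit jar exists.
# gs_check_install is a hypothetical helper name chosen for this example.
# Prints the jar path on success; prints a diagnostic to stderr on failure.
gs_check_install() {
    if [ -z "${SV_DIR:-}" ]; then
        echo "SV_DIR is not set" >&2
        return 1
    fi
    jar="${SV_DIR}/lib/SVToolkit.jar"
    if [ ! -f "$jar" ]; then
        echo "SVToolkit.jar not found at $jar" >&2
        return 1
    fi
    echo "$jar"
}
```

For example, running jar=$(gs_check_install) && java -jar "$jar" on the master node only invokes java once the jar has been located.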
Step 7: Working with the cluster
Before running large analyses, you might want to log out from the master node and add more nodes to the cluster. For example, running
starcluster addnode -n 3 gs-cluster
will add 3 more nodes to gs-cluster. These additional instances are automatically added to the GridEngine host list, so the next time you log in to the master node and run a pipeline that submits jobs to GridEngine, all the instances in the cluster will be available to accept these jobs.
After you are done using the cluster, you should terminate it by running
starcluster terminate gs-cluster
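Because a forgotten cluster keeps accruing EC2 charges, one option is to wrap the start and terminate commands around your work so that termination is always attempted. The following is a minimal sketch under stated assumptions: gs_run_then_terminate and the STARCLUSTER_CMD variable are hypothetical names introduced here, and the start and terminate invocations are the ones shown in Steps 5 and 7. StarCluster may ask for confirmation before terminating, so this is intended for an interactive shell.

```shell
#!/bin/sh
# Sketch: start a cluster, run one command, then always attempt termination.
# STARCLUSTER_CMD and gs_run_then_terminate are hypothetical names for this example.
# Setting STARCLUSTER_CMD=echo gives a dry run that only prints the starcluster calls.
STARCLUSTER_CMD="${STARCLUSTER_CMD:-starcluster}"

# $1: cluster name; remaining arguments: the command to run while the cluster is up.
gs_run_then_terminate() {
    cluster="$1"
    shift
    # Launch a master-only cluster, as in Step 5; give up if the launch fails.
    "$STARCLUSTER_CMD" start "$cluster" -s 1 || return 1
    # Run the caller's command and remember its exit status.
    rc=0
    "$@" || rc=$?
    # Attempt termination whether or not the command succeeded.
    "$STARCLUSTER_CMD" terminate "$cluster" \
        || echo "WARNING: terminate failed for $cluster; check your running instances" >&2
    return "$rc"
}
```

For example, gs_run_then_terminate gs-cluster starcluster sshmaster gs-cluster opens a session on the master node and attempts to terminate the cluster as soon as you log out.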