In this document we'll go over some basic strategies for investigating failed workflows on FireCloud. This isn’t a guide for solving every error, but a doc to show new users how to diagnose failed submissions. Descriptions of more complicated errors are always welcome on the FireCloud forum, where our team is happy to help.
At this point we will assume that you have a workspace set up with a data model and a method configuration loaded. You’ve launched your method, but your submission has failed. Don’t despair! There is some information you can gather that will help get your method up and running.
In your workspace, go to the Monitor tab and click View to visit the submission that failed. Then click View again in the data entity column to see details about the submission.
Click Show, located to the right of Failures, to display any error messages. The message listed under Failures isn't always short and sweet, and, if interpreted incorrectly, it can lead you down the wrong debugging path. Instead, use the message to identify and investigate which task failed. In this case, the failed task is Hello_GATK.HaplotypeCaller_GVCF.
Click Show on the task that failed to gain access to three very useful files for debugging: stderr, stdout, and the JES log. These files are generated by Cromwell when executing any task and are placed in the task's folder along with its output. In FireCloud we add quick links to these files in the Monitor tab to make troubleshooting easier.
Many common task-level errors are indicated in the stderr file. Click the link to the stderr.log file (in this example, Haplotypecaller_GVCF-stderr.log) and a window will appear giving you a glimpse of the file.
Here we see an error produced by the HaplotypeCaller command in our task. The message indicates that the index file for the FASTA reference does not exist. Hmm, it seems there's something wrong with the FASTA index we provided. Click Done to go back to the previous screen. Now we’ll check the inputs that were provided to the task by clicking Show for the Inputs.
Ah-ha, the reference index file we provided (InputBamIndex) is the index file for our sample (NA12878.bai), when it should be the index for the hg38 reference (Homo_sapiens_assembly38.fasta.fai). After correcting the method configuration to use the right input index, the workflow passes!
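A mix-up like this one can be caught before launching. By samtools convention, a FASTA index carries the full FASTA file name plus a .fai suffix, so a simple name comparison is enough to flag a sample index passed in by mistake. The following is a minimal Python sketch of that check, not a FireCloud feature: the function names are hypothetical, and the reference file name Homo_sapiens_assembly38.fasta is inferred from the index name mentioned above. It compares names only and does not verify file contents.

```python
import posixpath


def expected_fasta_index_name(fasta_path: str) -> str:
    """Return the index file name that conventionally accompanies a FASTA.

    Assumes the samtools faidx convention: '<fasta file name>.fai'.
    """
    return posixpath.basename(fasta_path) + ".fai"


def is_index_for_reference(fasta_path: str, index_path: str) -> bool:
    """True if index_path is named like the .fai index of fasta_path.

    Only file names are compared, so this works for local paths and
    bucket-style paths alike, but it cannot detect a stale or corrupt index.
    """
    return posixpath.basename(index_path) == expected_fasta_index_name(fasta_path)


# File names taken from the example in this document.
reference = "Homo_sapiens_assembly38.fasta"
for candidate in ("NA12878.bai", "Homo_sapiens_assembly38.fasta.fai"):
    verdict = "ok" if is_index_for_reference(reference, candidate) else "MISMATCH"
    print(f"{candidate}: {verdict}")
```

Running the check against both file names from the example flags NA12878.bai as a mismatch and accepts Homo_sapiens_assembly38.fasta.fai.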
Often the JES log is difficult to decipher, so it's better to start with the other log files. However, in some cases your submitted job will fail with no stderr or stdout files. In these cases you’ll have to suck it up and unravel the meaning behind the JES log messages. Below we’ve provided some common JES log errors and their possible meanings as an aid. There isn’t a solution for all of them, so feel free to post your error on the FireCloud forum so the team can help you through the message.
PAPI (JES) Error 10 Message 15
PAPI (JES) Error 10 Message 14
maxRetries to retry transient failures.
PAPI Error 10 Message 13
PAPI Error 5: Message 10
PAPI Error 2
Cannot find credentials for RawlsUser(RawlsUserSubjectId(******),RawlsUserEmail(******))
Refresh your browser window, and you will see a yellow banner at the top of the page that says "Your offline credentials are missing or out-of-date. Your workflows may not run correctly until they have been refreshed. Refresh now...". Click the "Refresh now..." link and follow the prompts. This will update your credentials in the system and should make this error message go away. The prompt is needed to maintain compliance with certain security standards required for hosting sensitive data.
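When any of the logs discussed above is too long to read comfortably in the preview window, a short script can narrow it down to the lines worth reading. The following is a minimal Python sketch, assuming you have the log contents available as text. The keyword list and the sample text are made up for illustration and are not real tool output, so adjust the keywords to whatever your tool prints on failure.

```python
from typing import List, Sequence, Tuple

# Hypothetical keywords; tune these to the tool that produced the log.
ERROR_KEYWORDS = ("error", "exception", "does not exist")


def find_error_lines(
    log_text: str, keywords: Sequence[str] = ERROR_KEYWORDS
) -> List[Tuple[int, str]]:
    """Return (line_number, line) pairs for lines mentioning any keyword.

    Matching is case-insensitive, and line numbers start at 1.
    """
    hits = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        lowered = line.lower()
        if any(keyword in lowered for keyword in keywords):
            hits.append((number, line.strip()))
    return hits


# Made-up sample text for illustration only.
sample_log = """\
INFO  starting task
INFO  reading inputs
ERROR index file for reference does not exist
INFO  shutting down
"""

for number, line in find_error_lines(sample_log):
    print(f"{number}: {line}")
```

A keyword scan like this only points you at candidate lines. The surrounding lines in the log usually still need to be read to understand what went wrong.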