Earlier this week, I made a big deal about how we plan to develop all of our GATK tutorials as Jupyter Notebooks in Terra going forward. Today I'd like to offer you a concrete look at what we like about using notebooks for GATK tutorials.

I was planning to just walk you through a couple of notebooks in one of our workshop workspaces, but then decided to make a custom workspace and notebook to show you what I mean without the complexity of the full-length tutorials. It's part highlights reel, featuring a couple of my favorite tutorial scenarios from the workshops that are fairly simple yet quite effective, and part sneak preview of the newest version of the tutorials, which boasts cool new features and will be unveiled at the next workshop (Cambridge in July). Oh, and part explainer on what exactly Jupyter Notebooks are, anyway.

Overall you can consider this mini-tutorial a stepping stone to being able to use the workshop tutorial workspaces without needing to actually attend a workshop. The workspace docs and the notebook itself both have a lot of explanations about how things work and how to use them in your pursuit of deeper understanding of GATK. So I don't feel the need to go on and on about it here (for once). But I will mention, in case you're on the fence about whether to spend 5 whole minutes checking out the workspace (add 15 to 20 minutes to actually work through the full notebook), it involves running GATK commands, streaming files, and viewing data in IGV -- all without ever leaving the warm embrace of the notebook.

Actually I lied, I will go on a bit because there are two standout features that I want to call out explicitly. One is Python Magic, which allows us to run commands as if we were in the terminal, but from within the flow of the notebook itself. If you thought you could only run Python code in there, think again! You can run anything that you can install on the notebook runtime (which is just about anything). You can also use it to embed R code, which comes in handy in one of our filtering tutorials, because we love Python as a home base but make extensive use of the R library ggplot2. (Or you can switch the entire notebook to an R kernel on the fly, but that leads to some nervousness about state, so I'd rather use the magic, personally.)
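To make that a little more concrete, here's a minimal sketch of the idea (not the actual cells from our tutorial notebook). In a notebook cell, prefixing a line with `!` hands it to the shell, and the `%%bash` cell magic does the same for a whole cell. The plain-Python function below, `run_shell`, is a hypothetical helper that shows roughly what that amounts to, so the example also runs outside a notebook. The `echo` command is just a stand-in for whatever tool you have installed on the runtime.

```python
import subprocess


def run_shell(command: str) -> str:
    """Run a command through the shell and return its standard output as text.

    Hypothetical helper for illustration; in a notebook you would normally
    just write `! <command>` or use a `%%bash` cell instead.
    """
    result = subprocess.run(
        command,
        shell=True,           # let the shell parse the command line
        check=True,           # raise CalledProcessError on a non-zero exit code
        capture_output=True,  # collect stdout and stderr instead of printing
        text=True,            # decode bytes to str
    )
    return result.stdout


# Notebook equivalent:  ! echo hello from the shell
output = run_shell("echo hello from the shell")
print(output)

# Assuming GATK is installed on the runtime, a cell could run it the same way,
# for example:  ! gatk --version
#
# For R, assuming the rpy2 package is installed, one cell would load the
# extension with `%load_ext rpy2.ipython`, and later cells starting with
# `%%R` would then be interpreted as R code.
```

The nice part of the magic syntax is that the command and its output stay inline with the explanations around them, with no helper function needed.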

The other waffle-worthy feature is IGV integration: you can embed an interactive IGV window to view and explore your data directly from within the notebook. Until very recently we had to load files into desktop IGV, which involved a lot of copy-pasting of cloud storage file paths, and some context switching. With embedded IGV there's none of that. It's not as full-featured as the desktop version (and sometimes you may still prefer to use desktop IGV), but the notebook integration has practically all the functionality I ever use. And it's just so cool to have what amounts to embedded interactive figures right there with the rest of the commands and explanations. Seriously, I love the IGV integration so much, it's hard to put into words.
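For the curious, here is a rough sketch of what the setup for an embedded view can look like. The dictionary keys follow the igv.js configuration format (`genome`, `locus`, `name`, `url`, `indexURL`, `format`, `type`). The helper name `make_alignment_track`, the bucket paths, the genome, and the locus are all placeholders made up for illustration, and the commented-out calls assume the igv-jupyter package is installed on the notebook runtime.

```python
def make_alignment_track(name: str, bam_url: str, index_url: str) -> dict:
    """Build an igv.js-style track description for a BAM file.

    Hypothetical helper; the keys follow the igv.js track configuration.
    """
    return {
        "name": name,           # label shown next to the track
        "url": bam_url,         # location of the BAM file
        "indexURL": index_url,  # location of the matching index file
        "format": "bam",
        "type": "alignment",
    }


# Placeholder reference genome and region, not taken from the tutorial.
browser_config = {"genome": "hg38", "locus": "chr20:10,000,000-10,001,000"}

# Placeholder cloud storage paths; substitute files from your own workspace.
track = make_alignment_track(
    name="example sample",
    bam_url="gs://your-bucket/example.bam",
    index_url="gs://your-bucket/example.bai",
)
print(track)

# In a notebook with igv-jupyter installed, the embedding step would look
# roughly like this (an assumption; check the package docs for your version):
#
#   import igv
#   browser = igv.Browser(browser_config)
#   browser.load_track(track)
```

Keeping the track description in a small helper like this means the same cell can be reused for each file you want to look at, which is where the copy-pasting savings really add up.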

All this to say, I heartily recommend you check out this mini-tutorial workspace, as it will give you a very concrete set of examples of how we're building out our tutorials and empower you to work through our workshop workspaces on your own. And as always we'd love to get feedback from all of you about the current crop of tutorials and what you'd like us to prioritize next.

Go to http://app.terra.bio and you'll be asked to log in with a Google identity. If you don't have one already, you can create one, and choose to either create a new Gmail account for it or associate your new Google identity with your existing email address. See this article for step-by-step instructions on how to register if needed.

Once you've logged in, look for the big green banner at the top of the screen and click "Start trial" to take advantage of the free credits program. As a reminder, access to Terra is free but Google charges you for compute and storage; the credits (a $300 value) will allow you to try out the resources I'm describing here for free.

To clone a workspace, open it, expand the workspace action menu (three-dot icon, top right) and select the "Clone" option. In the cloning dialog, select the billing project we created for you with your free credits. The resulting workspace clone belongs to you. Have fun!
