Docker - container - image - registry
Dictionary | Created 2018-01-20 | Last updated 2018-03-05



A container is similar to a virtual machine: it holds and executes all the software required to run a particular program or set of programs. The container includes an operating system (typically some flavor of Linux) as a base, plus any software installed on top of the OS that might be needed. It can therefore be run as a self-contained virtual environment, which makes it much easier to reproduce the same analysis on any infrastructure that supports running the container, from your laptop to a cloud platform, without the pain of identifying and installing all the software dependencies involved. You can even have multiple containers running on the same machine, so you can easily switch between environments if you need to run programs that have incompatible system requirements.

Docker is one of several brands of container systems, produced by the Docker company. There are other brands such as Singularity, but Docker is the most popular and widely used. Sometimes we say "a docker" instead of "a container", much as "xerox" became a generic noun for copy machines due to the dominance of the Xerox company. However, docker with a lowercase "d" is also the command-line program that you install on your machine to run Docker containers. We'll get back to that in a little while.

A container is packaged as an image. Note that this has nothing to do with pictures; here the word "image" is used in the software-specific sense of a special type of file. You know how sometimes when you need to install new software on your computer, the download file is called a "disk image"? That's because the file you download is in a format that your operating system treats as if it were a physical disk on your machine. This is basically the same thing. Another way to distinguish between an image and a container is to think of the image as a snapshot of the container that is not running.

An image can be distributed through one or more registries, which are repositories where users can store images privately or publicly in the cloud. Docker Hub is where Broad teams publish most of their docker images. There are others, like Dockstore, which is specifically geared toward bioinformatics, and GCR, which is Google's general-purpose container registry for use on the Google Cloud Platform.


Using Docker

So one way to use Docker, let's say on your laptop, goes like this: you tell the docker program to download a container image (=file) from a registry, e.g. Docker Hub, then you tell it to initialize the container, which is conceptually equivalent to booting up a virtual machine. And once the container is running, you can run any software inside of it that is installed on its system. For a concrete example, see this Tutorial.
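
The laptop workflow described above can be sketched as a small shell function. This is only an illustration, not an official recipe: the function name pull_and_run is made up for this example, and the image name in the usage comment is just a placeholder for whatever image you actually want.

```shell
# Sketch only: pull_and_run is a hypothetical helper name.
# Usage: pull_and_run IMAGE COMMAND [ARGS...]
pull_and_run() {
    image=$1
    shift
    # Download the image (a file) from the registry, e.g. Docker Hub.
    docker pull "$image" || return 1
    # Start a container from the image, run the given command inside it,
    # and remove the container once the command exits (--rm).
    docker run --rm "$image" "$@"
}

# Example (requires Docker and network access):
#   pull_and_run ubuntu:16.04 cat /etc/os-release
```

Only software that is installed in the image can be run this way; the command is looked up inside the container, not on your laptop.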

The other way to use Docker is on a cloud-based platform, like FireCloud. Methods in FireCloud use Docker to distribute tools and applications. By referencing Docker images in a Method configuration, anyone in the workspace can launch the same analysis without having to worry about having the exact same environment or applications downloaded. If you are concerned with privacy, access to Docker images can be set through the registry. For example, if you want to use private images hosted on Docker Hub, add "firecloud" as a Collaborator so that FireCloud can pull the private image.


Getting a Docker image's digest:

Say I want to get the digest for my_repo/my_image:tag. There are two ways to get it, and in both cases I'll be looking for something that looks like sha256:something_long, where the something_long bit is the digest.

1. If the image is not on my computer

I can just do docker pull my_repo/my_image:tag and the digest will be displayed in the output as:

Digest: sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce

2. If it is on my computer

I can use docker inspect instead but the output is more complicated:

~ $ docker inspect my_repo/my_image:tag
[
    {
        "Id": "sha256:a98acb9802cbf46eb71e28c652f58026c027d9580ff390c6fa9ae4dec07ae13d",
        "RepoTags": [
            "my_repo/my_image:tag"
        ],
        "RepoDigests": [
            "my_repo/my_image@sha256:96bf2261d3ac54c30f38935d46f541b16af7af6ee3232806a2910cf19f9611ce"
        ],

...and a lot of other details we don't care about right now.

Note that in the latter case there are two things that look like sha256:something_long. The one you want is the "RepoDigests" one, not the "Id".
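
If you'd rather not pick the value out by eye, both approaches can be scripted. The sketch below is an assumption-laden convenience, not part of the original instructions: the two function names are made up, the first relies on docker pull printing a line that starts with "Digest: " as shown above, and the second uses the --format option of docker inspect to print only the first "RepoDigests" entry. An image that was built locally and never pushed or pulled may have no "RepoDigests" entry at all, in which case the second function will fail.

```shell
# Sketch only; both function names are hypothetical.

# Print the digest ("sha256:...") reported by `docker pull`.
digest_from_pull() {
    docker pull "$1" | sed -n 's/^Digest: //p'
}

# Print the first RepoDigests entry ("repo@sha256:...") of a local image,
# skipping the "Id" field entirely.
repo_digest_from_inspect() {
    docker inspect --format '{{index .RepoDigests 0}}' "$1"
}

# Example (requires Docker):
#   digest_from_pull my_repo/my_image:tag
#   repo_digest_from_inspect my_repo/my_image:tag
```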

Then in your WDL, you write my_repo/my_image@sha256:something_long. Note that the tag isn't there at all, as it's been replaced by the digest, which is a more specific identifier.
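
Swapping the tag for the digest is plain string manipulation, so it can be sketched without Docker at all. The helper name to_digest_ref is made up for this example, and the tag-stripping rule (the tag is whatever follows the last colon, as long as it contains no slash) is a simplifying assumption about how image names are written.

```shell
# Sketch only; to_digest_ref is a hypothetical helper.
# Usage: to_digest_ref my_repo/my_image:tag sha256:something_long
to_digest_ref() {
    image=$1
    digest=$2
    # Drop a trailing ":tag". If the text after the last colon contains
    # a "/", the colon belongs to a registry host:port, not to a tag,
    # so the name is kept unchanged.
    case "${image##*:}" in
        */*) repo=$image ;;
        *)   repo=${image%:*} ;;
    esac
    printf '%s@%s\n' "$repo" "$digest"
}

# Example:
#   to_digest_ref my_repo/my_image:tag sha256:something_long
```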

