Alright, I wish I could take back my previous Docker entry. It was pretty useless, so I’m going to take another shot at this and do it right. I’ve given this talk about Docker about a dozen times in person and done a recorded (sadly, proprietary) teaching session on it, but I still find myself giving it over and over again, so I thought it might be best to just start writing it down. The target audience is people who have heard of Docker without really knowing what it is. At the end of this guide, you should be able to write your own Dockerfile for your project and deploy it locally for testing purposes.
What is Docker?
You can think of Docker as yet another layer of virtualization, one that’s not as heavyweight as full hardware virtualization or paravirtualization. It’s a level known as “operating-system-level virtualization,” where the guest shares the same kernel as the host but gets its own file system and network stack. This allows you to run your application as a process on the host operating system while fooling the guest application into thinking that it has all of its own resources to use.
What should I use it for?
Docker makes it easy to spin up multiple stateless application services onto a cluster. If anything requires storage, e.g. a database, it is much better to use a standard virtual machine with dedicated mounted storage. Docker is not designed to manipulate stored data very efficiently.
Installation and Example
The first step, obviously, is to install Docker. Follow the official installation directions for your platform.
After you have it installed, we’ll get a quick “Hello, World!” going. We’ll execute two lines, docker pull hello-world and docker run hello-world.
$ docker pull hello-world
Using default tag: latest
latest: Pulling from library/hello-world
9bb5a5d4561a: Pull complete
Status: Downloaded newer image for hello-world:latest
$ docker run hello-world
Hello from Docker!
This message shows that your installation appears to be working correctly.
To generate this message, Docker took the following steps:
1. The Docker client contacted the Docker daemon.
2. The Docker daemon pulled the "hello-world" image from the Docker Hub.
3. The Docker daemon created a new container from that image which runs the
executable that produces the output you are currently reading.
4. The Docker daemon streamed that output to the Docker client, which sent it
to your terminal.
To try something more ambitious, you can run an Ubuntu container with:
$ docker run -it ubuntu bash
Share images, automate workflows, and more with a free Docker ID:
For more examples and ideas, visit:
The first command pulls down an image from hub.docker.com, and the second instantiates a container from that image and runs it. This could all be done with the run command alone, but I broke it out to show the two separate steps: the first obtains the image, while the second creates a container from that image.
We’ll take a look at the two separately with docker images and docker container ls.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hello-world latest e38bc07ac18e 2 months ago 1.85kB
$ docker container ls -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
79530d8a293c hello-world "/hello" 38 minutes ago Exited (0) 38 minutes ago nervous_joliot
We see that we have an image that we downloaded from the hub. We also have a container created from that image. It’s assigned a randomly generated name, nervous_joliot, because we didn’t bother naming it. You can name your containers when you run them with the --name flag, e.g. docker run --name my_hello_world hello-world.
Images vs. Containers
Let’s go into more detail on what images and containers are as they pertain to Docker.
Images are immutable data layers that contain portions of the filesystem needed to run your application. Typically, you would start with the base operating system image, add language/library support, then top it off with your application and ancillary files.
Each image starts by declaring a base image to inherit from. Notice that earlier, when you pulled the hello-world image, Docker reported pulling a layer by its ID (9bb5a5d4561a) rather than one opaque file. hello-world is tiny and has a single layer, but for larger images you will see one “Pull complete” line for each layer the image depends on. We’ll cover this more in depth later on.
Containers are running instances of images that can support execution of the installed application. You can think of images as class definitions in object-oriented programming, and containers as the objects created from them. You can create multiple containers from the same image, allowing you to spin up a cluster of processes with a few simple commands.
When a container is created, a new read/write layer is introduced on top of the existing image layers. If a change is made to a file that exists in an image layer, that file is copied into the container’s read/write layer while the image is left untouched.
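The layering described above can be modelled in a few lines of Python. This is only a toy sketch of the idea, not how Docker's storage drivers are implemented: image layers are plain dicts, the file paths and contents are made up, and collections.ChainMap stands in for the union file system because it reads through every layer but only ever writes to the first one.

```python
from collections import ChainMap

# Read-only image layers, modelled as dicts of path -> contents.
# The paths and contents are invented for illustration.
base_layer = {"/etc/os-release": "debian", "/usr/bin/python": "python 3.6"}
app_layer = {"/opt/app.py": "print('v1')"}


def new_container(*image_layers):
    """Stack a fresh, empty read/write layer on top of the image layers."""
    # ChainMap looks keys up from the first map to the last, and writes
    # only ever touch the first map: the container's own layer.
    return ChainMap({}, *reversed(image_layers))


# Two containers created from the same image.
container_a = new_container(base_layer, app_layer)
container_b = new_container(base_layer, app_layer)

# Changing a file in one container lands in that container's own layer.
container_a["/opt/app.py"] = "print('patched')"

print(container_a["/opt/app.py"])  # the container's private copy
print(container_b["/opt/app.py"])  # still the image's version
print(app_layer["/opt/app.py"])    # the image layer itself is untouched
```

Both containers share the same underlying layers, which is also why starting a second container from an image costs almost no extra disk space.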
A Dockerfile is a build descriptor for a Docker image, much like a Makefile is used to build your application (if you still write C code). You would typically include your Dockerfile inside your project, run your regular project artifact build, and then run the Docker build, either manually or via a build target (make docker or mvn -Pdocker, etc.), to produce your Docker image.
For this example, we’ll take a look at Pastr, a quick and dirty PasteBin clone I wrote with Python and a Redis storage backend. You can clone the project from here: https://gitlab.com/ed11/pastr.
The project uses Flask and Flask-Restful to serve up data stored in a connected Redis database, presented with a VueJS front end. (At the time of this writing, it’s still… very much lacking in quality; this was just the quickest thing I could slap together for a demo.) The application just spins up a Flask WSGI development server for simplicity’s sake.
Let’s take a look at the Dockerfile to see what we’re building:
FROM python:3.6
ADD pastr /opt/pastr
COPY requirements.txt /opt/
RUN pip install -r /opt/requirements.txt
CMD DB_SERVER=$DB_SERVER python /opt/pastr/__init__.py
We’ll break this down line by line, remembering that each line creates its own image layer, as described earlier in the Images vs. Containers section.
The FROM line tells the Docker engine to start our image off by pulling the python base image from the official repository (hub.docker.com). The 3.6 after the colon says that we specifically want version 3.6 of Python. This is a tag for the image. You can use tags to mark a point release of your application, or combine them with other text to indicate variants (e.g. myapp:1.0-debug to indicate that the image runs your application in debug mode).
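To make the name:tag convention concrete, here is a small Python sketch that splits an image reference the way described above, falling back to latest when no tag is given (the same default the pull output reported earlier). The function name is made up for this example, and it deliberately ignores digest references (the name@sha256:... form).

```python
def split_image_reference(reference):
    """Split 'name[:tag]' into (name, tag), defaulting the tag to 'latest'."""
    # Only treat a colon as the tag separator if it comes after the last
    # '/', so a registry port such as 'localhost:5000/myapp' is not
    # mistaken for a tag.
    last_slash = reference.rfind("/")
    colon = reference.rfind(":")
    if colon > last_slash:
        return reference[:colon], reference[colon + 1:]
    return reference, "latest"


print(split_image_reference("python:3.6"))
print(split_image_reference("hello-world"))
print(split_image_reference("myapp:1.0-debug"))
```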
The ADD command copies the contents of the pastr directory (in the current project working directory) into the image at /opt/pastr. Note that there are special rules on what ADD does, so I recommend reading the ADD documentation on the official Docker website.
The COPY command copies a single file (the requirements.txt file) into the /opt directory. If you’re ever in doubt about which to use, use COPY instead of ADD.
The RUN command starts up a temporary container from the previous image layers, opens a shell inside its file system, and executes the given command. In this case, it simply runs pip install, which, in a Python project, downloads all the libraries needed to execute the application. You would normally use RUN to download third-party dependencies, extract tarballs, or change file permissions to grant execute privileges. After the command is done, Docker takes the mutable file system layer created by the container and saves it off as an immutable image layer.
Be mindful of this layer saving when you use RUN with large files. For example, if you download a large executable from a third-party resource in one RUN command and then change its permissions in another, you will end up with two layers of the same size. Example:
RUN wget http://my-file-server/large-binary-executable
RUN chmod +x large-binary-executable
Say our large-binary-executable is 500MB. The first command will save off an image where the file is not executable, taking up 500MB. The second command will take the 500MB file, change the permissions, and save another image where the 500MB file is executable, essentially taking up 1GB of disk space for anyone who has to download it. Instead, you should run them in one command, like so:
RUN wget http://my-file-server/large-binary-executable && chmod +x large-binary-executable
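The arithmetic behind this advice can be sketched in Python. The model is a simplification that assumes every layer stores a full copy of each file it touches, which is what happens to the 500MB file in the example above; the function name and data layout are invented for illustration.

```python
def image_size_mb(layers):
    """Total size of an image whose layers each hold full copies of the files they touch."""
    return sum(sum(files.values()) for files in layers)


# Two RUN commands: the download and the chmod each produce a layer
# containing the whole 500 MB file.
two_runs = [
    {"large-binary-executable": 500},  # RUN wget ...
    {"large-binary-executable": 500},  # RUN chmod +x ...
]

# One combined RUN command: the file is downloaded and made executable
# before the layer is saved, so it is stored only once.
one_run = [
    {"large-binary-executable": 500},  # RUN wget ... && chmod +x ...
]

print(image_size_mb(two_runs))  # 1000
print(image_size_mb(one_run))   # 500
```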
The CMD directive specifies the command that is to be executed when the container starts up. In our example, we run the python command and point it to our application. The DB_SERVER=$DB_SERVER is an environment variable that we pass to our application as a rudimentary form of configuration management.
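On the application side, picking up that variable takes one line of Python. The sketch below is not Pastr's actual code: the function name and the localhost fallback are assumptions made for illustration, and only the DB_SERVER variable name comes from the Dockerfile.

```python
import os


def get_db_server(environ=os.environ):
    """Return the database host the application should connect to."""
    # DB_SERVER is the variable named in the CMD line; the "localhost"
    # fallback is an assumption made for this sketch.
    return environ.get("DB_SERVER", "localhost")


print(get_db_server({"DB_SERVER": "pastedb"}))  # pastedb
print(get_db_server({}))                        # localhost
```

Passing the environment in as a parameter keeps the function easy to test without touching the real process environment.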
There are actually two ways to specify the container startup command: the CMD and the ENTRYPOINT directives. In many cases they are interchangeable, but there are nuanced differences between them that are better left for a more advanced topic. For now, I will say that semantically, ENTRYPOINT is generally used to specify the executable and CMD is used to pass in parameters. The latter can be overridden on the command line when starting the container.
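As a rough mental model of how the two directives combine, consider the Python sketch below. It covers only the exec (list) form of both directives, the function name is made up, and the python entrypoint is a hypothetical example, since the Pastr Dockerfile uses CMD alone.

```python
def container_command(entrypoint, cmd, run_args=None):
    """Model how the startup command is assembled (exec-form lists only)."""
    # Arguments given after the image name on `docker run` replace CMD;
    # ENTRYPOINT stays in front either way.
    arguments = run_args if run_args else cmd
    return list(entrypoint) + list(arguments)


# Hypothetical image with ENTRYPOINT ["python"] and CMD ["/opt/pastr/__init__.py"].
print(container_command(["python"], ["/opt/pastr/__init__.py"]))

# `docker run <image> --version` would swap out the CMD part only.
print(container_command(["python"], ["/opt/pastr/__init__.py"], ["--version"]))
```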
Building the Image
Using the Dockerfile, we can build the image manually with the following command:
$ docker build -t pastr .
This command builds the image using the current working directory as the build context (specified by the trailing dot) and names it pastr, as indicated by the -t flag. We can validate that the image was created by checking the image list.
$ docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
pastr latest 6635be8bc083 4 seconds ago 941MB
Typically, this would be handled by your build script using a build target plugin, as mentioned earlier.
Running the Container
We run the container much like we did with our hello-world example above.
docker run --detach --rm --name pastr1 --publish 5000:5000 pastr
A breakdown of the flags:
- --detach: Run the application in the background and return the console to the user.
- --rm: When the container exits, remove it so it does not linger.
- --name: The name to assign to the container. If omitted, a random one is generated and assigned.
- --publish: Publish a container port on the host. In this case, port 5000 on your computer (reachable as localhost:5000) will forward to port 5000 of the container.
- pastr: The name of the image to base the container on.
From here, we can open a browser up to localhost:5000 to view the application.
Of course, if you type anything into the text area and submit it, you’ll get an error indicating that the application can’t connect to the database. So we’ll have to run a separate Redis database. First, let’s kill off our existing container.
$ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
adf918f846ef pastr "/bin/sh -c 'DB_SERV…" About an hour ago Up About an hour 0.0.0.0:5000->5000/tcp pastr1
$ docker stop pastr1
Now, we’ll start our Redis database back end server, using the official redis image.
docker run --detach --rm --name pastedb redis:latest
With the Redis instance running, we can create our Pastr application and point it to the database.
$ docker run --detach --rm --name pastr1 --publish 5000:5000 --link pastedb --env DB_SERVER=pastedb pastr
You’ll note that we added a few things to the argument list.
- --link: Directs the Docker engine to allow communication between this container and the pastedb container, which is the Redis instance we started earlier.
- --env: Sets the environment variable the application uses to locate the database server. This is the variable we referenced in the CMD line of the Dockerfile.
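One way to stop retyping these flags is to keep the settings as data and generate the command from them. The Python sketch below only assembles the argument list and never invokes Docker; the function name and its parameters are invented for this example, while the flags themselves are the ones used above.

```python
import shlex


def docker_run_args(image, name, publish=(), env=None, links=(), detach=True, remove=True):
    """Build the argv for a `docker run` call from plain settings."""
    args = ["docker", "run"]
    if detach:
        args.append("--detach")
    if remove:
        args.append("--rm")
    args += ["--name", name]
    for port in publish:
        args += ["--publish", port]
    for link in links:
        args += ["--link", link]
    for key, value in (env or {}).items():
        args += ["--env", f"{key}={value}"]
    args.append(image)  # the image name always goes last
    return args


pastr = docker_run_args(
    "pastr",
    "pastr1",
    publish=["5000:5000"],
    links=["pastedb"],
    env={"DB_SERVER": "pastedb"},
)
print(shlex.join(pastr))
```

The resulting list could be handed to subprocess.run as-is, which avoids shell quoting problems.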
From here, we can try again, this time actually pushing the save button.
It works, end to end, now! Refresh the page and click on the drop down again to see your stored text (bugfix forthcoming).
The problem is, how do we keep track of all the flags that we had to use to get it running?
Docker Compose is an orchestration tool that allows you to create and run Docker containers using a pre-configured YAML file. Let’s look at our compose file.
The version field is just so that the docker-compose command knows what API set to use. Our application can be found under services. You’ll notice that we have two, the pastr app and the backend database. You also may recognize the fields underneath as things we put in the command line to run our containers.
We are already familiar with image, ports (which we called publish), environment, and links. We’ll focus on some of the newer things.
- build: The directory to use to build the image if the image does not exist. The build will name it the same as the service, which in this case is pastr.
- depends_on: Instructs the Docker engine to launch database before it starts up pastr. Note that it only affects the order in which containers start; it does not wait until the application in the other container has fully started.
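Because depends_on only orders container startup, an application that needs its database immediately usually has to wait for it itself. Below is a minimal Python sketch of one common approach, polling until a TCP port accepts connections. It is not part of Pastr; the function name, timeout, and interval are assumptions, and 6379 is Redis's default port.

```python
import socket
import time


def wait_for_port(host, port, timeout=30.0, interval=0.5):
    """Return True once host:port accepts TCP connections, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening; close it
            # again right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or host not resolvable yet: retry shortly.
            time.sleep(interval)
    return False


# Inside the pastr container, this might be called before connecting:
#     wait_for_port("database", 6379)
```

An open port does not guarantee the database is ready to serve requests, so a real application may still want to retry its first few queries.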
If you haven’t already, now would be a good time to bring down the other containers, as they will conflict with what we are about to do.
docker stop pastr1 pastedb
We’ll start by building the pastr image using the docker-compose command.
docker-compose build pastr
From here, we can start up the entire application, including the database.
$ docker-compose up -d
Creating network "pastr_default" with the default driver
Creating pastr_database_1 ... done
Creating pastr_pastr_1 ... done
Again, we use the -d flag to detach and run all of our containers in the background. If you ever wish to see the log output of a service, simply run docker-compose logs <service-name>.
$ docker-compose logs pastr
Attaching to pastr_pastr_1
pastr_1 | * Serving Flask app "__init__" (lazy loading)
pastr_1 | * Environment: production
pastr_1 | WARNING: Do not use the development server in a production environment.
pastr_1 | Use a production WSGI server instead.
pastr_1 | * Debug mode: on
pastr_1 | * Running on http://0.0.0.0:5000/ (Press CTRL+C to quit)
pastr_1 | * Restarting with stat
pastr_1 | * Debugger is active!
pastr_1 | * Debugger PIN: 133-983-541
To shut it all down, issue the down command, which stops and removes the containers along with the network.
$ docker-compose down
Stopping pastr_pastr_1 ... done
Stopping pastr_database_1 ... done
Removing pastr_pastr_1 ... done
Removing pastr_database_1 ... done
Removing network pastr_default
You can also stop, remove, and restart individual containers with the stop, rm, and restart commands. Give them a try!
We have seen what Docker virtualization is and how to run containers manually and through orchestration. In the future, we will learn other things we can do to make local development easier, such as using network bridges and proxies to access multiple containers via the same port.