Blog


Persisting Storage in Docker

  • July 15, 2018
  • Docker

Generally speaking, Docker containers should have everything they need baked into the image. There are times, however, when it may be necessary to provide additional files or directories to the container to persist information. These can include, but are not limited to:

  • Configuration files
  • Data persistence (usually only for local databases for development)
  • Application package hotswap during development
  • Saving artifacts generated by the application

Docker has two ways to provide such storage: bind mounts and volumes.

Bind Mounts

Bind mounts are used to provide access to a directory on the host machine. On a Linux host, Docker allows you to bind a user-defined directory into the root filesystem of the container, effectively doing the equivalent of mount --bind to link your directory directly into the container’s filesystem. This is ideal for providing custom configuration files or saving off build artifacts to a host directory. To mount a directory into a container, execute the following on an example container:
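The original post’s commands are missing here; a sketch of such a session might look like the following (the directory name and port mapping are my assumptions; the container is named test to match the inspect command used later):

```shell
# Create a simple dummy site on the host
mkdir sample-site
echo '<h1>Hello, world!</h1>' > sample-site/index.html

# Run nginx with our site bind-mounted into its web root
docker run --detach --rm --name test --publish 80:80 \
  --mount type=bind,source="$(pwd)"/sample-site,destination=/usr/share/nginx/html \
  nginx
```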

This creates a simple dummy site and then pulls down the nginx image, running it with the content of our simple site. If you open your web browser to http://localhost, you will see the “Hello, world!” message that we left in our sample directory. Alternatively, instead of the --mount option, you can use the older-style -v syntax:
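A rough -v equivalent of the same run (same illustrative paths as above):

```shell
docker run --detach --rm --name test --publish 80:80 \
  -v "$(pwd)"/sample-site:/usr/share/nginx/html \
  nginx
```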

It is recommended that you use the --mount option, as it is more precise in its definition. The -v option is only still available for legacy purposes.

We can confirm that the mount is defined for our container by using docker inspect test:
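The relevant portion of the output looks roughly like this (trimmed; the exact source path will differ on your machine):

```shell
docker inspect test
# ...
# "Mounts": [
#     {
#         "Type": "bind",
#         "Source": "/home/user/sample-site",
#         "Destination": "/usr/share/nginx/html",
#         "RW": true
#     }
# ],
# ...
```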

We can specify bind mounts in a compose file as such:
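A compose-file equivalent might look like this (the service name and paths are illustrative; the long volume syntax requires compose file format 3.2 or later):

```yaml
version: '3.2'
services:
  web:
    image: nginx
    ports:
      - "80:80"
    volumes:
      - type: bind
        source: ./sample-site
        target: /usr/share/nginx/html
```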

You can also map an individual file onto a container, but it is rare to do so. If using the -v syntax, note that if the file is missing, a directory will be created with the name of the file that you specify. This can be confounding if you use this in a Compose file. More can be found on the Docker website:

https://docs.docker.com/storage/bind-mounts/

Bind mounts on Docker for Mac do not use native bind mounting; instead, Docker uses osxfs to provide a near-native experience for bind mounts. It is still slower than a native bind mount running on Linux, but should work seamlessly with local HFS+ filesystems. By default, it only has access to the /Users, /Volumes, /private, and /tmp directories. See the details on Docker’s official website:

https://docs.docker.com/docker-for-mac/osxfs/

Docker Volumes

Docker volumes are filesystem mounts that are managed completely by the Docker engine. Historically, these have been called “named volumes,” in case you see a reference to that term in literature, command-line help, or error messages. When a Docker volume is created, the directory is stored in the /var/lib/docker/volumes/ directory. The typical use case for a named volume is something like data persistence or sharing data between containers. Let’s dig out the Pastr app from the first tutorial. We’ll add the mount in the docker-compose.yml file:
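The snippet is missing from this copy of the post; the relevant part of the docker-compose.yml likely looked something like this (the /data path matches the official redis image’s data directory; the service name database is inferred from the pastr_database_1 container inspected later):

```yaml
version: '3'
services:
  pastr:
    build: .
    ports:
      - "5000:5000"
    environment:
      - DB_SERVER=database
    depends_on:
      - database
  database:
    image: redis
    volumes:
      - pastrdatastore:/data
volumes:
  pastrdatastore:
```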

The top-level volumes directive (at the bottom of the snippet) denotes that a datastore shall be created via this compose file. After starting the container with docker-compose up -d, the Docker engine will create the pastrdatastore volume.

The engine creates the volume in the /var/lib/docker/volumes directory. This volume is mounted to the database container, which we can see when we inspect it with docker inspect pastr_database_1.

Note that on a Linux machine, this volume exists on the native filesystem. However, on a Windows or Mac system, this volume exists within the virtual machine; you can’t access it directly, nor should you try to, even on a Linux machine. If you need to mount the data store to inspect its contents, you can run it with docker run -it --rm --mount source=pastr_pastrdatastore,destination=/mnt ubuntu /bin/bash.

For more details, please see the official Docker documentation:

https://docs.docker.com/storage/volumes/#start-a-service-with-volumes




Native Docker Linux vs Hypervisor Docker for Mac and Windows

Before I start delving further into Docker tutorials, I feel that I should go over the differences between Docker running natively on Linux versus running Docker on virtual machines on Mac and Windows.

Docker for Linux (Native)

Natively, Docker runs on Linux, taking advantage of direct access to the host Linux kernel. You can prove this by running the following:
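Something like this demonstrates it (the comments describe what you should see rather than literal values):

```shell
uname -r
# prints your host's kernel release

docker run --rm --entrypoint uname ubuntu -r
# prints the same kernel release, from inside a container
```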

The second line is just a weird quirk of the --entrypoint option, whose arguments go after the image name. But it still shows us that the kernel your application thinks it’s running on is actually the kernel of the host machine.

Let’s also take a look at the network stack.

I stripped out a lot of extraneous information but kept in the important bits. When you look at your network interfaces, you’ll see your normal loopback and ethernet, but you’ll also notice a veth device that wasn’t there before. This device has the same MAC address as the one assigned to the Docker container, as well as the same IP address. You can actually ping this IP address or reach the open port (172.17.0.2:80) in your web browser without having to do any port forwarding.
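The commands behind that inspection would be along these lines (replace the container name placeholder with your own):

```shell
# List host interfaces; a new veth device appears alongside lo and eth0
ip addr show

# Look up the IP address Docker assigned to the container
docker inspect -f '{{.NetworkSettings.IPAddress}}' <container-name>
```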

Docker for Mac

When running Docker on macOS, if we try to look up the kernel, we get the following:
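The same kernel check from the Linux section, run on a Mac, comes back with something like:

```shell
uname -r
# on the macOS host: a Darwin kernel release

docker run --rm --entrypoint uname ubuntu -r
# inside the container: a linuxkit kernel release
```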

Well, that’s not what we were looking for. You can clearly see that the kernel isn’t the same. What’s actually happening is that Docker for Mac is spinning up a virtual machine. It uses the built-in macOS Hypervisor framework, which allows an application to run virtualized processes with rather lightweight overhead. The hypervisor runs, as you can see, LinuxKit, which was created by the folks at Docker to build a lightweight Linux distribution to run the Docker engine. As such, you can adjust the VM settings via the menu-bar preferences, allocating the appropriate number of cores and amount of memory.

What this means is that with Docker for Mac, you do not have direct access to the container network stack, nor do you have native file mounts. If you mount a local host directory into your container, you can expect your application to run about four times slower than if you baked the contents of that directory into the image or used a named volume.

The advantage that Docker for Mac has over the older Docker Toolbox method is that instead of having to pass commands via a TCP connection to a port on the VirtualBox instance, information is passed along a much speedier and more reliable Unix socket. See Docker’s official documentation on Docker for Mac for more details: https://docs.docker.com/docker-for-mac/docker-toolbox/

Docker for Windows

Docker for Windows operates much the same way as Docker for Mac. It utilizes Hyper-V to spin up a hardware virtualization layer and run LinuxKit. It has similar limitations to the Docker for Mac installation. Additionally, you will also have to enable file sharing in the Docker for Windows settings for the drives you want. You will also need to make sure your firewall will allow connections from the Docker virtual machine to the host Windows system. See the following links for more detail:

https://docs.docker.com/docker-for-windows/install/
https://success.docker.com/article/error-a-firewall-is-blocking-file-sharing-between-windows-and-the-containers




Local Development with Docker

Alright, I wish I could take back my previous Docker entry. It was pretty useless, so I’m going to take another shot at this and do it right. I’ve given this Docker talk about a dozen times in person and done a recorded (sadly, proprietary) teaching session on it, but I still find myself giving it over and over again, so I thought it might be best to just start writing it down. The target audience for this is people who have only really heard of Docker without knowing what it is. At the end of this guide, you should be able to write your own Dockerfile for your project and deploy it locally for testing purposes.

What is Docker?

You can think of Docker as yet another layer of virtualization, one that’s not as heavyweight as full hardware virtualization or paravirtualization. It’s a level known as “operating-system-level virtualization,” where the guest machine shares the same kernel as the host but gets its own filesystem and network stack. This allows you to run your application as a process on the host operating system while fooling the guest application into thinking that it has all of its own resources to use.

What should I use it for?

Docker makes it easy to spin up multiple stateless application services onto a cluster. If anything requires storage, e.g. a database, it is much better to use a standard virtual machine with dedicated mounted storage. Docker is not designed to manipulate stored data very efficiently.

Installation and Example

The first step, obviously, is to install Docker. Follow the directions here and find your platform.

After you have it installed, we’ll get a quick “Hello, World!” going. We’ll execute two lines, docker pull hello-world and docker run hello-world.
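In full:

```shell
docker pull hello-world
docker run hello-world
```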

The first line pulls down an image from hub.docker.com and the second instantiates a container from that image and runs it. Now, this could all be done with the run command alone, but I broke it out to show the two separate steps. The first obtains the image, while the second creates a container from that image.

We’ll take a look at the two separately with docker images and docker container.
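Assuming the listing subcommands (note that docker container needs ls -a to show stopped containers):

```shell
docker images
docker container ls -a
```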

We see that we have an image that we downloaded from the hub. We also have a container created using said image. It’s assigned a randomly generated name nervous_joliot because we didn’t bother naming it. You can name your containers when you run them with the --name directive, e.g. docker run --name my_hello_world hello-world.

Images vs. Containers

Let’s go into more detail on what images and containers are as they pertain to Docker.

Images

Images are immutable data layers that contain portions of the filesystem needed to run your application. Typically, you would start with the base operating system image, add language/library support, then top it off with your application and ancillary files.

Each image layer starts by declaring a base image to inherit from. Notice that earlier, when you were pulling the hello-world image, it was downloading four images. It was downloading not only the hello-world image layer, but also all the layers that it depends on. We’ll cover this more in depth later on.

Containers

Containers are instantiated instances of images that can support execution of the installed application. You can think of images as class definitions in Object-Oriented Programming, with containers as the analogous objects. You can create multiple containers from the same image, allowing you to spin up a cluster of processes with a few simple commands.

When a container is created, a new read/write layer is introduced on top of the existing image layers. If a change is made to a file existing in an image layer, that file is copied into the container read/write layer while the image is untouched.

Creating Dockerfiles

A Dockerfile is a build descriptor for a Docker image, much like a Makefile is used to build your application (if you still write C code). You would typically include your Dockerfile inside your project, run your regular project artifact build, and then run the image build, either manually or via a build target (make docker, mvn -Pdocker, etc.), to produce your Docker image.

For this example, we’ll take a look at Pastr, a quick and dirty PasteBin clone I wrote with Python and a Redis storage backend. You can clone the project from here: https://gitlab.com/ed11/pastr.

The project uses Flask and Flask-Restful to serve up data stored from a connected Redis database presented with a VueJS UI front-end. (At the time of this writing, it’s still… very much lacking in quality; this was just the quickest thing I could slap together for a demo). The application just spins up a Flask WSGI development server for simplicity’s sake.

Let’s take a look at the Dockerfile to see what we’re building:
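The Dockerfile itself is missing from this copy of the post; reconstructed from the line-by-line breakdown that follows, it would look roughly like this (the entry-script name app.py is my guess):

```dockerfile
FROM python:3.6

ADD pastr /opt/
COPY requirements.txt /opt/

RUN pip install -r /opt/requirements.txt

CMD DB_SERVER=$DB_SERVER python /opt/app.py
```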

We’ll break this down line by line, remembering that each line creates its own image layer, as visualized earlier in the Images section.

FROM

This line tells the Docker engine to start our image off by pulling the python base image from the official repository (hub.docker.com). The 3.6 after the colon tells it that we want specifically version 3.6 of Python. This is a tag for the image. You can specify tags as a point release for your application or combine it with other text to mean variants (e.g. myapp:1.0-debug to indicate that the image runs your application in debug mode).

ADD

This command copies the contents of the pastr directory (in the current project working directory) into the image at /opt/. Note there are special rules on what ADD does. I recommend reading the documentation on the official Docker website:

https://docs.docker.com/engine/reference/builder/

COPY

This command copies a single file (the requirements.txt file) into the /opt directory. If you’re still in doubt about which to use, USE COPY instead of ADD.

RUN

This command starts up a temporary container from the previous image layers, pops open a shell inside the virtual file system, and then begins executing commands. In this case, it simply runs the pip install command, which, in a Python project, downloads all the required libraries needed to execute the application. You would normally use this to download third party dependencies, extract tarballs, or change permissions of files to grant execute privileges. After the command is done, it takes the mutable file system layer created by the container and saves it off as an immutable image layer.

Be very mindful of the layer saving when using the RUN command when dealing with large files. For example, if you use this to download a large executable from a third party resource and then change the permissions, you will end up with two layers of the same size. Example:
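For example, two separate RUN lines (the URL and filename are illustrative):

```dockerfile
RUN curl -o /opt/large-binary-executable https://example.com/large-binary-executable
RUN chmod +x /opt/large-binary-executable
```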

Say our large-binary-executable is 500MB. The first command will save off an image where the file is not executable, taking up 500MB. The second command will take the 500MB file, change the permissions, and save another image where the 500MB file is executable, essentially taking up 1GB of disk space for anyone who has to download it. Instead, you should run them in one command, like so:
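Chained into a single RUN, only one layer is saved:

```dockerfile
RUN curl -o /opt/large-binary-executable https://example.com/large-binary-executable \
 && chmod +x /opt/large-binary-executable
```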

CMD

The CMD directive specifies the command that is to be executed when the container starts up. In our example, we run the python command and point it to our application. The DB_SERVER=$DB_SERVER is an environment variable that we pass to our application as a rudimentary form of configuration management.

There are actually two ways to specify the container startup command: the CMD and the ENTRYPOINT directives. In most cases, these might be interchangeable, but there are nuanced differences on which to use, which are more suitable for a more advanced topic. For now, I will say that semantically, ENTRYPOINT is generally used to specify the executable and CMD is used to pass in parameters. The latter can be overridden on the command line prior to starting up.

Building the Image

Using the Dockerfile, we can build the image manually with the following command:
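The build command, plus the image-list check mentioned below:

```shell
docker build -t pastr .
docker images
```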

What this command does is build the image using the current working directory (specified by the trailing dot) and naming it pastr as indicated by the -t directive. We can validate that the image is created by checking the image list.

Typically, this would be handled by your build script using a build target plugin, as mentioned earlier.

Running the Container

We run the container much like we did with our hello-world example above.
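Reconstructed from the flag breakdown that follows (the container name pastr is my choice):

```shell
docker run --detach --rm --name pastr --publish 5000:5000 pastr
```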

A breakdown of the flags:

  • --detach run the application in the background and return the console back to the user.
  • --rm When the container exits, remove it so it does not linger.
  • --name The name to assign it. If omitted, a random one is generated and assigned.
  • --publish Expose the port on the container, binding it to localhost. In this case, localhost:5000 on your computer will forward to port 5000 of the container.
  • pastr The name of the image to base the container off.

From here, we can open a browser up to localhost:5000 to view the application.

Of course, if you try typing in anything into the text area and submit, you’ll get an error indicating that it can’t connect to the database. So we’ll have to run a separate Redis database. Let’s kill off our existing container.
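Because the container was started with --rm, stopping it also removes it (assuming it was named pastr):

```shell
docker stop pastr
```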

Now, we’ll start our Redis database back end server, using the official redis image.
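The container name paste_db matches the link used in the next step:

```shell
docker run --detach --rm --name paste_db redis
```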

With the Redis instance running, we can create our Pastr application and point it to the database.
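A sketch of the full run command, reconstructed from the flag notes below (the DB_SERVER value assumes the app resolves the linked container by name):

```shell
docker run --detach --rm --name pastr \
  --publish 5000:5000 \
  --link paste_db \
  --env DB_SERVER=paste_db \
  pastr
```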

You’ll note that we added a few things to the argument list.

  • --link directs the Docker engine to allow communication between this container and the paste_db container, which is the Redis instance we started earlier.
  • --env sets the environment variable the application uses to specify the database server. This is what we specified in the CMD line in the Dockerfile.

From here, we can try again, this time actually pushing the save button.

It works, end to end, now! Refresh the page and click on the drop down again to see your stored text (bugfix forthcoming).

The problem is, how do we keep track of all the flags that we had to use to get it running?

Docker Compose

Docker Compose is an orchestration tool that allows you to create and run Docker containers using a pre-configured YAML file. Let’s look at our compose file.
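The compose file is missing from this copy of the post; based on the fields discussed below, it would look something like this (service names and the DB_SERVER value are inferred):

```yaml
version: '3'
services:
  pastr:
    build: .
    ports:
      - "5000:5000"
    environment:
      - DB_SERVER=database
    links:
      - database
    depends_on:
      - database
  database:
    image: redis
```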

The version field is just so that the docker-compose command knows what API set to use. Our application can be found under services. You’ll notice that we have two, the pastr app and the backend database. You also may recognize the fields underneath as things we put in the command line to run our containers.

We are already familiar with image, ports (which we called publish), environment, and links. We’ll focus on some of the newer things.

  • build the directory to use to build the image if the image does not exist. The build will name it the same as the service, which in this case is pastr.
  • depends_on this directive instructs the Docker engine to launch database before it starts up pastr. Note that it only affects the order in which containers start; it does not wait until the other container’s application has fully started.

If you haven’t already, now would be a good time to bring down the other containers, as they will conflict with what we are about to do.

We’ll start by building the pastr image using the docker-compose command.
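Namely:

```shell
docker-compose build
```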

From here, we can start up the entire application, including the database.
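Namely:

```shell
docker-compose up -d
```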

Again, we use the -d flag to detach and run all of our containers in the background. If you ever wish to see the log output of a container, simply run docker-compose logs <container-name>.

To shut it all down, issue the stop command.
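Namely:

```shell
docker-compose stop
```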

You can also stop and remove individual containers, as well as restart containers, with the stop, rm, and restart commands. Give them a try!

Conclusion

We have seen what Docker virtualization is and how to run containers manually and through orchestration. In the future, we will learn other things we can do to make local development easier, such as using network bridges and proxies to access multiple containers via the same port.




Explain this to me as if I was a small child – Developer Test Driven Development

There are so many ways to do testing. It all depends on the technology, the framework, and what automation tools your organization is willing to invest in. Automated testing is a great way to assert sanity in your build, especially when multiple people are touching the same segments of code. The two main approaches to testing here are Behavior Driven Development, which is usually done for acceptance testing, and Test Driven Development, which is used by developers to validate the specification. This write-up will cover the latter.

So you’ve been told to write unit tests with a Test Driven Development (TDD) approach. What does that mean? You’ve read that you write a little code, and test a little code. You may have read that you write all your tests first (daunting!). It sounds foreign, so you say, “screw it, I’ll use the debugger like I usually do.” Well, if you write good unit tests, you probably won’t ever need a debugger, and on the rare chance you actually need to run the debugger, the tests will get you to what you need much quicker.

How do you do it? Enter the xUnit framework. xUnit is simply a loose collection of testing frameworks modeled off of the original SUnit framework used with Smalltalk. Just about every language has a clone of this framework you can use, e.g. JUnit for Java and Test::Class for Perl. In this example, we’ll be using Python’s built-in unittest module.

So let’s get to some code. Let’s say we’re trying to write a rudimentary encryption library. We’ll use something stupid easy, like ROT13. (Note, don’t ever use ROT13 as an encryption format). Let’s start by creating a blank class.
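A minimal starting point (the class name ROT13 is from the text; the file layout is my assumption):

```python
# rot13.py -- a rudimentary (and deliberately insecure) ROT13 "cipher" class
class ROT13:
    pass
```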

Now at this point, believe it or not, you’re ready to write your first test. Why would you write a test for something that doesn’t have much content? There are many reasons. First, it’s to make sure you spelled things right. Sure, in the age of IDEs, this isn’t supposed to happen, but it can if you have to make a one-off change in a text editor. Second, it can teach you things about a new language, like, “Does this language provide a default constructor?”

And third, which I think is most important, this allows you to define the API, or at least put down some semblance of a specification that you’re given. The interface should, more often than not, be dictated by how the library is going to be used, not by the data store or back end technology that it wraps.

Let’s start with the basic test stub and add a basic test. We’ll say that we want to at least be able to instantiate a ROT13 object and print out the ciphertext.
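A sketch of that first test (the blank class is included inline here so the snippet is self-contained; in the original it would be imported from its own module):

```python
import unittest

class ROT13:          # the blank class from before
    pass

class TestROT13(unittest.TestCase):
    def test_instantiation(self):
        # We can create an object and print its (default) string form
        cipher = ROT13()
        print(cipher)

if __name__ == '__main__':
    unittest.main(exit=False)
```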

Our custom test class inherits from unittest.TestCase, which gives us access to a bunch of assertion methods we’ll come across later. Any method whose name starts with test will get executed on every run. The final line in the main body simply executes all test cases inheriting from TestCase. Let’s run it to see what happens.

Hmm, okay, it passed. This tells us two things: Python provides us with a default constructor, and it also provides a default string representation. Now, the default string output is not quite what we wanted. Let’s define our interface such that we have a constructor that takes in a plaintext, and our string representation gives us the ciphertext. We’ll keep our old test to make sure our old behavior still works, but we’ll add a new test to confirm our new behavior:

Running our test again, we’ll see that it fails.

Now, we can go back to our class and add an optional parameter so that we can satisfy both tests.
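One way to do that (the attribute name plaintext is my guess):

```python
class ROT13:
    def __init__(self, plaintext=None):
        # Optional so the no-argument test still passes
        self.plaintext = plaintext
```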

Running our tests again, we’ll see that we get passed the first error, but are presented with another error:

(As a side note, in writing this, since I’ve been living in Java-land for the past year, I forgot that the constructor in Python is the __init__() method, and not ROT13(). Having a unit test is a quick way to remind me that I’m doing it wrong.)

In looking at the output, we still need to correct our str() output. Well, let’s correct that real quick.
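The quick “fix”:

```python
class ROT13:
    def __init__(self, plaintext=None):
        self.plaintext = plaintext

    def __str__(self):
        # Total cheat: hard-code the expected ciphertext for 'AAAA'
        return 'NNNN'
```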

I know… this is so cheating, but it works! If you run your tests again, all will pass. At some point in your professional life, you will find yourself doing this, whether due to time constraints or to some loose requirement that you don’t quite understand how to implement. This is why you have these tests and multiple versions of these test to make sure what you’re testing works as intended.

Let’s clean it up and make it work with anything, and not just with AAAA. What we’ll do is take each letter in the plaintext, add 13 to the ordinal number of the letter, then convert that ordinal number back into a character. We can do it in one line.
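The one-liner, roughly (note this is only correct for letters up to M, as the text observes next):

```python
class ROT13:
    def __init__(self, plaintext=None):
        self.plaintext = plaintext

    def __str__(self):
        # Shift every letter 13 places -- no wraparound yet!
        return ''.join([chr(ord(c) + 13) for c in self.plaintext])
```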

Running this will also cause our tests to pass. Great!

However, this will only work with letters up to M. What happens when we try something N and beyond?

Well, that’s to be expected. We’ll have to add some code to account for the wraparound after ‘Z’. Let’s introduce a helper method that translates one character at a time, and we’ll have our __str__() method still do a list comprehension, calling our helper method. The helper method will simply check whether each character falls past the M/N midpoint and needs to wrap back around.
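A sketch of that helper (my reconstruction; the helper name is illustrative):

```python
class ROT13:
    def __init__(self, plaintext=None):
        self.plaintext = plaintext

    def _rotate(self, c):
        # Letters past the M/N midpoint wrap back around toward 'A'
        if 'N' <= c <= 'Z':
            return chr(ord(c) - 13)
        return chr(ord(c) + 13)

    def __str__(self):
        return ''.join([self._rotate(c) for c in self.plaintext])
```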

Running our tests again, we’ll see that it passes.

Now, let’s try our hand at lower case letters. Our new test will have mixed case, with a good representation from both before and after the m-n midpoint:

Running this will give us interesting results:

It looks like we just need to add to our helper method to do the same thing with lowercase n through z characters.
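The extended helper:

```python
class ROT13:
    def __init__(self, plaintext=None):
        self.plaintext = plaintext

    def _rotate(self, c):
        # Wrap around for both uppercase N-Z and lowercase n-z
        if 'N' <= c <= 'Z' or 'n' <= c <= 'z':
            return chr(ord(c) - 13)
        return chr(ord(c) + 13)

    def __str__(self):
        return ''.join([self._rotate(c) for c in self.plaintext])
```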

This will now allow our test to pass.

What about non-alpha characters? We’ll stipulate that non-alpha characters should just pass through unaltered, so that a space in the plaintext is still a space in the ciphertext.

Before our change, we’ll see… not what we wanted.

We’ll put a guard around the block of code in our helper method to keep out all non-alpha characters from being manipulated.
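The final version, with the non-alpha guard:

```python
class ROT13:
    def __init__(self, plaintext=None):
        self.plaintext = plaintext

    def _rotate(self, c):
        # Non-alpha characters pass through unaltered
        if not c.isalpha():
            return c
        if 'N' <= c <= 'Z' or 'n' <= c <= 'z':
            return chr(ord(c) - 13)
        return chr(ord(c) + 13)

    def __str__(self):
        return ''.join([self._rotate(c) for c in self.plaintext])
```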

Now, our tests all pass. We have the behavior we want.

This tutorial will end here, but there are many other directions you can go. You can add other methods to continue testing. You can also decide to refactor the helper method to do dictionary lookups instead of integer calculation. Once you change up the helper method, you should be able to run the tests, without changing them, to confirm that you have the exact same behavior as you did before the refactor.

The unittest framework also provides other helpful assertions, like null condition checking and truth validation. It also provides helper methods for setting up and tearing down each test case or each class load.

Remember, the idea of this iterative approach is to first establish your core functionality and slowly add in edge cases as you go, so you’re not overwhelmed with testing every branch of your code all at once. There should be at least one test case per branch.

The complete code for this demo can be found on my GitHub account. Hope this helps you better understand how to do TDD!




Algorithms: Date Range Coalescing

I’ve done a lot of odd jobs in the software business over the years, including, but not limited to, automated UI testing, database management, package maintenance, network administration, system administration, SELinux/SEAndroid policy writing, and download script writing. My favorite thing to do, though, is algorithm writing. I don’t get to do much of it, but I have fond-yet-frustrating memories of sitting on the floor of my cubicle with a pad of paper, furiously drawing out ideas so I could get them out of my head and put them into code.

One of the early ones I did was a simple algorithm to coalesce a group of date ranges. This algorithm applies to any situation where you need to group blocks of time together, such as a calendar scheduler. For example, if you need to put a new appointment on the calendars of several people, you need to see all available spots, or simply find the first time slot at which all parties are available.
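The post doesn’t include the code, but the idea can be sketched as follows: sort the ranges by start, then merge each range into the previous one whenever they overlap or touch. This is my own minimal sketch, not the original implementation; the ranges are (start, end) pairs of anything comparable, such as datetimes.

```python
def coalesce(ranges):
    """Merge overlapping or adjacent (start, end) ranges into the
    smallest equivalent list of ranges."""
    merged = []
    for start, end in sorted(ranges):
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous range: extend it
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged
```

The free time slots for scheduling are then just the gaps between the coalesced ranges.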




Docker and the Local Development Environment

I remember when I started my first job, I had to go through the new-guy day-one setup fish bowl. You know, where the other developers say, “whoa-ho-ho, you’re going to have a fun first week,” and then proceed to take bets on how long it will take you to set up your own local test environment. First step: install your database system, create your database, and provision users. Then, it’s installing all the prerequisite libraries: Apache, PHP, Python, etc. Then, clone your repository and edit all your configuration files to point to your local copy of the code. Edit all your configuration files to use the right database, and pray that it works. Of course, it never works on your first try, and you have to ask the guy next to you to stop what he’s doing and come over to help you out. This could take anywhere from three days to two weeks.

Then, we had virtual machines. Someone builds a virtual image with all the libraries you need and the database pre-installed, and you can just get it up and running. It was nicer to get started with, but it ate up all your system resources. It’s also a huge thing to download on day one, so you usually get it via sneaker-net, which the IT department frowns upon. Also, syncing your code for testing quick fixes is non-trivial. You have to either copy it onto the system or work out some sort of hypervisor-specific shared mount.

A co-worker of mine suggested that we run the development environment in its own chroot jail. It allows you to run the application server in a production-like userland environment (as opposed to your native laptop environment) like you would on a virtual machine, but without the overhead of running a whole different operating system. It works well once you get it up and running, but getting it up and running is the difficult part. The base filesystem can be distributed as a tarball, but the bootstrap script needs to be tailored to your specific Linux distribution for bind mounts. Oh, and your host operating system MUST be Linux.

Enter Docker.

Docker containers operate much like a chroot jail, where your images are your userland filesystems, and you don’t need special scripts to bootstrap the container; it’s all built in to the engine. On top of that, it’s even paired up with its own virtual network stack, so it behaves like its own virtual machine. To test quick fixes, your code is mounted on a volume handled by the Docker engine. Also, to test multiple services interacting with each other, the containers can be put onto the same virtual network bridge.

The best part? You can then take these Docker containers and ship them off to your production deployment.

Now go download it and play around with it.




Explain This To Me As If I Were a Small Child – Python Interactive Interpreter

I used to work with a guy who would, when trying to understand the mess of a design I had just made, facetiously ask me to “explain it to me as if I were a small child.” I never liked teaching things that way because it always felt insulting to the target audience. However, I sometimes find myself in situations where people I’m teaching say the following:

Him: “Wait, I didn’t know you could do that!”

Me: “Oh, I just thought you knew.”

Now I’m not insulting; I’m just a terrible teacher.

So, let’s start from the beginning, as if you were a small child.

Python is an interpreted language. That means the script you write isn’t compiled directly into machine code; it gets sent to an interpreter that translates it into machine code on the fly. We’ll come back to this in a second.

What happens when you want to jump in and try a language? Usually, you start with your favorite editor (hopefully not Notepad), write your generic “Hello, World!” program, load that file into the interpreter, and marvel at what your fingertips hath wrought.

Let’s do one better. Let’s launch the Python interactive interpreter.

Woah, what’s this? It’s an interactive Python shell. You can try Python code out live. Go ahead. Try it.
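A sample session (version banner elided):

```
$ python3
Python 3.x.y (default, ...)
>>> print("Hello, world!")
Hello, world!
```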

There. Hello, world, without having to touch Notepad. (You should really consider a better text editor).

What’s the point of this? You can, of course, learn about the language and try out new reserved keywords. It’s also good for trying out new libraries and seeing the exact format of what a function returns.

When you’re working with a new library, it would be nice to know what classes are in a package, or what methods are in a class. Traditionally, you would call dir() to see what is available.
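For example, with the standard library’s json package (output trimmed):

```
>>> import json
>>> dir(json)
['JSONDecodeError', 'JSONDecoder', 'JSONEncoder', ...]
```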

It would be nice if we could get some tab completion in here. Enter ipython. You can get it via apt, yum, or pip.

Tab completion!

Now get out there and start exploring.




Fits and Starts

There used to be a few more words here, but I’m still getting used to the new digs. I managed to screw up the installation again, so, here we are again.

I used to have big plans, where I would write about things I’ve learned over the years, especially during the graduate school days: write-ups on things like linear-time sorting, graph coloring, and other things that I have since forgotten.

Although it may be too late to start those things, it’s not too late to just start. It’s a good time to begin with all the Git stuff I’m sure to forget soon. Perhaps a little Python as well.