Containers 101

Containers are a hot topic with plenty of resources out there. This is an attempt to gather the salient points.

The basics

Think of a container as a lightweight form of virtual machine.

You can run multiple containers on a single VM, or indeed on a physical machine.

Containers are self-contained: a container's contents are a filesystem, which could be anything from a single statically-linked binary to a full Linux desktop environment.

You can run multiple processes within a container. They cannot see processes outside of the container.
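
For example (assuming Docker is installed), listing the processes visible from inside a minimal container demonstrates the isolation; the output will look roughly like this:

  # only the container's own processes are visible
  $ docker run --rm alpine ps
  PID   USER     TIME  COMMAND
      1 root      0:00 ps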

Configuration

When you run a container you specify:

  • the container image
  • what binary in the container you want to run (if not the container's specified default)
  • any environment variables to pass in to that binary
  • any secrets to pass in
  • networking & filesystem setup (see below)

Networking

Containers are by default isolated, running on a virtual network that is not reachable from the internet (it's within RFC 1918 private address space). They are effectively behind a NAT firewall provided by the host machine.

You get to specify:

  • the container's name (usually a service name like 'db' or 'www')
  • what (if any) TCP ports it should open (everything else will be firewalled)
  • whether these TCP ports should be accessible by the outside world, or only within that machine's container virtual network (default: container network only)
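
With Docker, for instance, that might look like the following sketch (the network and container names are illustrative):

  # a private virtual network for this application
  $ docker network create appnet

  # 'www' publishes port 80 to the outside world
  # (-p 127.0.0.1:80:80 would instead limit it to the host)
  $ docker run -d --name www --network appnet -p 80:80 nginx

  # 'db' publishes nothing, so it is reachable only from other
  # containers on appnet, under the hostname 'db'
  $ docker run -d --name db --network appnet -e POSTGRES_PASSWORD=changeme postgres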

Host filesystem access

You also get to mount directories from the host machine into the container. This can include binding in Unix sockets. (This is how you might set up a container that can start and stop other containers, by talking to the control socket.)

You get to choose whether the host machine directories are read-only or read-write.
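
For example, with Docker (the paths and the image name 'myapp' are hypothetical):

  # read-only config, read-write data
  $ docker run -d \
      -v /srv/app/config:/etc/app:ro \
      -v /srv/app/data:/var/lib/app:rw \
      myapp

  # bind in the Docker control socket so this container can
  # start and stop other containers
  $ docker run -d -v /var/run/docker.sock:/var/run/docker.sock myapp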

This mechanism can be a source of security holes; if your container is compromised over the network, the attacker may try to use these mounts to escape from the container onto the host.

Why would I want to use containers?

Your application will likely have multiple moving parts, each of which would be a separate binary on a traditional server. (e.g.: web server, web application, database.)

The current trend is strongly towards microservices: larger applications are being split up into fleets of smaller services, the idea being that each microservice has one job and does it well. Containers suit this approach for several reasons:

  1. Security. By the principle of least privilege, keep the various moving parts isolated from each other. That way if one is compromised, it's harder for an attacker to get very far.
  2. Flexibility. Application components are updated regularly in the face of security advisories. With a well-oiled stack of containers, when a point release for one part comes out, all you have to do is plug in an updated image for that single container, and it's easy to roll back in case of trouble (see the sketch after this list).
  3. Scalability.
    1. You can quickly and easily set up a clone of your environment someplace else for testing.
    2. If or when your needs grow, moving one of your services out onto another host is pretty straightforward.
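
For instance, with Docker and pinned image tags (the version numbers are purely illustrative), an upgrade or a rollback is just a matter of swapping the container:

  # upgrade: pull the point release and replace the running container
  $ docker pull nginx:1.27.1
  $ docker stop www && docker rm www
  $ docker run -d --name www -p 80:80 nginx:1.27.1

  # rollback: the same dance with the previous tag
  $ docker stop www && docker rm www
  $ docker run -d --name www -p 80:80 nginx:1.27.0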

There are downsides. Moving to containers forces you to think about all of this up front: the interfaces between the various moving parts have to be well defined.

Also, Samba cannot work in a container, because it fights with Samba instances in other containers. There is a workaround (see https://serverfault.com/questions/810544/samba-daemon-does-not-work-as-systemd-service-but-works-in-foreground/862514#862514), but in my experience Windows cannot open Samba shares served from containers. YMMV.

Important points

Containers should be ephemeral. That is to say, your application should be able to survive the container being destroyed and a fresh container started from the image in its place, without data loss.

Therefore:

You need to think carefully about where your application's data and config live. They do not belong in the container image. They should either be stored in a directory mounted in from the host filesystem, or in a containerised storage service with a similar guarantee of persistence (e.g. a local database container, or a cloud storage service). Of course, if your app's main config is in a storage service, you need to configure the application to know where to find that storage service and give it suitable credentials.

Never store any secrets in your container image. They should either live in persistent storage or be injected via the secrets mechanism. Using the secrets mechanism lets you give different credentials to your test and live service instances.
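
A sketch of both rules with Docker (the volume, path and image names are assumptions): data goes in a named volume that outlives any individual container, and credentials are injected at run time rather than baked into the image:

  # the named volume 'appdata' survives container replacement
  $ docker volume create appdata

  # data on the volume; secrets injected from a root-only file on the
  # host (e.g. a line like DB_PASSWORD=... in /srv/secrets/app.env)
  $ docker run -d --name app \
      -v appdata:/var/lib/app \
      --env-file /srv/secrets/app.env \
      myapp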

A simple example

You might run nginx in a container to serve up some static web pages.

This container might listen to the outside world on ports 80 and 443.

  • The container's webroot is a directory mounted in from the host filesystem.
  • The nginx configuration (/etc/nginx/) is in a different directory on the host filesystem.
  • The container needs a TLS private key. This is a secret, injected via the secrets mechanism. (In a single-container system this might as well be a directory mounted from the host filesystem; in multi-machine clouds life gets more complex.)
  • If you're running Let's Encrypt you may need to add some writable persistent storage so the ACME challenge mechanism works.
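
Putting those pieces together as a single Docker invocation (the host paths are illustrative, and the TLS key is simply bind-mounted, per the single-machine shortcut above):

  $ docker run -d --name www \
      -p 80:80 -p 443:443 \
      -v /srv/www/html:/usr/share/nginx/html:ro \
      -v /srv/www/nginx:/etc/nginx:ro \
      -v /srv/www/certs:/etc/ssl/private:ro \
      -v /srv/www/acme:/var/www/acme:rw \
      nginx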

Additional definitions

Docker is a container platform. That is to say, it is cross-platform software you can use to build and run containers. There is a large repository of Docker images available on Docker Hub. As with any open ecosystem, you should be wary of images that are not official, curated ones; they are more likely to have security issues.

Alpine is a minimal Linux distribution that is very commonplace in the Docker world. You are likely to encounter it if you get into building your own containers.

Orchestration is the service that runs one (or more) containers and sets up the virtual networking for you. docker-compose is a straightforward orchestration system.
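
As a sketch (the service names, images and paths are examples), a compose file declares the containers, their ports and mounts, and a single command brings the stack up on a shared private network:

  # docker-compose.yml
  version: "3"
  services:
    www:
      image: nginx
      ports:
        - "80:80"
      volumes:
        - ./html:/usr/share/nginx/html:ro
    db:
      image: postgres
      environment:
        POSTGRES_PASSWORD: changeme   # placeholder, not a real secret

  # bring the whole stack up (and 'docker-compose down' to stop it)
  $ docker-compose up -d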

LXC (Linux Containers) is a container runtime for Linux. It leans more towards full system containers (in effect a VM sharing the host's kernel), whereas Docker leans more towards single-application images.

Kubernetes (k8s for short) is a container orchestration system surrounded by a broad family of related services. Typically k8s is used to set up a fleet of containerised apps spanning multiple machines, which may be a mixture of on-premises and in-cloud, with load balancing, high availability and elastic scaling to meet user demand. There are a lot of pieces to this puzzle; k8s has a reputation for being fiendishly complex. Because of this, there are many related product offerings out there. Some of them are reportedly pretty good.
