Controlling access to Nomad clusters

Posted by Yghor Kerscher - 17 May, 2018

There are many benefits to running your infrastructure in the cloud, as well as to isolating your components from one another with containers. However, once you reach a certain scale of operations, it becomes necessary to use tools that ensure you make the best use of your compute resources. These are commonly called application schedulers, and they allocate your compute units — be they containers, virtual machines or isolated forked processes — across multiple servers.

Nomad is one such tool, and in this blog post we will learn how to control access to it by deploying a test cluster and configuring access control lists (ACLs). There are other schedulers available, such as Kubernetes, Mesos or Docker Swarm, but each has different mechanisms for securing access. By following this post, you will understand the main components of securing a Nomad cluster, and the overall ideas carry over to the other schedulers as well.

One of Nomad’s selling points, and why you might consider it over tools like Kubernetes, is that you can schedule not only containers, but also QEMU images, LXC, isolated fork/exec processes, and even Java applications in a chroot(!). All you need is a driver implemented for Nomad. On the other hand, its community is smaller than Kubernetes’s, so the trade-offs have to be weighed on a project-by-project basis.

Prerequisites

  • Workstation able to run virtual machines
  • POSIX shell, such as GNU Bash
  • Vagrant > 2.0.1
  • Nomad demo Vagrantfile

We will run everything from within a virtual machine that comes with almost everything we need. In the examples that follow, $ represents your command shell prompt; lines beginning with it are commands you’re supposed to run, omitting the $. Once you have installed Vagrant, execute the following commands in your shell:

$ dir=$(mktemp --directory) && cd "${dir}"
$ curl -LO https://raw.githubusercontent.com/hashicorp/nomad/master/demo/vagrant/Vagrantfile
$ vagrant up
    # lines and lines of Vagrant output
    # this might take a while

$ vagrant ssh
    # Message of the day greeting from VM
    # Anything after this point is being executed inside the virtual machine

vagrant@nomad:~$ nomad version
Nomad vX.X.X

Notice how, once you vagrant ssh, the prompt changes to vagrant@nomad:~$. On your system, and depending on the version of the Vagrantfile used, it may look slightly different, but it will almost assuredly be different from your usual command prompt. If in doubt, check uname -n; it should output nomad. If you have preferences for terminal multiplexers, editors or other additional tooling, this is the time to install them in the virtual machine. We’ll assume you’re running with the defaults below.
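
For example, inside the VM:

vagrant@nomad:~$ uname -n
nomad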

Minimal Nomad setup

Setting up ACLs requires a bit more than the -dev mode you may have used to try out Nomad: we will have to write our own agent configuration, but it should be very straightforward. Assume that all commands from this point on are run only after you have vagrant ssh’d into your virtual machine as per the instructions above.

vagrant@nomad:~$ export EDITOR=pico # or your editor of choice, but let’s keep this simple
vagrant@nomad:~$ alias e="${EDITOR}"
vagrant@nomad:~$ e nomad-agent.conf

This sets up e as a shorthand for editing files and starts editing ~/nomad-agent.conf. This is where we’ll configure the agent, which in this demo will act as both server and client. In a production environment you would not want to do this, but in our case it’ll be fine. We’ll keep things as simple as possible. Copy the following contents into the buffer where e nomad-agent.conf was executed:

bind_addr = "0.0.0.0"

data_dir = "/var/lib/nomad"

region = "global"

acl {
  enabled = true
}

server {
  enabled              = true
  bootstrap_expect     = 1
  authoritative_region = "global"
}

client {
  enabled = true
}

Save and exit the text editor. Leave this terminal window be for a second, but don’t close it. Now, open another terminal window on your host computer, and vagrant ssh from the same folder. Within it, execute:

vagrant@nomad:~$ sudo nomad agent -config=nomad-agent.conf

You should see output immediately: nomad is running! Because our agent is both a client and a server, we have to run it as root. In a normal server-only deployment we would run it rootless; only clients need root access, mainly to be able to execute tasks. This sounds backwards, right? Usually servers need root and clients run under another user. The reason it is different for Nomad is that servers only exchange the state of the cluster with each other, whereas clients need to execute compute jobs on their nodes, and this requires root access for most drivers — e.g. launching a QEMU image or a rkt container. Once your window has output coming from your demo instance, keep it open and ignore it for the moment.
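
To make that concrete, here is a rough sketch, not part of the demo, of what a rootless, server-only launch could look like on a production machine. The user name, data directory and server-only.conf file (a configuration like ours but without the client block) are all illustrative assumptions:

$ sudo useradd --system --shell /usr/sbin/nologin nomad    # dedicated unprivileged user (illustrative)
$ sudo mkdir -p /var/lib/nomad && sudo chown nomad /var/lib/nomad
$ sudo -u nomad nomad agent -config=server-only.conf       # no root needed when only the server is enabled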

ACL Bootstrap

We’re ready to bootstrap ACL now. Go back to the window where you edited the agent configuration and run the following command:

vagrant@nomad:~$ nomad acl bootstrap

Accessor ID  = 2f34299b-0403-074d-83e2-60511341a54c
Secret ID    = 9fff6a06-b991-22db-7fed-55f17918e846
Name         = Bootstrap Token
Type         = management
Global       = true
Policies     = n/a
Create Time  = 2018-02-14 19:09:23.424119008 +0000 UTC
Create Index = 13
Modify Index = 13

We have our first token, which as can be seen is valid globally and allows management access — that is, anything is permitted. There are still no policies, hence the N/A. Copy the Accessor ID and Secret ID somewhere you won’t lose, at least not until the end of the demo. For a production environment, you should store this safely and treat it with utmost care.
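
For the purposes of this demo, one simple option (and only a suggestion) is to stash the Secret ID in a file that only your user can read:

vagrant@nomad:~$ install -m 600 /dev/null ~/bootstrap.token    # empty file readable only by us
vagrant@nomad:~$ echo '9fff6a06-b991-22db-7fed-55f17918e846' > ~/bootstrap.token
vagrant@nomad:~$ cat ~/bootstrap.token                         # confirm the Secret ID is there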

If you try running any nomad command against our “cluster” now, you will notice that you’re denied. This is by design! Once ACLs are on, everything is denied unless you have a valid token and the operation you want is allowed by a policy, or you have a management token. Try it yourself:

vagrant@nomad:~$ nomad node-status
Error querying node status: Unexpected response code: 403 (Permission denied)

vagrant@nomad:~$ export NOMAD_TOKEN='9fff6a06-b991-22db-7fed-55f17918e846' # Secret ID, above
vagrant@nomad:~$ nomad node-status

ID        DC   Name   Class   Drain  Status
1f638a17  dc1  nomad  <none>  false  ready

Designing policies

Now that we have a running agent with ACL enabled, we should think about what kind of policies we want to design. Ideally, we would like a collection of roles, more or less non-overlapping, that provide access to the different operations involved in managing our cluster. Each of these roles is then a policy, and users are issued tokens attached to the policies that match what they need to do.

Policies

Role            Namespace             Agent  Node
Anonymous       deny                  deny   deny
Developer       write                 deny   read
Logger          list-jobs, read-logs  deny   read
Job requester   submit-job            deny   deny
Infrastructure  read                  write  write

Note: For namespace access, read is equivalent to [read-job, list-jobs]. write is equivalent to [list-jobs, read-job, submit-job, read-logs, read-fs, dispatch-job].

The table above shows the typical users of our Nomad cluster. Users without a valid token are denied all operations. Since by default Nomad denies anything without a policy, this is implicit and need not be added to our ACLs. Next we have development needs. Developers can submit their jobs, list jobs and so on, but they cannot mutate anything on running agents and nodes. This matches our intuition that development teams should be able to debug their applications and send applications to run, but they shouldn’t have access to cluster management itself.

The next two policies, “logger” and “job requester”, are meant for restricted development usage. Logger can be given to a system that captures and shows job logs in an automated fashion. Job requester can be given to a CI system so that it can request the creation of new jobs while being forbidden from interacting with jobs already running elsewhere.

Lastly, infrastructure is meant for operations, where interacting directly with jobs is not necessary, only setting parameters on running agents and nodes. In the event that operations does need access to namespaces as well, one can always create a token that has both the Developer and Infrastructure policies attached. Note, however, that this is for all practical purposes equivalent to having a management token.

We have left out multi-region and multi-namespace setups here and will assume everything runs under the default namespace. For this restricted scenario that is fine. Note, though, that in production deployments with much larger needs, these policies should be defined per namespace and tracked carefully across regions.

Policy rules

As we have seen, policies are a collection of rules, and each of the policies above is expressed by a combination of rules. Nomad expects a JSON payload with the name and description of the policy, plus a quoted JSON or HCL document with the rules. To simplify things, we will write the policies in YAML and then convert them to the format that Nomad understands. That avoids error-prone quoting and makes it much faster to go from writing a policy to sending it to an agent. Create a file infrastructure.yaml and edit the contents to match the ones below:

---
Name: infrastructure
Description: Agent and node management
Rules:
  agent:
    policy: write
  node:
    policy: write

This policy matches exactly what we have in the table above. Notice how we do not explicitly deny anything. By default, Nomad rejects anything not matching a policy, but the real reason we avoid explicit denies is that if we give a token two policies, deny always overrides any capability. That means that if we added a deny here for Infrastructure, and later decided to also give Developer to a token, anything denied in either policy would affect the token as a whole, even if the other document grants a capability on that scope. Let’s convert our YAML policy into something Nomad understands:

vagrant@nomad:~$ sudo apt install jq golang --yes --quiet
vagrant@nomad:~$ mkdir go && export GOPATH="${HOME}/go" && export PATH="${GOPATH}/bin:${PATH}"
vagrant@nomad:~$ go get -v github.com/bronze1man/yaml2json
vagrant@nomad:~$ yaml2json < infrastructure.yaml | jq '.Rules = (.Rules | @text)' > infrastructure.json
vagrant@nomad:~$ cat infrastructure.json
{
  "Description": "Agent and node management",
  "Name": "infrastructure",
  "Rules": "{\"agent\":{\"policy\":\"write\"},\"node\":{\"policy\":\"write\"}}"
}

Notice how we had to quote Rules so that Nomad understands it. You might have a preferred way of converting from YAML to JSON. The example above is just one practical way among many.
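
If you prefer Python, the same transformation can be sketched as a one-liner. This assumes the python3-yaml package, which is not part of the demo VM by default, so treat it as an illustrative alternative rather than part of the walkthrough:

vagrant@nomad:~$ sudo apt install python3-yaml --yes --quiet
vagrant@nomad:~$ python3 -c 'import json, sys, yaml; d = yaml.safe_load(sys.stdin); d["Rules"] = json.dumps(d["Rules"]); print(json.dumps(d))' < infrastructure.yaml > infrastructure.json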

Testing

Adding the policy

Now we need to add this policy to Nomad and check that it works. We’ll do that by creating the policy, then a token that has that policy, and finally attempting operations that show our permissions are correctly set.

vagrant@nomad:~$ curl --request POST \
--data @infrastructure.json \
--header "X-Nomad-Token: ${NOMAD_TOKEN}" \
http://127.0.0.1:4646/v1/acl/policy/infrastructure

vagrant@nomad:~$ nomad acl policy list
Name            Description
infrastructure  Agent and node management

vagrant@nomad:~$ nomad acl policy info infrastructure
Name        = infrastructure
Description = Agent and node management
Rules       = {"agent":{"policy":"write"},"node":{"policy":"write"}}
CreateIndex = 425
ModifyIndex = 425

Creating a token

Great, the policy is there and correct. Now we will create a token that has the infrastructure policy. We will then attempt a few operations with it.

vagrant@nomad:~$ nomad acl token create \
-name='devops-team' \
-type='client' \
-global='true' \
-policy='infrastructure'

Accessor ID  = 927ea7a4-e689-037f-be89-54a2cdbd338c
Secret ID    = 26832c8d-9315-c1ef-aabf-2058c8632da8
Name         = devops-team
Type         = client
Global       = true
Policies     = [infrastructure]
Create Time  = 2018-02-15 19:53:59.97900843 +0000 UTC
Create Index = 432
Modify Index = 432

vagrant@nomad:~$ export NOMAD_TOKEN='26832c8d-9315-c1ef-aabf-2058c8632da8'
vagrant@nomad:~$ nomad status
Error querying jobs: Unexpected response code: 403 (Permission denied)

vagrant@nomad:~$ nomad node-status
ID        DC   Name   Class   Drain  Status
1f638a17  dc1  nomad  <none>  false  ready

As you can see, anyone with the devops-team token is allowed to perform operations on nodes, but not on jobs — i.e. not on namespace resources. We have successfully achieved our intent and can start creating the rest of the policies.
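
As a sketch of what one of the remaining policies could look like, here is the Developer role from the table, written and registered in the same way as before. The rule layout (a namespace rule keyed by the default namespace, plus read access to nodes) is our reading of the table, so double-check it against the Nomad ACL documentation before relying on it. Remember to switch NOMAD_TOKEN back to the bootstrap Secret ID first, since the devops-team token cannot create policies:

vagrant@nomad:~$ export NOMAD_TOKEN='9fff6a06-b991-22db-7fed-55f17918e846'   # bootstrap Secret ID from earlier
vagrant@nomad:~$ cat > developer.yaml <<'EOF'
---
Name: developer
Description: Submit and debug jobs, read node status
Rules:
  namespace:
    default:
      policy: write
  node:
    policy: read
EOF
vagrant@nomad:~$ yaml2json < developer.yaml | jq '.Rules = (.Rules | @text)' > developer.json
vagrant@nomad:~$ curl --request POST \
--data @developer.json \
--header "X-Nomad-Token: ${NOMAD_TOKEN}" \
http://127.0.0.1:4646/v1/acl/policy/developer

vagrant@nomad:~$ nomad acl token create \
-name='dev-team' \
-type='client' \
-global='true' \
-policy='developer'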

Where to go next

The example policy above is just one of those we listed in the policies section. Creating the others is a straightforward exercise; nevertheless, there is a repository with the policies ready for use, which you can clone and adapt.
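
If you do clone such a repository, a hedged sketch of registering every policy payload in it, assuming one <name>.json file per policy in the same format we built above, could look like this (run from inside the cloned repository, with NOMAD_TOKEN set to a management token):

vagrant@nomad:~$ for f in *.json; do
curl --request POST \
--data @"${f}" \
--header "X-Nomad-Token: ${NOMAD_TOKEN}" \
"http://127.0.0.1:4646/v1/acl/policy/$(basename "${f}" .json)"
done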
