The topics of authentication and authorization usually appear simple but turn out to hide significant complexity. That's because, at its core, auth is all about answering two questions:
However, the devil is in the details. Seasoned IT professionals, software developers, and even typical end users are fairly accustomed at this point to many of the most common requirements and pain points around auth.
Cloud authentication and authorization is not drastically different from non-cloud systems, at least in principle. However, there are a few things about the cloud and its common use cases that introduce some curve balls:
This blog post series is going to focus on the full picture of authentication and authorization, focusing on a cloud mindset. There is significant overlap with non-cloud systems in this, but we'll be covering those details as well to give a complete picture. Once we have those concepts and terms in place, we'll be ready to tackle the quirks of individual cloud providers and commonly used tooling.
We're going to define authentication as proving your identity to a service provider. A service provider can be anything from a cloud provider offering virtual machines, to your webmail system, to a bouncer at a bartender who has your name on a list. The identity is an equally flexible concept, and could be "my email address" or "my user ID in a database" or "my full name."
To help motivate the concepts we'll be introducing, let's understand what goals we're trying to achieve with typical authentication systems.
Once we know the identity of something or someone, the next question is: what are they allowed to do? That's where authorization comes into play. A good authorization provides these kinds of features:
In simple systems, the two concepts of authentication and authorization is straightforward. For example, on a single-user computer system, my username would be my identity, I would authenticate using my password, and as that user I would be authorized to do anything on the computer system.
However, most modern systems end up with many additional layers of complexity. Let's step through what some of these concepts are.
A basic concept of authentication would be a user. This typically would refer to a real human being accessing some service. Depending on the system, they may use identifiers like usernames or email addresses. User accounts are often times given to non-users, like automated processes or Continuous Integration (CI) jobs. However, most modern systems would recommend using a service account (discussed below) or similar instead.
Sometimes, the user is the end of the story. When I log into my personal Gmail account, I'm allowed to read and write emails in that account. However, when dealing with multiuser shared systems, some form of permissions management comes along as well. Most cloud providers have a robust and sophisticated set of policies, where you can specify fine-grained individual permissions within a policy.
As an example, with AWS, the S3 file storage service provides an array of individual actions from the obvious (read, write, and delete an object) to the more obscure (like setting retention policies on an object). You can also specify which files can be affected by these permissions, allowing a user to, for example, have read and write access in one directory, but read-only access in another.
Managing all of these individual permissions each time for each user is tedious and error prone. It makes it difficult to understand what a user can actually do. Common practice is to create a few policies across your organization, and assign them appropriately to each user, trying to minimize the amount of permissions granted out.
Within the world of authorization, groups are a natural extensions of users and policies. Odds are you'll have multiple users and multiple policies. And odds are that you're likely to have groups of users who need to have similar sets of policy documents. You could create a large master policy that encompasses the smaller policies, but that could be difficult to maintain. You could also apply each individual policy document to each user, but that's difficult to keep track of.
Instead, with groups, you can assign multiple policies to a group, and multiple groups to a user. If you have a billing team that needs access to the billing dashboard, plus the list of all users in the system, you may have a
BillingDashboard policy as well as a
ListUsers policy, and assign both policies to a
BillingTeam group. You may then also assign the
ListUsers policy to the
There's a downside with this policies and groups setup described above. Even if I'm a superadmin on my cloud account, I may not want to have the responsibility of all those powers at all times. It's far too easy to accidentally destroy vital resources like a database server. Often, we would like to artificially limit our permissions while operating with a service.
Roles allow us to do this. With roles, we create a named role for some set of operations, assign a set of policies to it, and provide some way for users to assume that role. When you assume that role, you can perform actions using that set of permissions, but audit trails will still be able to trace back to the original user who performed the actions.
Arguably a cloud best practice is to grant users only enough permissions to assume various roles, and otherwise unable to perform any meaningful actions. This forces a higher level of stated intent when interacting with cloud APIs.
Some cloud providers and tools support the concept of a service account. While users can be used for both real human beings and services, there is often a mismatch. For example, we typically want to enable multi-factor authentication on real user accounts, but alternative authentication schemes on services.
One approach to this is service accounts. Service accounts vary among different providers, but typically allow defining some kind of service, receiving some secure token or password, and assigning either roles or policies to that service account.
In some cases, such as Amazon's EC2, you can assign roles directly to cloud machines, allowing programs running on those machines to easily and securely assume those roles, without needing to store any kinds of tokens or secrets. This concept nicely ties in with roles for users, making role-based management of both users and services and emerging best practice in industry.
The system described above is known as Role Based Access Control, or RBAC. Many people are likely familiar with the related concept known as Access Control Lists, or ACL. With ACLs, administrators typically have more work to do, specifically managing large numbers of resources and assigning users to each of those per-resource lists. Using groups or roles significantly simplifies the job of the operator, and reduces the likelihood of misapplied permissions.
Most modern DevOps platforms have multiple systems, each requiring separate authentication. For example, in a modern Kubernetes-based deployment, you're likely to have:
That's in addition to maintaining a company's standard directory, such as Active Directory or G Suite. Maintaining this level of duplication among user accounts is time consuming, costly, and dangerous. Furthermore, while it's reasonable to securely lock down a single account via MFA and other mechanisms, expecting users to maintain such information for all of these systems securely is unreasonable. And some of these systems don't even provide such security mechanisms.
Instead, single sign-on provides a standards-based, secure, and simple method for authenticating to these various systems. In some cases, user accounts still need to be created in each individual system. In those cases, automated user provisioning is ideal. We'll talk about some of that in later posts. In other cases, like AWS's identity provider mechanism, it's possible for temporary identifiers to be generated on-the-fly for each SSO-based login, with roles assigned.
Deeper questions arise about where permissions management is handled. Should the central directory, like Active Directory, maintain permissions information for all systems? Should a single role in the directory represent permissions information in all of the associated systems? Should a separate set of role mappings be maintained for each service?
Typically, organizations end up including some of each, depending on the functionality available in the underlying tooling, and organizational discretion on how much information to include in a directory.
What we've covered here sets the stage for understanding many cloud-specific authentication and authorization schemes. Going forward, we're going to cover a look into common auth protocols, followed by a review of specific cloud providers and tools, specifically AWS, Azure, and Kubernetes.
Do you like this blog post and need help with DevOps, Rust or functional programming? Contact us.