When deploying apps to Kubernetes (K8s) one thing to get right in your deployments are the requests & limits for both CPU & Memory (RAM). Understanding the differences between requests & limits, the consequences of the values you set etc are all very important things for you to understand in the K8s world.
Throughout this blog post we will use the following example app as a reference point (this isn’t valid K8s yaml - just pseudo-code):
DemoApp1:
Requests:
CPU: 1
Memory: 512Mi
Limits:
CPU: 2
Memory: 1Gi
CPU resources are defined above using full cores as they have no suffix. Fractions can be achieved using Millicores (thousandths of a core). For example
200m
would be one fifth of a core. You can be granular down to a single millicore (1m
) if desired.
Memory resources as you can see are defined using “binary prefixes” instead of “decimal prefixes” (based on multiples of 1024 instead of 1000).
Let’s start with some simple questions.
Requests are the guaranteed amount of resources that K8s will ensure are available for your app. So when DemoApp1
requests 1 CPU & 512Mi, it will be scheduled by K8s to a node that has enough capacity available to meet these requests.
Limits describe the total amount of resources your app can burst up to. In our case DemoApp1
can burst up to twice it’s request values if needed. There is a very important thing to note here: K8s does NOT guarantee that the limit values will be available to an app on the node that it has been scheduled to.
No doubt all this raises several questions, so lets try to answer them.
Q. What happens if no node has enough resources to meet my app’s requests when I deploy it?
A. Assuming you are running the cluster-autoscaler (you are right?) it will spin up a new node (or multiple nodes if required) to meet the demand.
Q. What happens if no node has enough resources to meet my app’s limits when I deploy it?
A. Limits are not evaluated when K8s is looking to deploy your app to a node. K8s will only evaluate your request values.
Q. So what happens if my app needs to burst “into” it’s limit values?
A1. One of two things will happen. If there are enough resources available on the node that your app is deployed to then your app will be able to burst up to those limits as expected. It’s interesting to note here that these extra resources may be “taken” from other app’s request resources that they are not currently using.
A2. If there are no available resources then your app will be “cut off” at that point. If some but not all the limit resources available your app can make use of them of course.
Q. Does anything special happen if the limit resources are not available for my app to burst into?
A. For CPU: no. Your app is simply throttled at the request value or the amount it was able to burst up to. For memory however, K8s will terminate the pod your app is running in with an OOMKilled
message.
In the answers above we got a couple of interesting little tidbits about how limits work. Lets explore that because it’s useful to understand the mechanics at work here.
First of all as mentioned briefly above, when K8s schedules your app to a node, limits are ignored. It is scheduled based on it’s request values only. So it is perfectly possible to be deployed to a node that can meet the app’s request values but would never be able to meet it’s limit values.
This does raise the question though: if limit values are not evaluated at deploy time, then where does K8s get the resources to allow apps to burst at all? Again, this is alluded to above. When an app runs, K8s dynamically updates the CPU/Memory resources assigned to it in order to meet it’s current requirements. This means that even though DemoApp1
is requesting 1 CPU for example, if it is currently idling at 150m
then that is all K8s will assign to that app. This means that when your app isn’t using all of the resources that it requested, then another app deployed to the same node may use these currently unused resources to burst into, up to it’s own limit values.
What is more interesting here is that as we know K8s guarantees that it can meet your app’s request values. So when DemoApp1
eventually gets off it’s backside and does some work and starts using those additional CPU resources, what happens (assuming the node is not under subscribed of course)? Well K8s dynamically updates the resources assigned to DemoApp1
to meet it’s current needs. It does this by taking back the resources that the other app is currently using to burst. Obviously if the other app still needs those burst resources it may then encounter an error as previously discussed.
Some may assume that hitting a limit would trigger the cluster-autoscaler
but it does NOT. The cluster-autoscaler
only evaluates required resources at deployment time. This is a key reason why getting your requests & limits correct is so crucial.
This image from AWS (see original blog post containing this image here) really succinctly demonstrates the requests/limits relationship in a very simple way:
Not the popular chat platform. Slack is an important concept in K8s, many blog posts refer to it, and it is important to understand for some of the scenarios we’ll take a look at next.
Slack space (or sometimes cost) is the amount of “wasted” resources assigned to your apps. Slack is basically the difference between the resources assigned to your app and what it is actually using. There are many tools (assuming you don’t have your own custom graphs in something like Grafana) that shows this slack space such as kube-resource-report.
Many blog posts that talk about reducing your K8s costs will talk about slack space. They will also often tell you to get rid of all this slack space without any thought as to the potential consequences of that. Yes that’s right, the situation is more nuanced than many people admit to and blindly “getting rid of” all your slack space may actually be a very bad thing. We’ll see how sometimes slack space is a good thing in the scenarios we look at.
Now we will take this knowledge and look at some scenarios we might find ourselves in and compare the benefits & risks of each.
To get the “full picture” make sure you take a look at our values and the key facts for each scenario as these will differ in subtle ways at times.
DemoApp1:
Requests:
CPU: 1
Memory: 512Mi
Limits:
CPU: 2
Memory: 1Gi
The key facts for this scenario:
Benefits:
Risks:
DemoApp1:
Requests:
CPU: 1
Memory: 512Mi
Limits:
CPU: 1.2
Memory: 768Mi
The key facts for this scenario:
Benefits:
Risks:
DemoApp1:
Requests:
CPU: 1
Memory: 512Mi
Limits:
CPU: 1
Memory: 512Mi
The key facts for this scenario:
Benefits:
Risks:
DemoApp1:
Requests:
CPU: 1
Memory: 512Mi
Limits:
CPU: 2
Memory: 1Gi
The key facts for this scenario:
Benefits:
Risks:
DemoApp1:
Requests:
CPU: 1
Memory: 512Mi
Limits:
CPU: 1
Memory: 512Mi
The key facts for this scenario:
Benefits:
Risks:
As we can see from the scenarios, getting the requests & limits right for your app is as crucial as it is subtle and dare I say, just a little complicated. We saw that slack space, so looked down upon by many can have the benefit of helping prevent situations where your app is throttled or killed at the cost of consuming extra resources (and upping your AWS bill!).
Undoubtedly getting the right values will take a little trial and error. The TL;DR of this for me is:
Of course this doesn’t cover other situations that will influence how your app responds to traffic. Your app’s deployment is likely to be autoscaled via the Horizontal Pod Autoscaler for example. This increases the number of replicas of your app across the cluster based on certain pre-defined resource thresholds. Obviously this impacts how your app behaves in terms of resource usage and the consequences of bursting into your limit values etc. This is why testing these values is key.
Hopefully though this has been useful food for thought and will enable you to make more informed decisions about your resource settings from now on.