Wednesday, July 31, 2024

Scaling Smarter: Understanding the Intricacies of HPA in Kubernetes

My two minutes on the Kubernetes Horizontal Pod Autoscaler (HPA) and why it can be tricky to get right.

Scaling Behavior

In my experience, scaling up generally works well because higher resource demands are quickly met. However, scaling down doesn't always occur as expected when resource usage decreases. This aspect often isn't included in performance testing criteria. Engineers and QA teams typically test for scale-up and performance requirements but may not thoroughly test elasticity.
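The autoscaling/v2 behavior field gives you direct control over this. Here's a minimal sketch of a conservative scale-down policy, assuming a Deployment named my-app (the name and numbers are illustrative):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-app-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80
      behavior:
        scaleDown:
          stabilizationWindowSeconds: 300   # require 5 minutes of sustained low usage before removing pods
          policies:
          - type: Percent
            value: 50
            periodSeconds: 60               # then remove at most half the replicas per minute

Watching kubectl get hpa -w while a load test ramps down is a quick way to confirm the replica count actually comes back down.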

Let's consider a simple Java application, though this applies to any app that has memory limits.
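For concreteness, the container spec for such an app might look like this (the image name and sizes are illustrative):

    spec:
      containers:
      - name: my-app
        image: my-registry/my-app:latest
        resources:
          requests:
            memory: 512Mi   # HPA utilization percentages are computed against requests
            cpu: 250m
          limits:
            memory: 1Gi     # exceeding this gets the container OOMKilled, not scaled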

Are you using a Service Mesh? (This is fun!)

When running services with sidecars, such as those injected by service meshes like Istio, there are several important considerations:

Resource calculation

HPA calculates resource usage across all containers in a pod, including sidecars. This means the total CPU and memory usage will include both your application container and the sidecar.
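A rough sketch of what that means in practice (names and numbers are illustrative): with a pod-wide Resource metric, utilization is the combined usage of both containers divided by their combined requests, 600m in this example.

    spec:
      containers:
      - name: my-app
        resources:
          requests:
            cpu: 500m       # your application
      - name: istio-proxy   # injected by the mesh
        resources:
          requests:
            cpu: 100m       # counts toward the same HPA calculation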

Adjust target utilization

Due to the additional resource consumption by sidecars, you may need to adjust your target CPU or memory utilization percentage. For example, if your original target was 80%, you might need to lower it to account for the sidecar's resource usage.
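In HPA terms, that is simply a lower averageUtilization on the pod-wide metric:

    metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # lowered from 80 to leave headroom for the sidecar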

Metrics 

If you want to scale based on metrics from a specific container (e.g., your application container) rather than the entire pod, you can use container resource metrics. This lets you ignore the sidecar's resource usage when making scaling decisions. So do you really need to lower the pod-wide target then?
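Kubernetes supports this with the ContainerResource metric type (on older clusters it sits behind the HPAContainerMetrics feature gate). A sketch, assuming the application container is named my-app:

    metrics:
    - type: ContainerResource
      containerResource:
        name: cpu
        container: my-app   # scale on the app container only; the sidecar is ignored
        target:
          type: Utilization
          averageUtilization: 80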

When you need more fine-grained control, you can use custom metrics to scale your pods. This is especially useful if you want to scale based on application-specific metrics rather than just CPU or memory usage.
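For example, scaling on a request rate exposed through something like the Prometheus Adapter (the metric name below is hypothetical):

    metrics:
    - type: Pods
      pods:
        metric:
          name: http_requests_per_second   # hypothetical custom metric
        target:
          type: AverageValue
          averageValue: "100"              # aim for ~100 requests/s per pod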

Handling High Memory Usage

Suppose you have configured the correct memory settings but then encounter high memory usage due to a specific rollout or an untested use case. The first response from an SRE is usually to increase the memory allocation, and if you have an Xmx value specified, it needs to be increased as well. I believe it's better not to set an Xmx value at all and let the JVM size its heap from the container limit; the same principle applies to the Xms value when scaling down.

Needless to say, avoid hardcoding values for Xmx and Xms. Instead, drive these values from a configuration file, such as values.yaml.
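A sketch of what that can look like with Helm (the keys are illustrative). Rather than a fixed Xmx, MaxRAMPercentage lets the JVM derive its heap from whatever memory limit the container is given:

    # values.yaml
    javaOpts: "-XX:MaxRAMPercentage=75.0"   # heap sized as a fraction of the container limit

    # deployment template snippet
    env:
    - name: JAVA_TOOL_OPTIONS               # picked up automatically by the JVM
      value: {{ .Values.javaOpts | quote }}

Now when an SRE bumps the memory limit in values.yaml, the heap follows automatically, and there is no second hardcoded number to forget.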

Testing it out

Just to test this out, I created an application that consumes memory whenever its API is called. A Minikube environment with networking is not the perfect setup, but you can get it working on most high-end laptops.
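An HPA for this kind of memory test could look roughly like this (names are illustrative); note that Minikube needs the metrics-server addon enabled for resource metrics to flow:

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: memory-eater-hpa
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: memory-eater    # the test app that allocates memory per API call
      minReplicas: 1
      maxReplicas: 5
      metrics:
      - type: Resource
        resource:
          name: memory
          target:
            type: Utilization
            averageUtilization: 70

Keep in mind that JVM memory rarely drops after a spike, which is exactly why the scale-down behavior discussed earlier matters so much for memory-based scaling.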

Github

Containerization Github List -> Github Containers

