How to increase the pod limit per worker node in Kubernetes
Kubelet is close to pod limit
Issue:
Alertmanager reported that the kubelets on our production nodes were running too many pods, close to the default limit of 110.
Solution:
First, check the Grafana dashboards to see whether the nodes have enough spare infrastructure resources (CPU, RAM, etc.) to receive more pods. Then calculate the average utilization of these resources to determine the new pod limit for each node.
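The post does not give an exact formula for this calculation. As a hypothetical sketch, assuming you read the average CPU and memory used per pod from Grafana, one simple approach is to divide each allocatable resource (minus a safety margin) by the average per-pod usage and keep the smaller result. The function name, its arguments and the 20% default margin are illustrative choices, not part of the original procedure. The sketch also ignores other ceilings, such as the size of the node's pod CIDR, from which kubenet assigns pod IPs.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the original procedure): estimate a pod
# limit from allocatable resources and AVERAGE per-pod usage.
#   $1 allocatable CPU in millicores (e.g. 32000 for 32 cores)
#   $2 allocatable memory in MiB
#   $3 average CPU per pod in millicores
#   $4 average memory per pod in MiB
#   $5 safety margin in percent to keep free (assumed default: 20)
suggest_max_pods() {
  local cpu_m=$1 mem_mi=$2 avg_cpu_m=$3 avg_mem_mi=$4 margin=${5:-20}
  if [ "$avg_cpu_m" -le 0 ] || [ "$avg_mem_mi" -le 0 ]; then
    echo "average per-pod usage must be > 0" >&2
    return 1
  fi
  # Only (100 - margin)% of each resource is considered usable by pods.
  local usable_cpu=$(( cpu_m * (100 - margin) / 100 ))
  local usable_mem=$(( mem_mi * (100 - margin) / 100 ))
  # How many "average" pods fit for each resource (integer division).
  local by_cpu=$(( usable_cpu / avg_cpu_m ))
  local by_mem=$(( usable_mem / avg_mem_mi ))
  # The scarcer resource decides the limit.
  echo $(( by_cpu < by_mem ? by_cpu : by_mem ))
}
```

For example, with the 32 CPUs and roughly 128823 MiB of allocatable memory reported for the node at the end of this post, and an assumed average of 100m CPU and 500 MiB per pod, `suggest_max_pods 32000 128823 100 500 20` prints 206: memory is the limiting resource, and a limit of 200 stays below that estimate.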
Assuming the chosen pod limit is 200, you can apply it with the following steps:
1. Connect:
Connect to the worker node that raised the alert and switch to root:
$ ssh -i /path/to/your/ssh-key/id_rsa node-user@node-hostname
node-user@node-hostname:~$ sudo -i
root@node-hostname:~#
2. Edit:
Change the value of KUBELET_MAX_PODS in /etc/default/kubelet from 110 to 200 (use your preferred text editor). After the change, the file looks like this:
root@node-hostname:~# cat /etc/default/kubelet
KUBELET_CLUSTER_DNS=10.0.0.10
KUBELET_API_SERVERS=https://10.240.255.15:443
KUBELET_IMAGE=yassinemaachi/hyperkube-amd64:v1.7.7
KUBELET_NETWORK_PLUGIN=kubenet
KUBELET_MAX_PODS=200
DOCKER_OPTS=
CUSTOM_CMD=/bin/true
KUBELET_REGISTER_SCHEDULABLE=true
KUBELET_NODE_LABELS=kubernetes.io/role=agent,agentpool=kubepool1,node=kube
KUBELET_POD_INFRA_CONTAINER_IMAGE=gcrio.azureedge.net/google_containers/pause-amd64:3.0
KUBELET_NODE_STATUS_UPDATE_FREQUENCY=10s
KUBE_CTRL_MGR_NODE_MONITOR_GRACE_PERIOD=40s
KUBE_CTRL_MGR_POD_EVICTION_TIMEOUT=5m0s
KUBE_CTRL_MGR_ROUTE_RECONCILIATION_PERIOD=10s
KUBELET_IMAGE_GC_HIGH_THRESHOLD=85
KUBELET_IMAGE_GC_LOW_THRESHOLD=80
KUBELET_FEATURE_GATES=--feature-gates=Accelerators=true
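If you prefer a non-interactive edit, for example when repeating the change on several nodes, here is a minimal sketch using sed. The function name set_kubelet_max_pods is hypothetical; it assumes the file uses plain KEY=value lines like the ones shown above, and it keeps a copy of the original file with a .bak suffix.

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KUBELET_MAX_PODS in an env-style file such as
# /etc/default/kubelet, keeping the original as "<file>.bak".
#   $1 path to the file
#   $2 new pod limit (positive integer, no leading zero)
set_kubelet_max_pods() {
  local file=$1 value=$2
  # Reject empty values, non-digits and leading zeros.
  case $value in
    ''|*[!0-9]*|0*) echo "invalid pod limit: $value" >&2; return 1 ;;
  esac
  # Only rewrite an existing KUBELET_MAX_PODS= line; fail if it is missing.
  if ! grep -q '^KUBELET_MAX_PODS=' "$file"; then
    echo "KUBELET_MAX_PODS not found in $file" >&2
    return 1
  fi
  # -i.bak edits the file in place and saves the original as "$file.bak".
  sed -i.bak "s/^KUBELET_MAX_PODS=.*/KUBELET_MAX_PODS=${value}/" "$file"
}
```

Run it as root on the node, for example `set_kubelet_max_pods /etc/default/kubelet 200`, then check the result with `grep '^KUBELET_MAX_PODS=' /etc/default/kubelet` before restarting the kubelet.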
3. Restart & check:
Restart the kubelet service on the node and check that it started properly:
root@node-hostname:~# systemctl restart kubelet.service
root@node-hostname:~# systemctl status kubelet.service
● kubelet.service - Kubelet
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Active: active (running) since Fri 2019-03-22 13:52:42 UTC; 5min ago
Process: 63755 ExecStartPre=/sbin/iptables -t nat --list (code=exited, status=0/SUCCESS)
Process: 63750 ExecStartPre=/sbin/ebtables -t nat --list (code=exited, status=0/SUCCESS)
Process: 63747 ExecStartPre=/sbin/sysctl -w net.ipv4.tcp_retries2=8 (code=exited, status=0/SUCCESS)
Process: 63742 ExecStartPre=/bin/mount --make-shared /var/lib/kubelet (code=exited, status=0/SUCCESS)
Process: 63734 ExecStartPre=/bin/bash -c if [ $(mount | grep "/var/lib/kubelet" | wc -l) -le 0 ] ; then /bin/mount --bind /var/lib/kubelet /var/lib/kubelet ; fi (code=exited, status=0/SUCCESS)
Process: 63728 ExecStartPre=/bin/mkdir -p /var/lib/kubelet (code=exited, status=0/SUCCESS)
Process: 63722 ExecStartPre=/bin/bash /opt/azure/containers/kubelet.sh (code=exited, status=0/SUCCESS)
Main PID: 70046 (docker)
Tasks: 7
Memory: 4.0M
CPU: 945ms
CGroup: /system.slice/kubelet.service
└─70046 /usr/bin/docker run --net=host --pid=host --privileged --rm --volume=/dev:/dev --volume=/sys:/sys:ro --volume=/var/run:/var/run:rw --volume=/var/lib/docker/:/var/lib/docker:rw --volume=/var/lib/kubelet/:/var/lib/kubelet
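The status output only shows that the service is running; the command line in the CGroup line is truncated. To confirm the new value was actually passed to the kubelet, one option is to look for the flag on the process command lines. This sketch assumes that the kubelet unit turns KUBELET_MAX_PODS into a `--max-pods` flag, which the truncated output does not show. The helper name extract_max_pods is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read process command lines on stdin and print the
# first --max-pods value found ("--max-pods=N" or "--max-pods N").
# Returns non-zero (and prints nothing) if the flag is absent.
extract_max_pods() {
  # sed prints only the captured number from matching lines,
  # head keeps the first match, and "grep ." fails on empty output.
  sed -n 's/.*--max-pods[= ]\{1,\}\([0-9][0-9]*\).*/\1/p' | head -n 1 | grep .
}
```

For example, `ps -eww -o args= | extract_max_pods` should print 200 once the kubelet has restarted with the new setting. If it prints nothing, the flag is not on any visible command line, and the value may be passed another way on your setup.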
4. Double check:
Back on your own machine, check that Capacity.pods now equals 200 (Allocatable.pods should show the same value):
me@My-computer$ kubectl describe node node-hostname
Name: node-hostname
Roles: agent
Labels: agentpool=poolX
beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=Standard_D8_v3
beta.kubernetes.io/os=linux
homelab=core
failure-domain.beta.kubernetes.io/region=westeurope
failure-domain.beta.kubernetes.io/zone=0
kubernetes.io/hostname=node-hostname
kubernetes.io/role=agent
node=kube
size=large
storageprofile=managed
storagetier=Standard_LRS
Annotations: node.alpha.kubernetes.io/ttl=0
volumes.kubernetes.io/controller-managed-attach-detach=true
Taints: <none>
CreationTimestamp: Wed, 11 Jul 2018 11:57:04 +0200
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
NetworkUnavailable False Wed, 11 Jul 2018 11:57:26 +0200 Wed, 11 Jul 2018 11:57:26 +0200 RouteCreated RouteController created a route
OutOfDisk False Fri, 22 Mar 2019 16:39:55 +0100 Thu, 21 Mar 2019 20:22:16 +0100 KubeletHasSufficientDisk kubelet has sufficient disk space available
MemoryPressure False Fri, 22 Mar 2019 16:39:55 +0100 Thu, 21 Mar 2019 20:22:16 +0100 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 22 Mar 2019 16:39:55 +0100 Thu, 21 Mar 2019 20:22:16 +0100 KubeletHasNoDiskPressure kubelet has no disk pressure
Ready True Fri, 22 Mar 2019 16:39:55 +0100 Fri, 22 Mar 2019 15:04:50 +0100 KubeletReady kubelet is posting ready status
Addresses:
InternalIP: 10.240.0.21
Hostname: k8s-homelab-worker-pool2-3
Capacity:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 32
memory: 132017808Ki
pods: 200
Allocatable:
alpha.kubernetes.io/nvidia-gpu: 0
cpu: 32
memory: 131915408Ki
pods: 200
System Info:
Machine ID: aef45egerv0a378fe845rtg6a2acef156871e
System UUID: DCDF8472-a378-78fe8-tg6a2-tg6a2acef1568
Boot ID: 5egerv0a-a378-nr6e-78fe8-386a5cda4dfd
Kernel Version: 4.4.0-134-generic
OS Image: Debian GNU/Linux 8 (jessie)
Operating System: linux
Architecture: amd64
Container Runtime Version: docker://1.12.6
Kubelet Version: v1.7.7-3
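The same check can be scripted with kubectl's JSONPath output instead of reading the full describe output. This is a sketch: check_node_pod_limit is a hypothetical name, and it assumes kubectl is already configured for the cluster.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare a node's capacity.pods and allocatable.pods
# with the expected limit. Returns 0 only if both values match.
#   $1 node name
#   $2 expected pod limit
check_node_pod_limit() {
  local node=$1 expected=$2 capacity allocatable
  capacity=$(kubectl get node "$node" -o jsonpath='{.status.capacity.pods}') || return 1
  allocatable=$(kubectl get node "$node" -o jsonpath='{.status.allocatable.pods}') || return 1
  echo "$node capacity.pods=$capacity allocatable.pods=$allocatable"
  # Succeed only when both reported values equal the expected limit.
  [ "$capacity" = "$expected" ] && [ "$allocatable" = "$expected" ]
}
```

For example, `check_node_pod_limit node-hostname 200`. To see the pod capacity of every node at once, `kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.capacity.pods` prints one line per node.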