How to increase Pods limit per worker node in Kubernetes

Kubelet is close to pod limit

Issue:

Alertmanager reported that the kubelets on our production nodes were running too many pods, approaching the default limit of 110 pods per node.

Solution:

First, check the Grafana dashboards to see whether the nodes can actually host more pods in terms of infrastructure resources (CPU, RAM, etc.), then use the average resource utilization to determine the new pod limit for each node.
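Alongside the resource dashboards, it helps to know how close each node actually is to the 110-pod limit. Here is one way to count scheduled pods per node (a sketch; it assumes kubectl is configured for the cluster, and relies on the node name being the 8th column of the wide, all-namespaces output):

```shell
# Count pods per node; with --all-namespaces -o wide the node name is column 8
kubectl get pods --all-namespaces -o wide --no-headers \
  | awk '{count[$8]++} END {for (node in count) print node, count[node]}' \
  | sort
```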

 

Suppose the chosen pod limit for the nodes is 200. You can apply it by following these steps:

    1.    Connect:

First, connect to the node (the worker with the issue):
$ ssh -i /path/to/your/ssh-key/id_rsa [email protected]
[email protected]:~$ sudo -i 

    2.    Edit: 


Change the value of KUBELET_MAX_PODS in /etc/default/kubelet from 110 to 200, using your favorite text editor. After the change, the file should look like this:
[email protected]:~# cat /etc/default/kubelet
KUBELET_CLUSTER_DNS=10.0.0.10
KUBELET_API_SERVERS=https://10.240.255.15:443
KUBELET_IMAGE=yassinemaachi/hyperkube-amd64:v1.7.7
KUBELET_NETWORK_PLUGIN=kubenet
KUBELET_MAX_PODS=200
DOCKER_OPTS=
CUSTOM_CMD=/bin/true
KUBELET_REGISTER_SCHEDULABLE=true
KUBELET_NODE_LABELS=kubernetes.io/role=agent,agentpool=kubepool1,node=kube
KUBELET_POD_INFRA_CONTAINER_IMAGE=gcrio.azureedge.net/google_containers/pause-amd64:3.0
KUBELET_NODE_STATUS_UPDATE_FREQUENCY=10s
KUBE_CTRL_MGR_NODE_MONITOR_GRACE_PERIOD=40s
KUBE_CTRL_MGR_POD_EVICTION_TIMEOUT=5m0s
KUBE_CTRL_MGR_ROUTE_RECONCILIATION_PERIOD=10s
KUBELET_IMAGE_GC_HIGH_THRESHOLD=85
KUBELET_IMAGE_GC_LOW_THRESHOLD=80

KUBELET_FEATURE_GATES=--feature-gates=Accelerators=true
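If you prefer a non-interactive change (useful when the same edit has to be made on many nodes), the value can also be bumped with sed. This is a sketch; it assumes the file layout shown above, and makes a backup first:

```shell
# Back up the current kubelet defaults, then raise the pod limit in place
cp /etc/default/kubelet /etc/default/kubelet.bak
sed -i 's/^KUBELET_MAX_PODS=.*/KUBELET_MAX_PODS=200/' /etc/default/kubelet

# Confirm the new value took effect
grep '^KUBELET_MAX_PODS=' /etc/default/kubelet
```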

    3.    Restart & check:


Restart the kubelet service on the node and check that it started properly:
[email protected]:~# systemctl restart kubelet.service
[email protected]:~# systemctl status kubelet.service
● kubelet.service - Kubelet
   Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: enabled)
   Active: active (running) since Fri 2019-03-22 13:52:42 UTC; 5min ago
  Process: 63755 ExecStartPre=/sbin/iptables -t nat --list (code=exited, status=0/SUCCESS)
  Process: 63750 ExecStartPre=/sbin/ebtables -t nat --list (code=exited, status=0/SUCCESS)
  Process: 63747 ExecStartPre=/sbin/sysctl -w net.ipv4.tcp_retries2=8 (code=exited, status=0/SUCCESS)
  Process: 63742 ExecStartPre=/bin/mount --make-shared /var/lib/kubelet (code=exited, status=0/SUCCESS)
  Process: 63734 ExecStartPre=/bin/bash -c if [ $(mount | grep "/var/lib/kubelet" | wc -l) -le 0 ] ; then /bin/mount --bind /var/lib/kubelet /var/lib/kubelet ; fi (code=exited, status=0/SUCCESS)
  Process: 63728 ExecStartPre=/bin/mkdir -p /var/lib/kubelet (code=exited, status=0/SUCCESS)
  Process: 63722 ExecStartPre=/bin/bash /opt/azure/containers/kubelet.sh (code=exited, status=0/SUCCESS)
 Main PID: 70046 (docker)
    Tasks: 7
   Memory: 4.0M
      CPU: 945ms
   CGroup: /system.slice/kubelet.service
           └─70046 /usr/bin/docker run --net=host --pid=host --privileged --rm --volume=/dev:/dev --volume=/sys:/sys:ro --volume=/var/run:/var/run:rw --volume=/var/lib/docker/:/var/lib/docker:rw --volume=/var/lib/kubelet/:/var/lib/kubelet

    4.    Double check:


Back on your local machine, check that the value of Capacity.pods is now equal to 200:
[email protected]$ kubectl describe node node-hostname
Name:               node-hostname
Roles:              agent
Labels:             agentpool=poolX
                    beta.kubernetes.io/arch=amd64
                    beta.kubernetes.io/instance-type=Standard_D8_v3
                    beta.kubernetes.io/os=linux
                    homelab=core
                    failure-domain.beta.kubernetes.io/region=westeurope
                    failure-domain.beta.kubernetes.io/zone=0
                    kubernetes.io/hostname=node-hostname
                    kubernetes.io/role=agent
                    node=kube
                    size=large
                    storageprofile=managed
                    storagetier=Standard_LRS
Annotations:        node.alpha.kubernetes.io/ttl=0
                    volumes.kubernetes.io/controller-managed-attach-detach=true
Taints:             <none>
CreationTimestamp:  Wed, 11 Jul 2018 11:57:04 +0200
Conditions:
  Type                 Status  LastHeartbeatTime                 LastTransitionTime                Reason                       Message
  ----                 ------  -----------------                 ------------------                ------                       -------
  NetworkUnavailable   False   Wed, 11 Jul 2018 11:57:26 +0200   Wed, 11 Jul 2018 11:57:26 +0200   RouteCreated                 RouteController created a route
  OutOfDisk            False   Fri, 22 Mar 2019 16:39:55 +0100   Thu, 21 Mar 2019 20:22:16 +0100   KubeletHasSufficientDisk     kubelet has sufficient disk space available
  MemoryPressure       False   Fri, 22 Mar 2019 16:39:55 +0100   Thu, 21 Mar 2019 20:22:16 +0100   KubeletHasSufficientMemory   kubelet has sufficient memory available
  DiskPressure         False   Fri, 22 Mar 2019 16:39:55 +0100   Thu, 21 Mar 2019 20:22:16 +0100   KubeletHasNoDiskPressure     kubelet has no disk pressure
  Ready                True    Fri, 22 Mar 2019 16:39:55 +0100   Fri, 22 Mar 2019 15:04:50 +0100   KubeletReady                 kubelet is posting ready status
Addresses:
  InternalIP:  10.240.0.21
  Hostname:    k8s-homelab-worker-pool2-3
Capacity:
 alpha.kubernetes.io/nvidia-gpu:  0
 cpu:                             32
 memory:                          132017808Ki
 pods:                            200
Allocatable:
 alpha.kubernetes.io/nvidia-gpu:  0
 cpu:                             32
 memory:                          131915408Ki
 pods:                            200
System Info:
 Machine ID:                 aef45egerv0a378fe845rtg6a2acef156871e
 System UUID:                DCDF8472-a378-78fe8-tg6a2-tg6a2acef1568
 Boot ID:                    5egerv0a-a378-nr6e-78fe8-386a5cda4dfd
 Kernel Version:             4.4.0-134-generic
 OS Image:                   Debian GNU/Linux 8 (jessie)
 Operating System:           linux
 Architecture:               amd64
 Container Runtime Version:  docker://1.12.6
 Kubelet Version:            v1.7.7-3

    5.    Repeat steps 1 through 4 on every node affected by this issue.
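The per-node steps can also be scripted from your workstation in one pass. This is only a sketch: the node IPs, SSH key path, and user below mirror the examples above and are placeholders for your own inventory, and it assumes passwordless sudo on the nodes:

```shell
# Hypothetical list of worker node IPs still at the old limit
NODES="10.10.32.85 10.10.32.86"

# Apply the edit and restart the kubelet on each node
for ip in $NODES; do
  ssh -i /path/to/your/ssh-key/id_rsa "sshuser@$ip" \
    "sudo sed -i 's/^KUBELET_MAX_PODS=.*/KUBELET_MAX_PODS=200/' /etc/default/kubelet \
     && sudo systemctl restart kubelet.service"
done

# Then verify the pod capacity of every node at once
kubectl get nodes -o custom-columns=NAME:.metadata.name,PODS:.status.capacity.pods
```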


