My 4-node k3s cluster (which also hosts this blog) kept dying every now and then. Looking at kubectl describe nodes, it quickly became evident that the nodes were running out of disk space. Once a node reports DiskPressure (and gets tainted with node.kubernetes.io/disk-pressure), pods may get evicted, and the kubelet will burn quite a lot of CPU trying to free disk space by garbage-collecting container images and reclaiming ephemeral storage.
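Spotting this in kubectl describe node output looks roughly like the sketch below. The heredoc stands in for real output from a pressured node (node name and exact formatting are illustrative); on a live cluster you'd run the commented command instead.

```shell
# Sample of what `kubectl describe node <node-name>` shows for a node
# under disk pressure: the taint plus the DiskPressure condition.
describe_output="$(cat <<'EOF'
Taints:             node.kubernetes.io/disk-pressure:NoSchedule
Conditions:
  DiskPressure   True    KubeletHasDiskPressure   kubelet has disk pressure
EOF
)"

# On a real cluster, replace the variable with:
#   kubectl describe node <node-name>
echo "$describe_output" | grep -iE 'disk-?pressure'
```

Grepping for the taint across all nodes is usually the fastest way to find which node is the culprit.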
My setup uses local storage by default (the local-path provisioner), where volumes are just directories on the node's own disk. This means pods that use persistence are pinned to that node forever and can't simply move around. That makes eviction a problem, since they have nowhere else to go. It also means that disk usage is actual disk usage on the node, not on some block volume over the network.
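The pinning happens because local-path provisions each PersistentVolume with a hard node affinity on the node that holds the directory. A sketch of what that looks like (PV name, path, and node name are illustrative; the heredoc stands in for the commented command):

```shell
# Rough shape of a local-path PV spec. On a live cluster you'd inspect it with:
#   kubectl get pv <pv-name> -o yaml
pv_yaml="$(cat <<'EOF'
spec:
  storageClassName: local-path
  hostPath:
    path: /var/lib/rancher/k3s/storage/pvc-1234
  nodeAffinity:
    required:
      nodeSelectorTerms:
      - matchExpressions:
        - key: kubernetes.io/hostname
          operator: In
          values:
          - node1
EOF
)"

# The data only exists on node1, so the scheduler can only place the pod there.
echo "$pv_yaml" | grep -A1 'values:'
```

The scheduler honours that required nodeAffinity, so even if node1 is tainted with disk pressure, the pod has no legal home anywhere else.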