Last year I had a brief affair with Longhorn. It’s a tool that separates the interface for working with volumes (in my case, in Kubernetes) from where the underlying data actually lives. My cluster consists of three small nodes, and back then most of the data lived in local-path provisioned volumes. Using local-path means that the data is physically on the host machine instead of on some virtually mounted filesystem. This also means that once an application has a PVC, the application can’t be moved to any other node (trying results in a “conflict”).

So I tried out Longhorn, and at first it was glorious. I could adjust pod affinities so that several of the more CPU-heavy pods wouldn’t land on the same node (which, in the case of $5 Linodes, can easily render a node completely unusable). All was shiny and nice until weird things started happening.
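For the curious, the kind of affinity tweak I mean looks roughly like the sketch below. It is not my actual config: the `workload-class: cpu-heavy` label and the helper names are made up for illustration, and the manifest is built as a plain Python dict only so that it can be printed and checked. The field names (`podAntiAffinity`, `requiredDuringSchedulingIgnoredDuringExecution`, `labelSelector`, `topologyKey`) are the standard Kubernetes ones.

```python
import json


def cpu_heavy_anti_affinity(label_key="workload-class", label_value="cpu-heavy"):
    """Build an `affinity` stanza that keeps pods carrying the given label
    off nodes that already run another pod with the same label.

    The label key and value are hypothetical; any label would work.
    """
    return {
        "podAntiAffinity": {
            "requiredDuringSchedulingIgnoredDuringExecution": [
                {
                    # Match other pods that are marked as CPU-heavy.
                    "labelSelector": {"matchLabels": {label_key: label_value}},
                    # "Same topology domain" here means "same node".
                    "topologyKey": "kubernetes.io/hostname",
                }
            ]
        }
    }


def pod_template(name, image):
    """Build a minimal pod template that both carries the hypothetical
    label and refuses to share a node with other pods carrying it."""
    return {
        "metadata": {"labels": {"app": name, "workload-class": "cpu-heavy"}},
        "spec": {
            "affinity": cpu_heavy_anti_affinity(),
            "containers": [{"name": name, "image": image}],
        },
    }


if __name__ == "__main__":
    # Print the template as JSON; Kubernetes accepts JSON manifests too.
    print(json.dumps(pod_template("prometheus", "prom/prometheus"), indent=2))
```

One caveat with the "required" flavour: on a three-node cluster, a fourth pod with that label would simply stay unscheduled. The softer `preferredDuringSchedulingIgnoredDuringExecution` variant would probably be the safer choice if that matters.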

Namely, the node that Prometheus resided on was seeing insane load. Prometheus in itself is not exactly light, but it hadn’t been that bad before. Stranger still, the pod CPU usage graphs in Grafana showed nothing unusual. This usually means that the node is running out of memory (which isn’t difficult) and is busy madly swapping memory to disk and back. But this didn’t seem to be the case, as the swap process was humming along as usual.

The problem was something with Longhorn. I didn’t dig too deep into it (since it could render the node unresponsive in a matter of minutes, I wasn’t keen to experiment). But since it only happened on the node where Prometheus was, my guess is that when persistence is enabled, Prometheus writes to disk “too often” for Longhorn’s liking. Because my node is small and weak, Longhorn couldn’t keep up with the load, and it eventually overwhelmed the node.

In the end I “resolved” this by giving up on Longhorn (and while I was at it, I also restricted Prometheus persistence a lot more). Other than Prometheus, only two pieces use persistent volumes: the database and this blog’s storage, both of which were already on volumes powered by the Linode CSI driver, so nothing much was lost. It was a fun ride though.