kubectl can select objects by filtering on labels with the -l flag. Labels are key-value pairs attached to objects as metadata, and they don’t have to be unique. I’ve most often seen them used to identify which project or app an individual resource belongs to. Helm, for example, uses labels to mark resources with the app, chart and revision they belong to.
But wait: if labels aren’t unique and selectors can match multiple values with set operators, how does that work? The database backing Kubernetes by default, etcd, is a key-value store. While it can natively select multiple records by key prefix, it’s hard to imagine labels working like that: there are many of them and the selectors are complex.
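To make “set operators” concrete, here is a minimal sketch of how a selector such as env in (prod, staging) could be evaluated against one object’s labels. The Labels and Requirement types and the MatchesAll function are hypothetical names made up for this illustration; they are not the types Kubernetes itself uses.

```go
package main

import "fmt"

// Labels is a simplified stand-in for the key-value metadata on an object.
type Labels map[string]string

// Requirement is a hypothetical selector term of the form
// "key in (v1, v2)" or "key notin (v1, v2)".
type Requirement struct {
	Key    string
	Op     string // "in" or "notin"
	Values []string
}

// Matches reports whether the given labels satisfy this one requirement.
func (r Requirement) Matches(l Labels) bool {
	value, present := l[r.Key]
	found := false
	if present {
		for _, v := range r.Values {
			if v == value {
				found = true
				break
			}
		}
	}
	switch r.Op {
	case "in":
		return found
	case "notin":
		// An object that lacks the key entirely also satisfies "notin".
		return !found
	default:
		return false
	}
}

// MatchesAll ANDs the requirements together, the way a comma does in a selector.
func MatchesAll(l Labels, reqs []Requirement) bool {
	for _, r := range reqs {
		if !r.Matches(l) {
			return false
		}
	}
	return true
}

func main() {
	pod := Labels{"app": "web", "env": "prod"}
	// Roughly "env in (prod, staging), tier notin (cache)".
	selector := []Requirement{
		{Key: "env", Op: "in", Values: []string{"prod", "staging"}},
		{Key: "tier", Op: "notin", Values: []string{"cache"}},
	}
	fmt.Println(MatchesAll(pod, selector))
}
```

Nothing in this check maps onto a key prefix, which is why a plain prefix query against a key-value store can’t answer it directly.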
So I dove into Kubernetes’s source code to figure out how it works.
My first idea was an implementation that felt pretty naive: just select all the resources and then filter them “client-side” in the Kubernetes control plane. And as it turns out, that is actually how it works (for most of it).
Down the rabbit hole
Of course I started by searching the Kubernetes repository for “etcd”, but most of the hits were irrelevant. I figured that Kubernetes wouldn’t hardwire its models to etcd concepts, but would have some abstraction in between so that “any” datastore could back it. (In fact, k3s uses SQLite by default.)
While looking at options.etcd didn’t reveal much about how the abstractions work, I could find keywords to refine my search further. “Storage” seems to be the actual implementation of communicating with a datastore service and “Store” one more abstraction around that. Then storagebackendfactory.etcd3 felt like I was getting closer to what I was looking for, but I couldn’t find the connection there.
So I went back to filtering search results. Along the way I noticed that resources (like Role, for example) have their own REST types defined. That reference to genericregistry.Store was what got me on the right track.
genericregistry.Store’s source code was way more revealing, but it was, as its name suggests, generic. It’s already a layer of abstraction above the database-specific implementation that I was interested in. The implementation is “dependency injected” into the functions through the e *Store receiver.
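For readers less familiar with Go, this is roughly what that pattern looks like. It is a sketch under my own assumptions: the Backend interface, the mapBackend type and the field names are invented for illustration and are much simpler than the real genericregistry.Store.

```go
package main

import "fmt"

// Backend is a hypothetical stand-in for the database-specific storage layer.
type Backend interface {
	Get(key string) (string, bool)
}

// mapBackend is a toy in-memory implementation of Backend.
type mapBackend map[string]string

func (m mapBackend) Get(key string) (string, bool) {
	v, ok := m[key]
	return v, ok
}

// Store is generic: it only knows about the Backend interface, not about
// which datastore sits behind it.
type Store struct {
	Storage Backend // the injected implementation
	Prefix  string  // key prefix for this resource type
}

// Get is a method on *Store. The receiver e is how the method reaches the
// backend that was injected when the Store was constructed.
func (e *Store) Get(name string) (string, bool) {
	return e.Storage.Get(e.Prefix + "/" + name)
}

func main() {
	roles := &Store{
		Storage: mapBackend{"/roles/admin": "admin-role"},
		Prefix:  "/roles",
	}
	fmt.Println(roles.Get("admin"))
}
```

Swapping mapBackend for another type that satisfies Backend changes the datastore without touching any of the Store methods.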
Then literally the last hit in the search had the paths of some “vendor” files, meaning I suppose they’re vendor-specific. And they were.
List and GetToList
What I was looking for was there in the etcd3.store code. From the generic implementation I knew that it used either the List or GetToList of the database-specific storage, using the latter if it MatchesSingle.
Both of these functions use the private appendToList function to accumulate the results that match the provided selector. That’s the answer I was looking for: Kubernetes really does select all the keys and then filters them in application code.
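The shape of that approach can be sketched in a few lines. This is an illustration only, assuming a plain map in place of etcd; Object, Predicate and listAndFilter are hypothetical names, not the ones in the Kubernetes source.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Object is a hypothetical decoded resource: a name plus its labels.
type Object struct {
	Name   string
	Labels map[string]string
}

// Predicate decides, in application code, whether an object is selected.
type Predicate func(Object) bool

// listAndFilter fetches every record under a key prefix and then keeps only
// the ones the predicate accepts. The datastore never sees the selector.
func listAndFilter(kv map[string]Object, prefix string, pred Predicate) []Object {
	keys := make([]string, 0, len(kv))
	for k := range kv {
		if strings.HasPrefix(k, prefix) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys) // deterministic order for the example

	var out []Object
	for _, k := range keys {
		if obj := kv[k]; pred(obj) {
			out = append(out, obj)
		}
	}
	return out
}

func main() {
	kv := map[string]Object{
		"/pods/default/a":  {Name: "a", Labels: map[string]string{"app": "web"}},
		"/pods/default/b":  {Name: "b", Labels: map[string]string{"app": "db"}},
		"/roles/default/c": {Name: "c", Labels: map[string]string{"app": "web"}},
	}
	isWeb := func(o Object) bool { return o.Labels["app"] == "web" }
	for _, o := range listAndFilter(kv, "/pods/", isWeb) {
		fmt.Println(o.Name)
	}
}
```

The prefix narrows the scan to one resource type, and everything label-related happens after the records have been read.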
What’s the difference?
What still wasn’t clear was the difference between List and GetToList. If Go could hot-reload, I’d just spin up a cluster, add some debug logging to both of them and see how they’re used, but sadly it’s a compiled language and I’m really not interested enough to build Kubernetes from scratch myself.
The interface declaring the functions had documentation comments that made the distinction clear: List gets all records in a “directory”, while GetToList gets records from a single key. That settled one question but raised another.
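Put together with the MatchesSingle check from the generic layer, the dispatch could look something like the sketch below. It is a toy model built on my own assumptions: Storage is a plain map, each key holds exactly one value, and the Predicate struct and ListPredicate function are simplified, hypothetical versions rather than the real signatures.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Predicate is a hypothetical selection predicate. A non-empty Name means it
// can only ever match one object.
type Predicate struct {
	Name   string
	Accept func(value string) bool
}

// MatchesSingle returns the single name this predicate is pinned to, if any.
func (p Predicate) MatchesSingle() (string, bool) {
	return p.Name, p.Name != ""
}

// Storage is a toy key-value backend (assumption: one value per key).
type Storage map[string]string

// List scans every key under a "directory" prefix and filters the values.
func (s Storage) List(dir string, p Predicate) []string {
	keys := make([]string, 0, len(s))
	for k := range s {
		if strings.HasPrefix(k, dir) {
			keys = append(keys, k)
		}
	}
	sort.Strings(keys)
	var out []string
	for _, k := range keys {
		if p.Accept(s[k]) {
			out = append(out, s[k])
		}
	}
	return out
}

// GetToList reads exactly one key and still returns the result as a list.
func (s Storage) GetToList(key string, p Predicate) []string {
	v, ok := s[key]
	if !ok || !p.Accept(v) {
		return nil
	}
	return []string{v}
}

// ListPredicate picks the cheaper single-key read when the predicate allows it.
func ListPredicate(s Storage, dir string, p Predicate) []string {
	if name, ok := p.MatchesSingle(); ok {
		return s.GetToList(dir+name, p)
	}
	return s.List(dir, p)
}

func main() {
	s := Storage{"/roles/admin": "admin", "/roles/viewer": "viewer"}
	any := func(string) bool { return true }
	fmt.Println(ListPredicate(s, "/roles/", Predicate{Accept: any}))
	fmt.Println(ListPredicate(s, "/roles/", Predicate{Name: "viewer", Accept: any}))
}
```

Either way the caller gets a list back, so the generic code doesn’t need to care which path was taken.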
Multiple records at one key?
Can one key resolve into multiple records? How does that work? It may be 1am, but curiosity trumps sleep. The source code comment on the interface makes it clear that the JSON found at one key is unmarshaled into a list. So I must assume that some Kubernetes resources are always handled together.
I tried peeking at the raw data in my own cluster, but it’s stored in some binary format rather than plain JSON, so reading it without kubectl’s help isn’t comfortable.
My top candidate was the Endpoints resource, because it’s always plural. The documentation says so too, so I figure that’s one thing I got right on the first try in this process. I wonder if there are any others?
At this point my curiosity runs out and sleep wins.
Lol, I didn’t expect to find a blog that answered this k8s question. Thanks for taking the time to dig through the codebase and provide direct links! Saved me the work of doing the same dive.
Man I’d love to see apiserver and apimachinery extracted into standalone reusable components for other projects. 🙂