Earlier I wrote about how I set up Beats – Elasticsearch – Grafana to visualize the various metrics (and logs, hate me) from Kitsune's dev server. There were a few tricky spots that didn't work at first and took a while to figure out (or at least get working).

The silent network

The first was that on the system metrics dashboard I set up, the network traffic graph would flatline. It didn't show anything. Empty. Except the records were all there: if I threw a raw query at Elasticsearch, I could see all the system.network.in.bytes (and out.bytes) values.
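If you want to double-check that the raw documents are really there, a query along these lines should show them. This is just a sketch: localhost:9200 and the metricbeat-* index pattern are assumptions, adjust them to your setup.

```sh
# Peek at the latest network counter documents (hypothetical host/index, adjust as needed)
curl -s 'http://localhost:9200/metricbeat-*/_search?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "size": 3,
  "sort": [{ "@timestamp": "desc" }],
  "_source": ["@timestamp", "system.network.name",
              "system.network.in.bytes", "system.network.out.bytes"],
  "query": { "exists": { "field": "system.network.in.bytes" } }
}'
```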

The root of the problem: using those metrics as-is won't do, since they are a cumulative sum of all network traffic. If you look at the system monitor (or task manager, or whatever your OS calls it) on your desktop, you'll see something like this:

Network history in Ubuntu’s System Monitor

The metrics in question contain that ever-growing total. What I wanted to see was a graph similar to the above: the actual traffic at any given point in time. That's the change of the total between two points in time, i.e. a derivative (for example, if the counter climbs from 1,200,000 to 1,230,000 bytes over a 10-second bucket, the rate for that bucket is 3,000 bytes per second).

Luckily enough, Grafana supports derivative aggregation out of the box, so no need to set up Elasticsearch pipelines and all that. However, getting it to work as desired is a bit quirky.

Query: beat.hostname:$Host

Metric (invisible): Average of system.network.in.bytes
Metric (visible): Derivative of Average of system.network.in.bytes, with the Unit option set to second

Group by: Date Histogram of @timestamp, Interval 10s
Grafana settings for a network traffic derivative

The biggest catch was that leaving the Date Histogram's Interval on "auto" simply didn't work. As soon as I specified an exact Interval value, the graph popped to life and filled up in a flash. 10 seconds is the shortest available, which is why I'm using it.
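For the curious, the panel settings above translate into roughly this kind of Elasticsearch request: a date histogram with an average per bucket and a derivative pipeline aggregation on top. This is a sketch rather than Grafana's exact output; the hostname is made up, and newer Elasticsearch versions want fixed_interval instead of interval.

```sh
# Roughly what the Grafana panel asks Elasticsearch for (hypothetical hostname)
curl -s 'http://localhost:9200/metricbeat-*/_search?pretty' \
  -H 'Content-Type: application/json' -d '
{
  "size": 0,
  "query": { "term": { "beat.hostname": "kitsune-dev" } },
  "aggs": {
    "per_10s": {
      "date_histogram": { "field": "@timestamp", "interval": "10s" },
      "aggs": {
        "in_bytes_avg": { "avg": { "field": "system.network.in.bytes" } },
        "in_bytes_per_sec": {
          "derivative": { "buckets_path": "in_bytes_avg", "unit": "1s" }
        }
      }
    }
  }
}'
```

The unit: 1s bit is what the panel's Unit: second option maps to: it makes the derivative come back as a per-second rate (the normalized_value in the response) instead of a per-bucket difference.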

The silent disk i/o

The other problematic bit was another I/O metric: disk usage. This turned out to be a completely different problem from the network one above. Unlike network traffic, disk I/O supposedly already has derivative metrics provided in iostat fields like system.diskio.iostat.read.per_sec.bytes.

The problem was that the relevant read/write fields were empty. Querying Elasticsearch directly, I could see that the fields were simply not populated, which indicated that this was a problem with Metricbeat itself, not some Grafana setting.

It was weird because I'd swear it worked when I set up the same system locally on Docker, so there was no reason it wouldn't work with the same setup on an actual server. Except it didn't. I double-checked that Metricbeat had the System module enabled: just make sure the modules.d/system.yml file is called exactly that and doesn't have .disabled appended to it. If it is disabled, metricbeat modules enable system fixes that.

Terminal screenshot from a container's /etc/metricbeat folder
Metricbeat modules: only system is enabled
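If you'd rather not eyeball filenames, Metricbeat's own CLI can tell you the same thing (assuming the metricbeat binary is on your PATH):

```sh
# List all modules, grouped into enabled and disabled
metricbeat modules list

# Enable the system module if it shows up as disabled
metricbeat modules enable system
```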

The empty metrics problem I faced was actually caused by the contents of said system.yml: the diskio line was commented out, disabling that metricset.

system.yml with diskio commented out
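In text form, the offending part of modules.d/system.yml looked roughly like this; an excerpt only, and the exact default metricset list varies between Metricbeat versions:

```yaml
- module: system
  period: 10s
  metricsets:
    - cpu
    - load
    - memory
    - network
    - process
    - process_summary
    #- core
    #- diskio     # the culprit: commented out, so no system.diskio.* fields get collected
    #- socket
```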

Un-commenting that line (and then reloading Metricbeat) was all it took. But it made me wonder why it might be turned off. The sample config file has it turned off too, but there are no details on why. I noticed no change in CPU usage, so it shouldn't be a resource problem either. Mystery.
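In case it saves someone a search: on a systemd install, reloading just means restarting the service (a Docker setup would restart the container instead):

```sh
# Pick up the modules.d/system.yml change
sudo systemctl restart metricbeat
```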