Monitoring TrueNAS With Prometheus and Loki

Technical howto article

2022-10-16 7 min read Informatique

Summary

I use Prometheus for monitoring in my self-hosting infrastructure, but my TrueNAS box exposes metrics in the Graphite protocol. In this post, I describe how I use graphite_exporter to collect metrics from TrueNAS with Prometheus, and Rsyslog and Promtail to send logs to Loki.

Documentation

Graphite Exporter README file.
Reporting settings in TrueNAS CORE documentation, for configuring Graphite metrics.
Advanced system settings in TrueNAS CORE documentation, for configuring syslog.

Choosing the monitoring strategy

TrueNAS doesn’t expose metrics in the Prometheus format. I explored a few methods for collecting metrics with Prometheus.

❌ Installing packages on the base OS

TrueNAS turns generic hardware into a network storage and hosting appliance. Because it’s based on FreeBSD, it would be simple to install Prometheus exporters on the system, but this is not recommended: locally installed packages can interfere with the appliance code. The package managers pkg and ports aren’t even enabled by default.

❌ Plugins

TrueNAS has a plugin system, but I don’t feel it has enough maintainers for me to trust a third-party plugin, and I don’t want to write one myself.

❌ Using the API and writing an exporter

One could write a truenas_exporter that would query the API. I checked the API documentation, and it didn’t look like an easy API to work with.

✅ Using natively available legacy telemetry and logging application streams

I figured it would be easier to use the telemetry and logging protocols available out-of-the-box in TrueNAS, and to convert them to Prometheus and Loki formats using a service outside of TrueNAS.

metrics: using graphite_exporter to convert metrics to Prometheus format. Continue reading below.
logs: using rsyslog and the syslog receiver in Promtail. Read Using Rsyslog and Promtail to relay syslog messages to Loki.

Collecting logs from TrueNAS into Loki

I use rsyslog and Promtail to collect logs from TrueNAS and send them to Loki. See Using Rsyslog and Promtail to relay syslog messages to Loki for more details about setting up this logging pipeline.

In the TrueNAS configuration interface, under System > Advanced, you can set the hostname for your syslog server.

Screenshot of the advanced system settings on my TrueNAS server.

Collecting metrics from TrueNAS into Prometheus

Find the documentation in the README file in the GitHub repository:
https://github.com/prometheus/graphite_exporter

Deploy Graphite Exporter

I deployed graphite_exporter on my monitoring VM, along with Prometheus and Grafana. It runs as a single-process systemd service.

graphite_exporter has two listening ports:

Carbon protocol listener on UDP port 9109. Must be changed to the Graphite default port 2003 to work with TrueNAS.
HTTP Metrics endpoint on port 9108. This is the endpoint that Prometheus will scrape.
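
To check that the Carbon listener is reachable before pointing TrueNAS at it, a test sample can be pushed by hand. The following is a minimal Python sketch, not part of my setup: it formats one sample in the Graphite plaintext protocol (`<path> <value> <timestamp>`) and can send it over UDP. The metric path, the `format_sample` and `send_sample` helper names, and the address in the comment are placeholders made up for illustration.

```python
import socket
import time


def format_sample(path: str, value: float, timestamp: int) -> bytes:
    """Encode one sample in the Graphite plaintext protocol."""
    return f"{path} {value} {timestamp}\n".encode("ascii")


def send_sample(host: str, port: int, path: str, value: float) -> None:
    """Send a single sample to a Graphite listener over UDP."""
    payload = format_sample(path, value, int(time.time()))
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(payload, (host, port))


if __name__ == "__main__":
    # Hypothetical metric path; this only prints the encoded sample.
    # To actually send it, call for example:
    #   send_sample("localhost", 2003, "servers.nas_lan_example_com.load.load.shortterm", 0.42)
    print(format_sample("servers.nas_lan_example_com.load.load.shortterm", 0.42, 1665900000))
```

If the sample arrives, it should appear shortly afterwards on the exporter's HTTP metrics endpoint.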

Here’s my systemd unit file:

[Unit]
Description=graphite_exporter
Wants=network-online.target
After=network-online.target

[Service]
User=graphite_exporter
Group=graphite_exporter
Type=simple
ExecStart=/usr/local/bin/graphite_exporter \
    --graphite.mapping-config=/etc/graphite_mappings.yaml \
    --graphite.listen-address=:2003

[Install]
WantedBy=multi-user.target

Mapping config

graphite_exporter requires a Graphite→Prometheus mapping file. Mappings are defined in YAML format. Here is my mapping config for converting TrueNAS metrics from Graphite to Prometheus format.

In my network, FQDNs have 4 components. Graphite uses dots as a separator in its metrics namespace, so dots in hostnames are converted to underscores. In my mapping config, I use the ([^_]*)_([^_]*)_([^_]*)_([^_]*) regex and the ‘${1}.${2}.${3}.${4}’ replacement string to convert underscores back to dots in the instance label. You may need to adjust your regex and shift the back references in your replacement strings according to your network name services settings.
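
To make the hostname handling concrete, here is a small Python sketch of what the regex and replacement string do to one metric path. It only illustrates the idea: graphite_exporter uses Go regular expressions, and the hostnames and the `instance_label` helper are invented for this example.

```python
import re
from typing import Optional

# Same pattern as the 1-minute load average rule in the mapping config:
# four underscore-separated hostname components, then the metric path.
HOST_RE = re.compile(r"servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.load\.load\.shortterm")


def instance_label(graphite_path: str) -> Optional[str]:
    """Return the dotted hostname, or None if the path doesn't match."""
    match = HOST_RE.match(graphite_path)
    if match is None:
        return None
    # Equivalent of the '${1}.${2}.${3}.${4}' replacement string.
    return ".".join(match.groups())


if __name__ == "__main__":
    # Four components: prints nas.lan.example.com
    print(instance_label("servers.nas_lan_example_com.load.load.shortterm"))
    # Three components: prints None, the regex needs adjusting
    print(instance_label("servers.nas_example_com.load.load.shortterm"))
```

The second call shows why the regex has to follow your own naming scheme: a hostname with a different number of components simply doesn't match.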

---
# All FreeNAS metrics start with servers.<server_name>
# https://www.truenas.com/community/threads/mapping-of-freenas-data-sent-to-graphite_exporter-part-of-prometheus.80948/
# https://blog.bradbeattie.com/freenas-to-prometheus/

# Validate configuration with:
# /path/to/graphite_exporter --check-config --graphite.mapping-config=/path/to/this_file.yml
# graphite_exporter --check-config --graphite.mapping-config=files/configs/present/graphite_mapping-2.yml

# To develop mappings locally:
# graphite_exporter --log.level=debug --graphite.mapping-config=
# nc localhost 9109 < graphite-sample.txt
# curl localhost:9108/metrics -s | grep ^freenas | less

# Mapping syntax is based on statsd_exporter.
# https://github.com/prometheus/statsd_exporter#regular-expression-matching

mappings:

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.disktemp-(.*)\.temperature'
    match_type: regex
    name: "freenas_disk_temperature"
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
      disk: '${5}'

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.cputemp-(.*)\.temperature'
    match_type: regex
    name: "freenas_cpu_temperature"
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
      cpu: '${5}'

  # When "Report CPU usage in percent" is NOT selected. (default)
  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.aggregation_cpu_(.*)\.percent-(.*)'
    match_type: regex
    name: "freenas_cpu_percent_${5}"
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
      mode: '${6}'

  # When "Report CPU usage in percent" is selected.
  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.aggregation_cpu_(.*)\.cpu-(.*)'
    match_type: regex
    name: "freenas_cpu_usage_${5}"
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
      mode: '${6}'

  # When "Report CPU usage in percent" is NOT selected. (default)
  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.cpu-(.*)\.cpu-(.*)'
    match_type: regex
    name: "freenas_cpu_usage"
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
      cpu: '${5}'
      mode: '${6}'

  # When "Report CPU usage in percent" is selected.
  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.cpu-(.*)\.percent-(.*)'
    match_type: regex
    name: "freenas_cpu_percent"
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
      cpu: '${5}'
      mode: '${6}'

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.df-(.*)\.df_complex-(.*)'
    match_type: regex
    name: 'freenas_df_${6}'
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
      filesystem: '${5}'

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.disk-(.*)\.disk_(.*)\.(.*)'
    match_type: regex
    name: 'freenas_disk_${6}_${7}'
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
      device: '${5}'

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.interface-(.*)\.if_(.*)\.(.*)'
    match_type: regex
    name: 'freenas_interface_${7}_${6}'
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
      interface: '${5}'

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.load\.load\.longterm'
    match_type: regex
    name: 'freenas_load_15'
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.load\.load\.midterm'
    match_type: regex
    name: 'freenas_load_5'
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite
  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.load\.load\.shortterm'
    match_type: regex
    name: 'freenas_load_1'
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.memory\.memory-(.*)'
    match_type: regex
    name: 'freenas_memory_${5}'
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.swap\.swap-(.*)'
    match_type: regex
    name: 'freenas_swap_${5}'
    labels:
      instance: '${1}.${2}.${3}.${4}'
      job: freenas-graphite

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.uptime\.uptime'
    match_type: regex
    name: freenas_uptime
    labels:
      job: freenas-graphite
      instance: "${1}.${2}.${3}.${4}"

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.processes\.ps_state-(.*)'
    match_type: regex
    name: freenas_processes
    labels:
      job: freenas-graphite
      instance: "${1}.${2}.${3}.${4}"
      state: "${5}"

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.([^.]*)\.([^.]*)$'
    match_type: regex
    name: freenas_graphite_${5}
    labels:
      job: freenas-graphite-raw
      instance: "${1}.${2}.${3}.${4}"
      item: "${6}"

  - match: 'servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.(.*)'
    match_type: regex
    name: freenas_graphite_raw
    labels:
      job: freenas-graphite-raw
      instance: "${1}.${2}.${3}.${4}"
      graphite_metric: "${5}"

  - match: 'sensor.*.*'
    name: sensor_${5}
    labels:
      job: esp32-sensors
      instance: ${1}.${2}.${3}.${4}
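
The order of the rules matters: as I understand the mapper, regex rules are tried from top to bottom and the first one that matches is used, which is why the specific rules come first and the catch-all rules last. The Python sketch below imitates that behavior with three of the rules. It is an approximation for illustration only (the exporter itself is written in Go), and the sample paths and helper names are made up.

```python
import re
from typing import Optional

# A subset of the mapping rules, in the same order: (regex, name template).
RULES = [
    (r"servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.memory\.memory-(.*)", "freenas_memory_${5}"),
    (r"servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.([^.]*)\.([^.]*)$", "freenas_graphite_${5}"),
    (r"servers\.([^_]*)_([^_]*)_([^_]*)_([^_]*)\.(.*)", "freenas_graphite_raw"),
]


def to_python_template(template: str) -> str:
    """Rewrite ${N} back references into Python's \\g<N> syntax."""
    return re.sub(r"\$\{(\d+)\}", r"\\g<\1>", template)


def metric_name(graphite_path: str) -> Optional[str]:
    """Return the name produced by the first rule that matches, if any."""
    for pattern, template in RULES:
        match = re.match(pattern, graphite_path)
        if match:
            return match.expand(to_python_template(template))
    return None


if __name__ == "__main__":
    for path in (
        "servers.nas_lan_example_com.memory.memory-free",  # specific rule
        "servers.nas_lan_example_com.foo.bar",             # two-component fallback
        "servers.nas_lan_example_com.foo",                 # raw catch-all
    ):
        print(path, "->", metric_name(path))
```

Moving the catch-all rule to the top of the list would make every path come out as freenas_graphite_raw, which is the failure mode to watch for when reordering the real config.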

Configure the Graphite target on TrueNAS

Under System > Reporting, set the Graphite server’s IP address. I also like to have CPU usage reported as percent instead of units of kernel time (a.k.a. jiffies).

See the reference documentation here:
https://www.truenas.com/docs/scale/scaletutorials/reporting/configreportsscale/

You can’t set the port or protocol for the Graphite target, so you must set graphite_exporter to listen on the default Graphite UDP port 2003.

Screenshot of the reporting configuration on my TrueNAS server.

Scrape Graphite Exporter metrics in Prometheus

This is my scrape config in Prometheus.

scrape_configs:
  - job_name: graphite
    static_configs:
      - targets:
          - <hostname of my graphite_exporter>:9108
    honor_labels: true

I set honor_labels to true so that Prometheus doesn’t overwrite the job and instance labels that I defined in the graphite_exporter metrics mapping config. See the scrape_config configuration reference for details:
https://prometheus.io/docs/prometheus/latest/configuration/configuration/#scrape_config
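
Without honor_labels, Prometheus would keep its own job and instance labels and rename the conflicting ones from the exporter to exported_job and exported_instance. To see which labels the exporter attaches before Prometheus scrapes them, the output of the metrics endpoint can be inspected. This is a rough Python sketch under simplifying assumptions: the sample text is invented, the `job_instance_pairs` helper is hypothetical, and the label regex does not handle escaped quotes inside label values.

```python
import re

# Matches name="value" pairs inside the braces of a sample line.
LABEL_RE = re.compile(r'(\w+)="([^"]*)"')


def job_instance_pairs(exposition_text: str) -> set:
    """Collect the (job, instance) pairs found on sample lines."""
    pairs = set()
    for line in exposition_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and HELP/TYPE comments
        labels = dict(LABEL_RE.findall(line))
        if "job" in labels and "instance" in labels:
            pairs.add((labels["job"], labels["instance"]))
    return pairs


# Invented sample in the Prometheus text exposition format.
SAMPLE = '''# TYPE freenas_load_1 gauge
freenas_load_1{instance="nas.lan.example.com",job="freenas-graphite"} 0.42
freenas_uptime{instance="nas.lan.example.com",job="freenas-graphite"} 86400
'''

if __name__ == "__main__":
    print(job_instance_pairs(SAMPLE))
```

In practice the text would come from the exporter's /metrics endpoint on port 9108 rather than from a hard-coded string.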

Example Prometheus queries

Here are a few examples of PromQL expressions I use in Grafana dashboards based on the metrics mapping configuration above.

In these examples, $instance is a Grafana dashboard variable and $__interval is a global variable. They are not part of PromQL syntax and will be substituted by Grafana when executing the queries.

CPU usage:

avg without (cpu) (
    sum without (mode) (
        max_over_time(freenas_cpu_percent{instance=~"$instance",mode!="idle"}[$__interval])
    )
)

Load average:

freenas_load_1
freenas_load_5
freenas_load_15

Filesystem usage:

sum by (instance) (freenas_df_used{instance=~"$instance"})

Top 10 filesystems:

topk(10,freenas_df_used{instance=~"$instance"})
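
Outside Grafana, the same expressions can be tried against the Prometheus HTTP API, as long as $instance and $__interval are replaced by hand. Below is a minimal Python sketch using only the standard library. The base URL, the instance pattern and the helper names are assumptions for illustration, not values from my setup.

```python
import json
import urllib.parse
import urllib.request


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?" + urllib.parse.urlencode({"query": promql})


def run_query(base_url: str, promql: str) -> list:
    """Run an instant query and return the list of result series."""
    with urllib.request.urlopen(build_query_url(base_url, promql), timeout=10) as resp:
        body = json.load(resp)
    return body["data"]["result"]


if __name__ == "__main__":
    # Hypothetical address and instance pattern; $instance is filled in by hand.
    query = 'topk(10, freenas_df_used{instance=~"nas.*"})'
    print(build_query_url("http://localhost:9090", query))
    # To execute it against a running Prometheus:
    #   for series in run_query("http://localhost:9090", query):
    #       print(series["metric"], series["value"])
```

This is handy for checking that a mapping change produced the expected metric names before updating a dashboard.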

Conclusion

Graphite was revolutionary in 2010 and still has great support in Grafana, but Prometheus is much more efficient, modern, scalable, and easy to use.

Some legacy systems, network and storage appliances in particular, aren’t equipped to work natively with modern monitoring and logging systems.

TrueNAS is not instrumented with Prometheus metrics, and there is currently no exporter which could read metrics from the API and expose them in Prometheus format. However, TrueNAS does generate Graphite metrics, and there is a Graphite Exporter which can translate Graphite metrics for Prometheus.

For other similar use-cases, here are a few methods to get metrics from legacy systems into Prometheus:

Using an exporter if one exists.
Writing an exporter.
Using SNMP Exporter.
Using the InfluxDB Exporter.
Using the statsd exporter.
Using the Prometheus output data format in Telegraf if it’s easier to use Telegraf (this is how I do it for my pfSense router).
