Nextcloud self-hosted architecture

Design overview, diagrams, configs, scripts, specs and wins

Summary

My brother and I have been self-hosting Nextcloud for family and friends for several years. In this article, I describe the self-hosting infrastructure that supports our Nextcloud service.

We installed Nextcloud VM on an Ubuntu 20.04 virtual machine hosted on a TrueNAS server backed by 4 × 4 TB drives in RAID-Z1, with an off-site TrueNAS server for data replication.

TrueNAS box (side panel removed) with home network equipment: Netgate router, 8-port gigabit switch, Unifi 5G Wi-Fi access point, cable modem.

Nextcloud VM is the easiest and most reliable way to self-host Nextcloud that we have found. It turns a freshly installed Ubuntu 20.04 VM into a Nextcloud appliance. It’s maintained by the Nextcloud project and the community.

https://github.com/nextcloud/vm

Main features

Here are my favourite features of Nextcloud VM.

  • Opinionated installation and configuration.
  • Performs well and is secure out of the box.
  • Self-contained, no external services required.
    • I have a reverse proxy with SSL offloading and ACME client for Let’s Encrypt certificates, but it is not required. The Nextcloud VM also comes with its own ACME client if needed.
  • Automatic updates of application, OS packages and appliance code.
  • Interactive setup script.
  • Sets up ZFS on the data volume with automatic snapshots.

Goals

  • 🖕 Data sovereignty
  • ☁️ File sharing and remote access
  • 📱 File synchronization across client devices
  • 🗄️ Archival
  • 🌎 Off-site backups
  • 🔒 Data encryption at rest

Documentation

Since both TrueNAS and pfSense are based on FreeBSD, reading the FreeBSD handbook will help you troubleshoot and get the most out of those systems. Nextcloud VM requires an Ubuntu 20.04 system and uses ZFS for the data filesystem.

Here are the documentation sites for the backup and mail components of my architecture:

Self-hosting infrastructure overview

Why appliances?

Automation with systems like Ansible or Kubernetes is necessary in enterprises, but it also adds layers of abstraction and complexity that are not necessary at a small scale. Furthermore, there is no such thing as a serverless deployment if you also have to maintain the underlying platform and infrastructure. Appliances will cover 95% of your use cases and reduce the amount of coding and maintenance effort.

pfSense, TrueNAS, ‘mailcow: dockerized’ and Nextcloud VM are free and open-source software that turn generic hardware and VMs into computer appliances. We use them at home for the most important or complex components of our architecture.

Here are benefits of using appliances based on free and open-source software and generic hardware:

  • Software is maintained by the open-source community and self-updates automatically.
  • Restoring an appliance from scratch is easy.
  • Hardware can be replaced by purchasing parts online.
  • The configuration can be exported and backed up through the management UI as a downloadable compressed XML file or some other convenient format.
  • The system can be reinstalled from scratch from a downloadable installation media.
  • Configuration can be restored or rolled back through the management UI by uploading a config backup.

When I make a configuration change on hardware appliances (pfSense and TrueNAS), I export the config and save it on my laptop in a directory synchronized across devices. This is like hitting Ctrl + S.
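The export itself happens in the management UI, but filing the downloaded file can be scripted. A minimal sketch of that habit, assuming a synced directory at ~/Sync/appliance-configs and a timestamped naming scheme (both are placeholders for illustration):

```shell
#!/bin/bash
# Hypothetical helper: copy an exported appliance config into a synced directory.
# CONFIG_DIR and the timestamped file name are illustrative assumptions.
CONFIG_DIR="${CONFIG_DIR:-$HOME/Sync/appliance-configs}"

save_config() {
    local appliance="$1" exported_file="$2"
    local ext="${exported_file##*.}"
    local dest_dir="$CONFIG_DIR/$appliance"
    local dest="$dest_dir/$(date +%Y-%m-%d_%H%M%S).$ext"

    mkdir -p "$dest_dir" || return 1
    cp -- "$exported_file" "$dest" || return 1
    echo "$dest"   # print where the copy went
}
```

For example, `save_config pfsense ~/Downloads/config.xml` keeps every exported version side by side, which makes it easy to roll back to the config from before a change.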

Appliances running on VMs are backed up using either TrueNAS rsync tasks in PULL mode or Borgmatic.

Domain registrar

NearlyFreeSpeech.net

I like nearlyfreespeech.net because they offer a barebones service at a very low profit margin. Prices are the lowest and it works well. I admire their ethics and their business model.

DNS

I use DigitalOcean for my personal projects. Their DNS service is free. The pfSense router has an integration with DigitalOcean API for dynamic DNS.
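pfSense takes care of this through its dynamic DNS client, so nothing needs to be scripted. For illustration only, the kind of request involved can be sketched with curl, assuming the DigitalOcean API v2 record update endpoint. DO_TOKEN, the domain and the record ID are placeholders, and this is not the code pfSense runs:

```shell
#!/bin/bash
# Sketch of a dynamic DNS update against the DigitalOcean API v2.
# DO_TOKEN, the domain and the record ID are placeholders.
update_dns_record() {
    local domain="$1" record_id="$2" ip="$3"
    curl --silent --fail -X PUT \
        -H "Content-Type: application/json" \
        -H "Authorization: Bearer ${DO_TOKEN}" \
        -d "{\"data\":\"${ip}\"}" \
        "https://api.digitalocean.com/v2/domains/${domain}/records/${record_id}"
}
```

The function takes the new public IP address as an argument, so how that address is discovered is left to the caller.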

Mail

Nextcloud uses SMTP for user enrollment, self-serve password reset and file sharing by email, among other things. I’ve been running a mailcow: dockerized VM appliance for mail self-hosting. To avoid the problem of public cloud hosting IP address blacklisting, I use Amazon SES for SMTP relay. It’s cheap and super simple to set up.

Reverse proxy with SSL offloading

My home router is a pfSense system on Netgate hardware. I install the HAProxy and Acme Certificates services from the Package Manager (packages haproxy and acme). SSL certificates are provided by Let’s Encrypt.

Monitoring

I run a FreeBSD jail I call Watchtower with components of the Grafana+Prometheus observability stack.

Storage architecture

Our production TrueNAS server runs in my home, while a backup TrueNAS server runs in my brother’s home. Our sites are connected with a VPN using OpenVPN. Conveniently, Nextcloud VM uses ZFS for the /mnt/ncdata filesystem. We configured the PULL replication task using the /mnt/ncdata dataset in the VM as the source, instead of the ZVOL on the TrueNAS host. This provides off-site backups while keeping the backup size as small as possible.
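TrueNAS configures this through a replication task in its UI. Underneath, ZFS replication amounts to sending snapshots from the source to the destination. A rough sketch of one incremental pull, run by hand from the backup server, assuming placeholder host, dataset and snapshot names and an SSH user that is allowed to run zfs on the VM:

```shell
#!/bin/bash
# Sketch of one incremental PULL replication, run on the backup server.
# Host, dataset and snapshot names are supplied by the caller (placeholders).
pull_replicate() {
    local src_host="$1" src_dataset="$2" dst_dataset="$3"
    local prev_snap="$4" new_snap="$5"

    # Take a new snapshot on the source (the Nextcloud VM).
    ssh "$src_host" zfs snapshot "${src_dataset}@${new_snap}" || return 1

    # Stream only the changes since the previous snapshot and apply them locally.
    ssh "$src_host" zfs send -i "@${prev_snap}" "${src_dataset}@${new_snap}" \
        | zfs receive -F "$dst_dataset"
}
```

Because only the blocks changed between the two snapshots are sent, the dataset inside the VM replicates much more compactly than the whole ZVOL would.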

I have also been experimenting with block storage encryption with LUKS, using Borgmatic instead of ZFS replication for backups, providing encryption-at-rest of both data and backups. In this section, I describe the architecture with encryption-at-rest, although we do run a Nextcloud server without encrypted storage for our main family instance.

Why LUKS encryption?

https://wiki.archlinux.org/title/dm-crypt

  • Activating server-side encryption in Nextcloud increases file size by 35% (source)
  • Furthermore, the encryption key is in data/<user>/files_encryption so it doesn’t protect against physical access to the storage.
  • It is mostly useful for external storage (S3, NFS, etc.)
  • End-to-end encryption is not ready yet (as of April 2021).

Encryption key management

The LUKS encryption key is unlocked with a passphrase that must be manually entered on a terminal.

The passphrase is saved in a password manager. I use pass, the standard unix password manager.

In case the admin’s laptop is wiped, the encrypted filesystems can be accessed from a rescue environment. This environment is hosted in a TrueNAS VM which has ssh, git, pass and borg installed.

Backup strategy with borgmatic

  • To save space on the backup TrueNAS box, we prefer to replicate the filesystem from inside the VM instead of replicating the ZVOL on the host.
  • A borg FreeBSD jail provides filesystem-level encrypted backups.
  • Borgmatic helps with the borg client configuration on the VM.
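Borgmatic is driven by a configuration file, but it helps to know the borg commands it wraps. A minimal sketch, using the placeholder repository user@borghost:repo and retention values that are assumptions, not my actual policy:

```shell
#!/bin/bash
# Sketch of the borg commands that a borgmatic run wraps.
# REPO is a placeholder; the retention values are assumptions.
REPO="${REPO:-user@borghost:repo}"

backup_ncdata() {
    # Create a new deduplicated archive of the data directory.
    # {hostname} and {now} are placeholders expanded by borg itself.
    borg create --stats "${REPO}::{hostname}-{now}" /mnt/ncdata || return 1

    # Thin out old archives according to the retention policy.
    borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6 "$REPO"
}
```

The archives are encrypted only if the repository was initialized with an encryption mode, for example with `borg init --encryption=repokey`.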

Off-site backup with TrueNAS

The borg backup dataset is replicated to an off-site TrueNAS backup server.

The two sites are connected with an OpenVPN layer 4 tunnel.

Restoring files as an admin

The admin has the passphrases to the borg backup repo and ncdata LUKS container.

To access the ZFS snapshot on the ncdata dataset in the Nextcloud VM:

sudo ls /mnt/ncdata/.zfs/snapshot
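Each directory under .zfs/snapshot is a read-only view of the dataset at the time of the snapshot, so restoring a file is a plain copy. A minimal sketch for a single file, where the snapshot name and the relative path are whatever the listing above shows (the names in the usage note are made up):

```shell
#!/bin/bash
# Sketch: copy one file back from a ZFS snapshot of the ncdata dataset.
# The snapshot name and relative path are supplied by the caller.
NCDATA="${NCDATA:-/mnt/ncdata}"

restore_from_snapshot() {
    local snapshot="$1" relpath="$2"
    local src="$NCDATA/.zfs/snapshot/$snapshot/$relpath"

    [ -e "$src" ] || { echo "not found in snapshot: $src" >&2; return 1; }
    mkdir -p "$(dirname "$NCDATA/$relpath")" || return 1
    # -a preserves ownership, permissions and timestamps when run as root.
    cp -a -- "$src" "$NCDATA/$relpath"
}
```

A call would look like `restore_from_snapshot some-snapshot alice/files/notes.txt`, run as root. Nextcloud may need a file rescan (occ files:scan) before the restored file shows up in the web UI.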

To access the borg archives on the Nextcloud VM or the admin’s workstation, or the admin rescue VM:

sudo borg mount user@borghost:repo <mountpoint>
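borg mount needs a directory to mount on, and the FUSE mount should be released when done. A small sketch of the full round trip, assuming /mnt/borg as a placeholder mountpoint:

```shell
#!/bin/bash
# Sketch: mount a borg repository via FUSE, list its archives, then unmount.
# The default mountpoint /mnt/borg is a placeholder.
browse_borg_repo() {
    local repo="$1" mountpoint="${2:-/mnt/borg}"

    sudo mkdir -p "$mountpoint" || return 1
    sudo borg mount "$repo" "$mountpoint" || return 1
    sudo ls "$mountpoint"              # one directory per archive
    sudo borg umount "$mountpoint"
}
```

In practice you would copy the files you need out of the mountpoint before running the umount step.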

VM specifications

  • 2 vCPUs
  • 4 GB memory
  • 50 GB root ZVOL
  • x TB data ZVOL

Preparation

  1. Create two ZVOLs in TrueNAS.
    1. /mnt/pool-01/virtual-machines/nextcloud/root (50 GB)
    2. /mnt/pool-01/virtual-machines/nextcloud/ncdata (x TB)
  2. Create the VM in TrueNAS and attach the ZVOLs.
  3. Attach the Ubuntu 20.04 LTS installation media, boot the VM, and install the OS (1 hour).
  4. Follow the Nextcloud VM installation instructions and interactive script (1 hour).
  5. Log into Nextcloud VM over SSH.
  6. Re-create the ncdata ZFS dataset in a LUKS container.
    1. Move the content of /mnt/ncdata to /root/ncdata.
    2. Run cryptsetup luksFormat on /dev/sdb.
    3. Open the container with cryptsetup open.
    4. Create a ZPOOL on /dev/mapper/ncdata.
    5. Mount the root dataset on /mnt/ncdata.
    6. Install the open-ncdata script in the home directory of a normal user.
    7. Install Borgmatic, the Borgmatic configuration, the borgmatic.service and borgmatic.timer systemd unit files.
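Steps 6.1 to 6.5 can be sketched as the following commands. This is an outline under assumptions, not a tested procedure: it assumes /dev/sdb is the ncdata ZVOL as seen from the VM, that the existing pool is named ncdata, and that the pool must be destroyed before the device can be reformatted (that step and the final copy back are not spelled out in the list above).

```shell
#!/bin/bash
# Sketch of steps 6.1 to 6.5. DESTRUCTIVE: luksFormat erases the device.
# Assumes /dev/sdb is the ncdata ZVOL and the existing pool is named "ncdata"
# (check with: zpool list).
DEVICE="${DEVICE:-/dev/sdb}"

encrypt_ncdata() {
    sudo mkdir -p /root/ncdata &&
    sudo cp -a /mnt/ncdata/. /root/ncdata/ &&    # 6.1 keep a copy of the data
    sudo zpool destroy ncdata &&                 # assumption: free the device first
    sudo cryptsetup luksFormat "$DEVICE" &&      # 6.2 prompts for a new passphrase
    sudo cryptsetup open "$DEVICE" ncdata &&     # 6.3 creates /dev/mapper/ncdata
    sudo zpool create -m /mnt/ncdata ncdata /dev/mapper/ncdata &&  # 6.4 and 6.5
    sudo cp -a /root/ncdata/. /mnt/ncdata/       # copy the data back
}
```

Stop the web server before copying so that files do not change under the copy, and verify the copy in /root/ncdata before destroying the pool.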

Dead man switch

At the end of the nextcloud_update.sh cron job, the VM is restarted to apply updates. The VM starts, but the ncdata filesystem is encrypted and the LUKS container can only be decrypted with a passphrase.

This is a feature, not a bug. 😄

It prevents the data from being accessed without knowledge of the passphrase. It also ensures that the storage locks automatically after every restart, until someone who knows the passphrase unlocks it again.
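After a reboot the volume can be in one of three states. A minimal sketch that reports which one, using the same two checks as the open-ncdata script in this article (the state names are made up for illustration):

```shell
#!/bin/bash
# Sketch: report which stage the data volume is in after a reboot.
# The paths match the ones used by the open-ncdata script.
MAPPER="${MAPPER:-/dev/mapper/ncdata}"
MARKER="${MARKER:-/mnt/ncdata/.ocdata}"

ncdata_state() {
    if [ ! -e "$MAPPER" ]; then
        echo "locked"              # LUKS container not opened yet
    elif ! sudo test -e "$MARKER"; then
        echo "open-not-mounted"    # container open, ZFS pool not imported
    else
        echo "mounted"             # Nextcloud data directory is available
    fi
}
```

Right after the weekly restart, `ncdata_state` would be expected to print `locked`.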

A shell script semi-automates the actions required from the system administrator or owner.

On my workstation, I have open-ncdata in my PATH:

#!/bin/bash

. colors.sh

function log() {
echo -e "${txtpur}${@}${txtrst}"
}

log Running open-ncdata

log Opening your password manager and copying the nextcloud-vm password to the clipboard.
pass home/nextcloud-vm/alex -c


log Entering nextcloud-alex.
ssh -t nextcloud-alex ./open-ncdata
log You left nextcloud-alex.

On the Nextcloud VM, in my user’s home directory, there is another open-ncdata:

#!/bin/bash

##### BEGIN colors.sh

erasePreviousLine () {
            eraseline='\r\033[K'
                echo -ne "$eraseline"
        }

# Special characters
xmark="\xE2\x9C\x97"
chkmrk="\xE2\x9C\x93"

# Colors
endcolor='\033[0m'
white='\033[1;37m'
gray='\033[0;37m'
red='\033[1;31m'
green='\033[1;32m'

txtblk='\033[0;30m' # Black - Regular
txtred='\033[0;31m' # Red
txtgrn='\033[0;32m' # Green
txtylw='\033[0;33m' # Yellow
txtblu='\033[0;34m' # Blue
txtpur='\033[0;35m' # Purple
txtcyn='\033[0;36m' # Cyan
txtwht='\033[0;37m' # White
bldblk='\033[1;30m' # Black - Bold
bldred='\033[1;31m' # Red
bldgrn='\033[1;32m' # Green
bldylw='\033[1;33m' # Yellow
bldblu='\033[1;34m' # Blue
bldpur='\033[1;35m' # Purple
bldcyn='\033[1;36m' # Cyan
bldwht='\033[1;37m' # White
undblk='\033[4;30m' # Black - Underline
undred='\033[4;31m' # Red
undgrn='\033[4;32m' # Green
undylw='\033[4;33m' # Yellow
undblu='\033[4;34m' # Blue
undpur='\033[4;35m' # Purple
undcyn='\033[4;36m' # Cyan
undwht='\033[4;37m' # White
bakblk='\033[40m'   # Black - Background
bakred='\033[41m'   # Red
bakgrn='\033[42m'   # Green
bakylw='\033[43m'   # Yellow
bakblu='\033[44m'   # Blue
bakpur='\033[45m'   # Purple
bakcyn='\033[46m'   # Cyan
bakwht='\033[47m'   # White
txtrst='\033[0m'    # Text Reset

##### END colors.sh

function log () {
echo -e "\n${bakblu}||||||${txtrst} ${@}\n"
}

function action () {
echo -e "${bakylw}||||||${txtrst}${@} ←\n"
}

function test_result_ok () {
echo -e "Test ${txtgrn}||| OK |||${txtrst}\n"
}

function test_result_failed () {
echo -e "Test ${txtred}||| FAILED |||${txtrst}\n"
}

function test_nextcloud() {
curl -k -s -f https://nextcloud-vm.example.com/status.php \
&& echo
}

log "Testing Nextcloud status..."

if test_nextcloud; then
    test_result_ok
    log Exiting.
    exit
else
    test_result_failed
    action "Press Ctrl+V to paste sudo password."
fi


log "Decrypting data volume..."
action "Type the LUKS container key passphrase for /dev/sdb."

[ ! -e /dev/mapper/ncdata ] && sudo cryptsetup open /dev/sdb ncdata

log "Mounting ZFS dataset on /mnt/ncdata..."

sudo bash -c '[ ! -e /mnt/ncdata/.ocdata ]' \
&& sudo zpool import ncdata \
&& sudo zfs set mountpoint=/mnt/ncdata ncdata

log "Testing Nextcloud status..."

if test_nextcloud; then
    test_result_ok
else
    test_result_failed
fi

Monitoring

Metrics are collected in a Prometheus monitoring system. Logs are aggregated with Loki and Promtail. Logs and metrics are visualized in Grafana.

All the components of the monitoring stack run in a FreeBSD Jail. node_exporter is installed on the Nextcloud VM. I haven’t done the Promtail component yet, but it will be installed on the VM.

sudo apt install prometheus-node-exporter
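Once the package is installed, a quick check confirms the exporter is serving metrics. A minimal sketch, assuming the exporter listens on its default port 9100:

```shell
#!/bin/bash
# Sketch: confirm node_exporter answers on its default port 9100.
node_exporter_ok() {
    local host="${1:-localhost}"
    # Succeeds if at least one node_* metric line is returned.
    curl --silent --fail "http://${host}:9100/metrics" | grep -q '^node_'
}
```

Running `node_exporter_ok && echo up` on the VM, or `node_exporter_ok <vm-hostname>` from the monitoring jail, also verifies that the port is reachable from where Prometheus runs.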

On the monitoring FreeBSD Jail, the prometheus.yml config file contains:

alerting:
  alertmanagers:
  - static_configs:
    - targets:
      - alertmanager:9093

rule_files:
  - /usr/local/etc/prometheus_alerts.yml

scrape_configs:
  - job_name: websites
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://nextcloud.example.com/status.php
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_module]
        target_label: module
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter.infra.example.com:9115
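The relabeling above makes Prometheus scrape the blackbox exporter's /probe endpoint with the website URL passed as the target parameter. The same request can be made by hand when debugging. A small sketch, reusing the example hostnames from the config (the function names are made up):

```shell
#!/bin/bash
# Sketch: run a blackbox probe by hand, the way Prometheus does after relabeling.
# BLACKBOX reuses the example hostname from prometheus.yml.
BLACKBOX="${BLACKBOX:-http://blackbox_exporter.infra.example.com:9115}"

probe() {
    local target="$1" module="${2:-http_2xx}"
    # --get turns the url-encoded data into query string parameters.
    curl --silent --get \
        --data-urlencode "target=${target}" \
        --data-urlencode "module=${module}" \
        "${BLACKBOX}/probe"
}

probe_succeeded() {
    probe "$@" | grep -q '^probe_success 1$'
}
```

For example, `probe https://nextcloud.example.com/status.php` prints the raw metrics, including probe_http_status_code and probe_success.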

prometheus_alerts.yml:

groups:
  - name: websites
    rules:
      - alert: HTTP code not 200
        expr: probe_http_status_code{module="http_2xx"} != 200
        for: 30m
        annotations:
          dashboard_url: https://grafana.example.com/d/000000021/blackbox-exporter-probes
          blackbox_logs_url: http://blackbox_exporter.infra.example.com:9115/
      - alert: Blackbox probing failed
        expr: probe_success != 1
        for: 30m
        annotations:
          dashboard_url: https://grafana.deverteuil.net/d/000000021/blackbox-exporter-probes
          blackbox_logs_url: http://blackbox_exporter.infra.example.com:9115/
      - alert: SSL certificate is overdue for renewal
        expr: (probe_ssl_earliest_cert_expiry - time()) / (24 * 3600) < 14
        annotations:
          summary: This SSL certificate will expire within 14 days.
          dashboard_url: https://grafana.deverteuil.net/d/000000021/blackbox-exporter-probes
          blackbox_logs_url: http://blackbox_exporter.infra.example.com:9115/
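The certificate rule converts the seconds left until expiry into days and fires below 14. A worked version of that arithmetic in shell, where both inputs are Unix timestamps in seconds (the function names are made up, and shell arithmetic truncates to whole days whereas Prometheus keeps the fraction):

```shell
#!/bin/bash
# Worked example of the expiry expression: (expiry - now) / (24 * 3600) < 14.
# Both arguments are Unix timestamps in seconds.
days_until_expiry() {
    local expiry="$1" now="${2:-$(date +%s)}"
    echo $(( (expiry - now) / (24 * 3600) ))
}

overdue_for_renewal() {
    [ "$(days_until_expiry "$@")" -lt 14 ]
}
```

Since the ACME client normally renews well before the last two weeks, a certificate that reaches this threshold suggests that automatic renewal has stopped working.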

Alerts and manual intervention

At the end of the weekly nextcloud_update.sh cron job, the VM is restarted automatically. Because the ncdata LUKS container is not added to crypttab, the VM boots up but the volume remains encrypted.

  • Nextcloud will return 500.
  • Blackbox HTTP checks fail because /status.php returns a 500 status code.
  • Alert is sent to the owner as a reminder to run open-ncdata.

When the open-ncdata script is run, the Nextcloud service resumes immediately and an Alert RESOLVED notification is sent.

Takeaways

Your self-hosted application can only be as solid as your self-hosting infrastructure.

A solid infrastructure should provide:

  • A domain name
  • Public DNS
  • DNS resolver
  • Dynamic DNS client
  • Virtual machines
  • Jails or containers
  • Text-based documentation with static website output
  • Reliable storage
  • DHCP
  • Reverse proxy
  • SSL offloading
  • ACME client
  • Mail hosting
  • SMTP relay
  • Observability (logs, metrics)
  • Monitoring (alerts)
  • Secure backups (off-site and encrypted)

Use FOSS appliances when possible rather than building your own deployment-as-code for every piece of infrastructure and applications. TrueNAS and pfSense are great open-source infrastructure appliances you can install on generic hardware. Nextcloud VM is a simple solution for deploying and configuring Nextcloud as an appliance.

Partner with a techie friend or family member living at a different address. Share the setup and maintenance workload, and use each other's hardware for off-site backups. Also share learning experiences! ✨

Next steps

  • Write an article on how I monitor TrueNAS with Prometheus, Promtail, rsyslog and graphite_exporter.
  • Continue experimenting and improving the encryption-at-rest feature.
  • Teach my girlfriend how to use Nextcloud and how to organize pictures 😄.

Discussion

I posted about this article on /r/selfhosted:
https://www.reddit.com/r/selfhosted/comments/nkzms1/nextcloud_selfhosted_architecture/

Alexandre de Verteuil
Senior Solutions Architect

I teach people how to see the matrix metrics.
Monkeys and sunsets make me happy.
