Nextcloud self-hosted architecture
Design overview, diagrams, configs, scripts, specs and wins
Summary
My brother and I have been self-hosting Nextcloud for family and friends for several years. In this article, I describe the self-hosting infrastructure that supports our Nextcloud service.
We installed Nextcloud VM on an Ubuntu 20.04 virtual machine hosted on a TrueNAS server backed by 4 × 4 TB drives in RAID-Z1 with an off site TrueNAS server for data replication.
Nextcloud VM is the easiest and most reliable way to self-host Nextcloud that we found. It turns a freshly installed Ubuntu 20.04 VM into a Nextcloud appliance. It’s maintained by the Nextcloud project and the community.
https://github.com/nextcloud/vm
Main features
Here are my favourite features of Nextcloud VM.
- Opinionated installation and configuration.
- Performs well and is secure out of the box.
- Self-contained, no external services required.
- I have a reverse proxy with SSL offloading and ACME client for Let’s Encrypt certificates, but it is not required. The Nextcloud VM also comes with its own ACME client if needed.
- Automatic updates of application, OS packages and appliance code.
- Interactive setup script.
- Sets up ZFS on the data volume with automatic snapshots.
Goals
- 🖕 Data sovereignty
- ☁️ File sharing and remote access
- 📱 File synchronization across client devices
- 🗄️ Archival
- 🌎 Off site backups
- 🔒 Data encryption at rest
Documentation
Since both TrueNAS and pfSense are based on FreeBSD, reading the FreeBSD handbook will help you troubleshoot and get the most out of those systems. Nextcloud VM requires an Ubuntu 20.04 system and uses ZFS for the data filesystem.
- Nextcloud VM
- Nextcloud Server Administration Guide
- Nextcloud User Manual
- Ubuntu 20.04 LTS Server Guide
- FreeBSD handbook
Here are the documentation sites for the backup and mail components of my architecture:
Self-hosting infrastructure overview
Why appliances?
Automation with systems like Ansible or Kubernetes is necessary in enterprises, but it also adds layers of abstraction and complexity which are not necessary at a small scale. Furthermore, there is no such thing as a serverless deployment if you also have to maintain the underlying platform and infrastructure. Appliances will cover 95% of your use cases and reduce the amount of coding and maintenance effort.
pfSense, TrueNAS, ‘mailcow: dockerized’ and Nextcloud VM are free and open-source software that turn generic hardware and VMs into computer appliances. We use them at home for the most important or complex components of our architecture.
Here are benefits of using appliances based on free and open-source software and generic hardware:
- Software is maintained by the open-source community and self-updates automatically.
- Restoring an appliance from scratch is easy.
- Hardware can be replaced by purchasing parts online.
- The configuration can be exported and backed up through the management UI as a downloadable compressed XML file or some other convenient format.
- The system can be reinstalled from scratch from a downloadable installation media.
- Configuration can be restored or rolled back through the management UI by uploading a config backup.
When I make a configuration change on hardware appliances (pfSense and TrueNAS), I export the config and save it on my laptop in a directory synchronized across devices.
This is like hitting Ctrl + S.
Appliances running on VMs are backed up using either TrueNAS rsync tasks in PULL mode or Borgmatic.
Domain registrar
I like nearlyfreespeech.net because they offer a barebones service at a very low profit margin. Prices are the lowest and it works well. I admire their ethics and their business model.
DNS
I use DigitalOcean for my personal projects. Their DNS service is free. The pfSense router has an integration with DigitalOcean API for dynamic DNS.
Nextcloud uses SMTP for user enrollment, self-serve password reset and file sharing by email, among other things. I’ve been running a mailcow: dockerized VM appliance for mail self-hosting. To avoid the problem of public cloud hosting IP address blacklisting, I use Amazon SES for SMTP relay. It’s cheap and super simple to set up.
Reverse proxy with SSL offloading
My home router is a pfSense system on Netgate hardware.
I install the HAProxy and Acme Certificates services from the Package Manager (packages haproxy and acme).
SSL certificates are provided by Let’s Encrypt.
Monitoring
I run a FreeBSD jail I call Watchtower with components of the Grafana+Prometheus observability stack.
Storage architecture
Our production TrueNAS server runs in my home,
while a backup TrueNAS server runs in my brother’s home.
Our sites are connected with a VPN using OpenVPN.
Conveniently, Nextcloud VM uses ZFS for the /mnt/ncdata filesystem.
We configured the PULL replication task using the /mnt/ncdata dataset in the VM as the source, instead of the ZVOL on the TrueNAS host.
This provides off-site backups while keeping the backup size as small as possible.
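As a rough sketch of what a pull replication of the dataset boils down to, assuming the task is ZFS-based and the backup host can reach the VM over SSH: the function below only prints the pipeline, it does not run it. The host name, snapshot names and target dataset are hypothetical placeholders, and TrueNAS builds and schedules its own equivalent commands.

```shell
#!/bin/bash
# Sketch only: print (do not run) an incremental ZFS pull pipeline.
# Host, snapshot and target dataset names are hypothetical placeholders.

pull_replication_cmd() {
    local src_host="$1" src_dataset="$2" prev_snap="$3" new_snap="$4" dst_dataset="$5"
    # "zfs send -i" sends only the blocks that changed between the two snapshots.
    # "zfs receive -F" rolls the target back to its latest snapshot before receiving.
    printf 'ssh %s zfs send -i %s@%s %s@%s | zfs receive -F %s\n' \
        "$src_host" "$src_dataset" "$prev_snap" "$src_dataset" "$new_snap" "$dst_dataset"
}

pull_replication_cmd nextcloud-vm ncdata auto-2021-05-01 auto-2021-05-02 backup-pool/nextcloud/ncdata
```

Because only the dataset's changed blocks travel over the VPN, the backup never includes free space or filesystem overhead from the ZVOL.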
I have also been experimenting with block storage encryption with LUKS, using Borgmatic instead of ZFS replication for backups, providing encryption-at-rest of both data and backups. In this section, I describe the architecture with encryption-at-rest, although we do run a Nextcloud server without encrypted storage for our main family instance.
Why LUKS encryption?
https://wiki.archlinux.org/title/dm-crypt
- Activating server-side encryption in Nextcloud increases file size by 35% (source).
- Furthermore, the encryption key is in data/<user>/files_encryption, so it doesn’t protect against physical access to the storage.
- It is mostly useful for external storage (S3, NFS, etc.).
- End-to-end encryption is not ready yet (as of April 2021).
Encryption key management
The LUKS encryption key is unlocked with a passphrase that must be manually entered on a terminal.
The passphrase is saved in a password manager. I use pass, the standard unix password manager.
In case the admin’s laptop is wiped, the encrypted filesystems can be accessed from a rescue environment. This environment is hosted in a TrueNAS VM which has ssh, git, pass and borg installed.
Backup strategy with borgmatic
- To save space on the backup TrueNAS box, we prefer to replicate the filesystem from inside the VM instead of replicating the ZVOL on the host.
- A borg FreeBSD jail provides filesystem-level encrypted backups.
- Borgmatic helps with the borg client configuration on the VM.
Off site backup with TrueNAS
The borg backup dataset is replicated to an off site TrueNAS backup server.
The two sites are connected with an OpenVPN layer 4 tunnel.
Restoring files as an admin
The admin has the passphrases to the borg backup repo and ncdata LUKS container.
To access the ZFS snapshot on the ncdata dataset in the Nextcloud VM:
sudo ls /mnt/ncdata/.zfs/snapshot
To access the borg archives on the Nextcloud VM or the admin’s workstation, or the admin rescue VM:
sudo borg mount user@borghost:repo /mnt/borg  # borg mount needs a mountpoint; /mnt/borg is only an example
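When a user asks for a deleted file, the first question is usually which snapshot still has it. Here is a minimal sketch of that lookup, assuming snapshot names sort chronologically (date-stamped names do). The function name and the example path are hypothetical.

```shell
#!/bin/bash
# Sketch: print the most recent snapshot directory that still contains a file.
# Assumes snapshot names sort chronologically, e.g. date-stamped names.

latest_snapshot_with() {
    local snapdir="$1" relpath="$2" snap found=""
    # Globs expand in sorted order, so the last match is the newest snapshot.
    for snap in "$snapdir"/*/; do
        [ -e "${snap}${relpath}" ] && found="${snap%/}"
    done
    # Return non-zero when no snapshot contains the file.
    [ -n "$found" ] && printf '%s\n' "$found"
}
```

On the VM it would be called from a root shell, for example `latest_snapshot_with /mnt/ncdata/.zfs/snapshot alex/files/Photos/cat.jpg`, and the file can then be copied back from the printed directory.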
VM specifications
- 2 vCPUs
- 4 GB memory
- 50 GB root ZVOL
- x TB data ZVOL
Preparation
- Create two ZVOLs in TrueNAS:
  - /mnt/pool-01/virtual-machines/nextcloud/root (50 GB)
  - /mnt/pool-01/virtual-machines/nextcloud/ncdata (x TB)
- Create the VM in TrueNAS and attach the ZVOLs.
- Attach the Ubuntu 20.04 LTS installation media, boot the VM, and install the OS (1 hour).
- Follow the Nextcloud VM installation instructions and interactive script (1 hour).
- Log into Nextcloud VM over SSH.
- Re-create the ncdata ZFS dataset in a LUKS container.
  - Move the content of /mnt/ncdata to /root/ncdata.
  - Run cryptsetup luksFormat on /dev/sdb.
  - Open the container with cryptsetup open.
  - Create a ZPOOL on /dev/mapper/ncdata.
  - Mount the root dataset on /mnt/ncdata.
- Install the open-ncdata script in the home directory of a normal user.
- Install Borgmatic, the Borgmatic configuration, and the borgmatic.service and borgmatic.timer systemd unit files.
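The LUKS and ZFS steps can be sketched as a dry-run script. It prints the commands instead of running them unless RUN=1 is set, because luksFormat destroys everything on the device. The device, mapper name, pool name and mountpoint come from the steps above. The `run` helper and the variable names are my own. Moving the data out and back, and removing the old pool from /dev/sdb first, are deliberately left out.

```shell
#!/bin/bash
# Sketch of the LUKS + ZFS steps. Dry-run by default: commands are printed,
# not executed. Set RUN=1 to execute (luksFormat wipes the device).

DEVICE="${DEVICE:-/dev/sdb}"            # data ZVOL as seen from inside the VM
MAPPER_NAME="${MAPPER_NAME:-ncdata}"    # appears as /dev/mapper/ncdata
POOL="${POOL:-ncdata}"
MOUNTPOINT="${MOUNTPOINT:-/mnt/ncdata}"

run() {
    if [ "${RUN:-0}" = "1" ]; then
        sudo "$@"
    else
        printf '+ sudo %s\n' "$*"
    fi
}

encrypt_data_volume() {
    run cryptsetup luksFormat "$DEVICE"            # prompts for a new passphrase
    run cryptsetup open "$DEVICE" "$MAPPER_NAME"   # unlocks the container
    # -m sets the mountpoint of the pool's root dataset at creation time.
    run zpool create -m "$MOUNTPOINT" "$POOL" "/dev/mapper/$MAPPER_NAME"
}

encrypt_data_volume
```

Reading the printed commands before setting RUN=1 is a cheap way to check that the device name is the data volume and not the root disk.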
Dead man switch
At the end of the nextcloud_update.sh cron job, the VM is restarted to apply updates.
The VM starts, but the ncdata filesystem is encrypted and the LUKS container can only be decrypted with a passphrase.
This is a feature, not a bug. 😄
It prevents the data from being accessed without knowledge of the passphrase. It also ensures that the storage locks itself automatically when the Nextcloud application is inactive and not needed.
A shell script semi-automates the actions required from the system administrator or owner.
On my workstation, I have open-ncdata in my PATH:
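#!/bin/bash
# Workstation side: copy the sudo password to the clipboard, then run
# the VM-side open-ncdata script over SSH.
. colors.sh  # provides the txtpur and txtrst color variables

function log() {
    echo -e "${txtpur}${*}${txtrst}"
}

log Running open-ncdata
log Opening your password manager and copying the nextcloud-vm password to the clipboard.
pass home/nextcloud-vm/alex -c
log Entering nextcloud-alex.
# -t allocates a terminal so the sudo and LUKS prompts work remotely.
ssh -t nextcloud-alex ./open-ncdata
log You left nextcloud-alex.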
On the Nextcloud VM, in my user’s home directory, there is another open-ncdata:
#!/bin/bash
##### BEGIN colors.sh
erasePreviousLine () {
    eraseline='\r\033[K'
    echo -ne "$eraseline"
}
# Special characters
xmark="\xE2\x9C\x97"
chkmrk="\xE2\x9C\x93"
# Colors
endcolor='\033[0m'
white='\033[1;37m'
gray='\033[0;37m'
red='\033[1;31m'
green='\033[1;32m'
txtblk='\033[0;30m' # Black - Regular
txtred='\033[0;31m' # Red
txtgrn='\033[0;32m' # Green
txtylw='\033[0;33m' # Yellow
txtblu='\033[0;34m' # Blue
txtpur='\033[0;35m' # Purple
txtcyn='\033[0;36m' # Cyan
txtwht='\033[0;37m' # White
bldblk='\033[1;30m' # Black - Bold
bldred='\033[1;31m' # Red
bldgrn='\033[1;32m' # Green
bldylw='\033[1;33m' # Yellow
bldblu='\033[1;34m' # Blue
bldpur='\033[1;35m' # Purple
bldcyn='\033[1;36m' # Cyan
bldwht='\033[1;37m' # White
undblk='\033[4;30m' # Black - Underline
undred='\033[4;31m' # Red
undgrn='\033[4;32m' # Green
undylw='\033[4;33m' # Yellow
undblu='\033[4;34m' # Blue
undpur='\033[4;35m' # Purple
undcyn='\033[4;36m' # Cyan
undwht='\033[4;37m' # White
bakblk='\033[40m' # Black - Background
bakred='\033[41m' # Red
bakgrn='\033[42m' # Green
bakylw='\033[43m' # Yellow
bakblu='\033[44m' # Blue
bakpur='\033[45m' # Purple
bakcyn='\033[46m' # Cyan
bakwht='\033[47m' # White
txtrst='\033[0m' # Text Reset
##### END colors.sh
function log () {
    echo -e "\n${bakblu}||||||${txtrst} ${*}\n"
}
function action () {
    echo -e "${bakylw}||||||${txtrst} → ${*} ←\n"
}
function test_result_ok () {
    echo -e "Test ${txtgrn}||| OK |||${txtrst}\n"
}
function test_result_failed () {
    echo -e "Test ${txtred}||| FAILED |||${txtrst}\n"
}
# With -f, curl exits non-zero when the server answers with an HTTP error (4xx/5xx).
function test_nextcloud() {
    curl -k -s -f https://nextcloud-vm.example.com/status.php \
        && echo
}
log "Testing Nextcloud status..."
if test_nextcloud; then
    test_result_ok
    log Exiting.
    exit
else
    test_result_failed
    action "Press Ctrl+V to paste sudo password."
fi
log "Decrypting data volume..."
action "Type the LUKS container key passphrase for /dev/sdb."
# Skip if the container is already open.
[ ! -e /dev/mapper/ncdata ] && sudo cryptsetup open /dev/sdb ncdata
log "Mounting ZFS dataset on /mnt/ncdata..."
# Import the pool only if the Nextcloud data directory is not already visible.
sudo bash -c '[ ! -e /mnt/ncdata/.ocdata ]' \
    && sudo zpool import ncdata \
    && sudo zfs set mountpoint=/mnt/ncdata ncdata
log "Testing Nextcloud status..."
if test_nextcloud; then
    test_result_ok
else
    test_result_failed
fi
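The check above only looks at the HTTP status. status.php also answers with a small JSON document that includes installed and maintenance flags, so a finer check is possible. This is a minimal sketch, assuming that JSON shape. The function name and the sample body are illustrative, not taken from my script.

```shell
#!/bin/bash
# Sketch: classify the JSON body returned by status.php.
# Plain pattern matching keeps it dependency-free; use jq for anything stricter.

nextcloud_state() {
    local json="$1"
    case "$json" in
        *'"maintenance":true'*) echo "maintenance" ;;
        *'"installed":true'*)   echo "ok" ;;
        *)                      echo "unknown" ;;
    esac
}

# Illustrative body, trimmed to the two fields this sketch looks at.
sample='{"installed":true,"maintenance":false}'
nextcloud_state "$sample"
```

Against the real server it would be called as `nextcloud_state "$(curl -k -s https://nextcloud-vm.example.com/status.php)"`.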
Monitoring
Metrics are collected in a Prometheus monitoring system. Logs are aggregated with Loki and Promtail. Logs and metrics are visualized in Grafana.
All the components of the monitoring stack run in a FreeBSD Jail.
node_exporter is installed on the Nextcloud VM.
I haven’t done the Promtail component yet, but it will be installed on the VM.
sudo apt install prometheus-node-exporter
On the monitoring FreeBSD Jail, the prometheus.yml config file contains:
alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - alertmanager:9093
rule_files:
  - /usr/local/etc/prometheus_alerts.yml
scrape_configs:
  - job_name: websites
    metrics_path: /probe
    params:
      module: [http_2xx]
    static_configs:
      - targets:
          - https://nextcloud.example.com/status.php
    relabel_configs:
      - source_labels: [__address__]
        target_label: __param_target
      - source_labels: [__param_module]
        target_label: module
      - source_labels: [__param_target]
        target_label: instance
      - target_label: __address__
        replacement: blackbox_exporter.infra.example.com:9115
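The relabel rules move each target URL into the target query parameter and point the scrape at the Blackbox exporter, so the request Prometheus ends up making can be reproduced by hand. A small sketch follows. The `probe_url` helper is hypothetical and does not URL-encode the target, which is fine for simple URLs like this one.

```shell
#!/bin/bash
# Sketch: rebuild the probe request that results from the relabel rules.
# The target is inserted as-is, without URL-encoding.

probe_url() {
    local exporter="$1" module="$2" target="$3"
    printf 'http://%s/probe?module=%s&target=%s\n' "$exporter" "$module" "$target"
}

probe_url blackbox_exporter.infra.example.com:9115 http_2xx https://nextcloud.example.com/status.php
```

Fetching the printed URL with curl and looking for the probe_success line shows what Prometheus sees for that target.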
prometheus_alerts.yml:
groups:
  - name: websites
    rules:
      - alert: HTTP code not 200
        expr: probe_http_status_code{module="http_2xx"} != 200
        for: 30m
        annotations:
          dashboard_url: https://grafana.example.com/d/000000021/blackbox-exporter-probes
          blackbox_logs_url: http://blackbox_exporter.infra.example.com:9115/
      - alert: Blackbox probing failed
        expr: probe_success != 1
        for: 30m
        annotations:
          dashboard_url: https://grafana.deverteuil.net/d/000000021/blackbox-exporter-probes
          blackbox_logs_url: http://blackbox_exporter.infra.example.com:9115/
      - alert: SSL certificate is overdue for renewal
        expr: (probe_ssl_earliest_cert_expiry - time()) / (24 * 3600) < 14
        annotations:
          summary: This SSL certificate will expire within 14 days.
          dashboard_url: https://grafana.deverteuil.net/d/000000021/blackbox-exporter-probes
          blackbox_logs_url: http://blackbox_exporter.infra.example.com:9115/
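The certificate rule is plain arithmetic on Unix timestamps: the alert fires when fewer than 14 days remain before the earliest expiry. The same comparison in shell, as a sketch. The function name and the fixed example timestamp are hypothetical, and the comparison is done in seconds to avoid integer-division rounding.

```shell
#!/bin/bash
# Sketch: same test as the alert expression, on epoch seconds.
# Succeeds (exit 0) when fewer than threshold_days remain before expiry.

cert_needs_renewal() {
    local expiry_epoch="$1" now_epoch="${2:-$(date +%s)}" threshold_days="${3:-14}"
    (( expiry_epoch - now_epoch < threshold_days * 24 * 3600 ))
}

now=1622505600   # fixed example timestamp so the result is reproducible
if cert_needs_renewal $((now + 10 * 86400)) "$now"; then
    echo "renew"   # 10 days left is under the 14-day threshold
else
    echo "ok"
fi
```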
Alerts and manual intervention
At the end of the weekly nextcloud_update.sh cron job, the VM is restarted automatically.
Because the ncdata LUKS container is not added to crypttab, the VM boots up but the volume remains encrypted.
- Nextcloud will return 500.
- Blackbox HTTP checks fail because /status.php returns a 500 status code.
- An alert is sent to the owner as a reminder to run open-ncdata.
When the open-ncdata script is run, the Nextcloud service resumes immediately and an Alert RESOLVED notification is sent.
Takeaways
Your self-hosted application can only be as solid as your self-hosting infrastructure.
A solid infrastructure should provide:
- A domain name
- Public DNS
- DNS resolver
- Dynamic DNS client
- Virtual machines
- Jails or containers
- Text-based documentation with static website output
- Reliable storage
- DHCP
- Reverse proxy
- SSL offloading
- ACME client
- Mail hosting
- SMTP relay
- Observability (logs, metrics)
- Monitoring (alerts)
- Secure backups (off-site and encrypted)
Use FOSS appliances when possible rather than building your own deployment-as-code for every piece of infrastructure and applications. TrueNAS and pfSense are great open-source infrastructure appliances you can install on generic hardware. Nextcloud VM is a simple solution for deploying and configuring Nextcloud as an appliance.
Partner with a techie friend or family member living at a different address. Share the setup and maintenance workload, and use each other’s hardware for off-site backups. Also share learning experiences! ✨
Next steps
- Write an article on how I monitor TrueNAS with Prometheus, Promtail, rsyslog and graphite_exporter.
- Continue experimenting and improving the encryption-at-rest feature.
- Teach my girlfriend how to use Nextcloud and how to organize pictures 😄.
Discussion
I posted about this article on /r/selfhosted:
https://www.reddit.com/r/selfhosted/comments/nkzms1/nextcloud_selfhosted_architecture/