Backups and Pool Health
Storage for the whole homelab lives on a RAIDZ1 pool on the Ugreen DXP 8800 Plus. If the pool is unhealthy, everything stops — Jellyfin, photos, *arr imports, SMB shares. This page is the runbook for keeping it alive and knowing what to do when it is not.
Pool layout
| Setting | Value |
|---|---|
| Platform | Ugreen DXP 8800 Plus @ 192.168.2.203 |
| OS | TrueNAS |
| Pool type | RAIDZ1 (single parity) |
| Drives today | 7× 16 TB |
| Planned | 8× 16 TB (8th drive ordered / pending install) |
| Admin UI | dsm.saxobroko.com → Storage |
RAIDZ1 can survive one drive failure without data loss. A second drive failure before the resilver completes means data loss, so treat DEGRADED status as urgent.
Usable capacity is less than raw drive total (parity + metadata overhead). Check the TrueNAS dashboard for live numbers; do not rely on back-of-envelope maths.
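
For sanity-checking the dashboard figure, a rough ceiling can still be useful. The sketch below is an upper bound only: it assumes vendor terabytes (10^12 bytes) against the tebibytes that ZFS tools report, and it ignores metadata, padding, and reserved space. The function name is hypothetical.

```python
TB = 10**12   # drive vendors quote decimal terabytes
TIB = 2**40   # ZFS tools report binary tebibytes


def raidz_upper_bound_tib(drives: int, drive_tb: float, parity: int = 1) -> float:
    """Ceiling on usable space: (drives - parity) data disks, before ZFS overhead."""
    if drives <= parity:
        raise ValueError("need more drives than parity disks")
    return (drives - parity) * drive_tb * TB / TIB


for n in (7, 8):
    print(f"{n} x 16 TB RAIDZ1: at most {raidz_upper_bound_tib(n, 16):.1f} TiB")
# 7 x 16 TB RAIDZ1: at most 87.3 TiB
# 8 x 16 TB RAIDZ1: at most 101.9 TiB
```

If the dashboard reports more than this ceiling, something is being misread; if it reports noticeably less, that is the expected overhead.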
What lives on the pool
| Dataset / use | Consumers |
|---|---|
| Media (Shows/, Movies/, music) | Jellyfin, Navidrome, *arr stack — arr-stack |
| Photos | photos.saxobroko.com — Photos |
| App configs | Docker volumes for Sonarr, Radarr, qBittorrent, etc. |
| SMB shares | Windows A: drive — Windows network drive |
| Backups / snapshots | ZFS snapshots (see below) |
Scrubs
ZFS scrubs read every block and verify checksums. They catch silent bit rot before it becomes unrecoverable.
| Task | Guidance |
|---|---|
| Schedule | Monthly scrub on the pool (TrueNAS default or custom schedule) |
| Duration | Hours on multi-TB RAIDZ1 — run off-peak if performance matters |
| During scrub | Pool stays online; may be slower |
| Check status | TrueNAS → Storage → pool → Scrub tab |
If a scrub reports checksum errors, note the drive and run SMART tests on suspect disks before replacing.
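
To note which drive logged checksum errors, the CKSUM column of `zpool status` can be read from the shell as well as from the UI. The following is a minimal sketch that parses that output; the pool name `tank`, the device names, and the sample text are illustrative assumptions, not output from this NAS.

```python
SAMPLE = """\
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 11:42:07 with 0 errors on Sun Jan  5 11:42:09 2025
config:

        NAME        STATE     READ WRITE CKSUM
        tank        ONLINE       0     0     0
          raidz1-0  ONLINE       0     0     0
            sda     ONLINE       0     0     0
            sdb     ONLINE       0     0     3
            sdc     ONLINE       0     0     0

errors: No known data errors
"""


def checksum_errors(status_text: str) -> dict[str, str]:
    """Return {device: CKSUM counter} for config rows whose CKSUM is not 0."""
    bad = {}
    in_config = False
    for line in status_text.splitlines():
        fields = line.split()
        if fields[:2] == ["NAME", "STATE"]:
            in_config = True           # header row of the device table
            continue
        if in_config and not fields:
            in_config = False          # a blank line ends the device table
            continue
        # Counters stay strings because zpool may abbreviate them (e.g. 1.2K).
        if in_config and len(fields) >= 5 and fields[4] != "0":
            bad[fields[0]] = fields[4]
    return bad


print(checksum_errors(SAMPLE))  # {'sdb': '3'}
```

Any device this returns is a candidate for the SMART tests described in the next section.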
SMART monitoring
TrueNAS surfaces drive SMART data:
- Storage → Disks — review temperature and SMART status
- Enable S.M.A.R.T. Tests (short monthly, long quarterly) if not already scheduled
- Replace a drive showing reallocated sectors, pending sectors, or critical warnings — do not wait for total failure
Email or notification hooks for SMART failures: TODO: document alert destination if configured.
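
The replacement criteria above can also be checked from a script. This sketch assumes the JSON report that `smartctl -j -a /dev/sdX` produces for SATA drives (smartmontools 7 or later), where attribute 5 is the reallocated sector count and 197 is the pending sector count. SAS and NVMe drives report different fields, and the sample report below is made up for illustration.

```python
# SATA attribute IDs watched here; the labels are my own wording.
WATCHED = {5: "reallocated sectors", 197: "pending sectors"}


def smart_warnings(report: dict) -> list[str]:
    """Return human-readable warnings from a parsed smartctl JSON report."""
    warnings = []
    if not report.get("smart_status", {}).get("passed", True):
        warnings.append("overall SMART health check failed")
    for attr in report.get("ata_smart_attributes", {}).get("table", []):
        label = WATCHED.get(attr.get("id"))
        raw = attr.get("raw", {}).get("value", 0)
        if label and raw > 0:
            warnings.append(f"{label}: {raw}")
    return warnings


# Illustrative report; on the NAS this dict would come from json.loads()
# applied to the smartctl output.
sample = {
    "smart_status": {"passed": True},
    "ata_smart_attributes": {
        "table": [
            {"id": 5, "name": "Reallocated_Sector_Ct", "raw": {"value": 8}},
            {"id": 194, "name": "Temperature_Celsius", "raw": {"value": 36}},
            {"id": 197, "name": "Current_Pending_Sector", "raw": {"value": 0}},
        ]
    },
}

print(smart_warnings(sample))  # ['reallocated sectors: 8']
```

An empty list means none of the watched conditions fired; it does not prove the drive is healthy.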
Snapshots mindset
ZFS snapshots are cheap point-in-time copies on the same pool. They help with:
- Accidental deletes (roll back a dataset or recover files from the snapshot browser)
- Before risky changes (app upgrades, bulk renames)
- Quick recovery without a second copy of the data
Snapshots are not a substitute for off-site backup — fire, theft, or multiple drive loss still kills the pool.
| Practice | Notes |
|---|---|
| Snapshot important datasets | Media, photos, app config volumes |
| Retention | e.g. daily × 7, weekly × 4 — tune to pool space |
| Off-site | TODO: document if cloud or external USB backup exists |
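
To get a feel for how many snapshots a "daily × 7, weekly × 4" policy actually retains, the sketch below applies one possible reading of it: keep the 7 newest snapshots, plus the newest snapshot in each of the 4 most recent ISO weeks. TrueNAS periodic snapshot tasks handle retention themselves, so this is only a way to reason about the policy, and the function name is hypothetical.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshots: list[date], daily: int = 7, weekly: int = 4) -> set[date]:
    """Keep the newest `daily` snapshots plus the newest one per recent ISO week."""
    ordered = sorted(set(snapshots), reverse=True)   # newest first
    keep = set(ordered[:daily])
    weeks_seen = []
    for snap in ordered:
        week = tuple(snap.isocalendar()[:2])         # (ISO year, ISO week)
        if week in weeks_seen:
            continue                                 # already kept a newer one
        if len(weeks_seen) == weekly:
            break                                    # older than the weekly window
        weeks_seen.append(week)
        keep.add(snap)
    return keep


# 40 daily snapshots ending on Sunday 2025-03-30.
newest = date(2025, 3, 30)
snaps = [newest - timedelta(days=i) for i in range(40)]
keep = snapshots_to_keep(snaps)
print(len(keep), "kept,", len(snaps) - len(keep), "prunable")  # 10 kept, 30 prunable
```

The newest weekly snapshot overlaps with the daily set, which is why 7 + 4 comes out as 10 rather than 11.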
8th drive expansion
Plan when the 8th 16 TB drive arrives:
- Physically install the drive in the DXP 8800 Plus (power down if hot-swap is not confirmed for that bay)
- TrueNAS → Storage → pool → Expand (or replace/expand wizard depending on UI version)
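- RAIDZ1 expansion widens the vdev: pool capacity increases and parity stays at one disk. Data written before the expansion keeps its old data-to-parity ratio until it is rewritten, so the space gained is somewhat less than a freshly built 8-wide pool would offer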
- Verify scrub passes after expansion
- Update SaxDocs — TrueNAS, Server overview hardware section
Before expanding, ensure no drive is already degraded. Fix or replace failing disks first.
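
The "no degraded drive" precondition can be checked from the NAS shell with `zpool status -x`, which prints `all pools are healthy` (or `pool '<name>' is healthy` when a pool is named) if nothing is wrong. A minimal pre-flight sketch follows; the helper names are hypothetical and the pool name must be supplied by whoever runs it.

```python
import subprocess


def pool_is_healthy(status_x_output: str) -> bool:
    """Interpret the output of `zpool status -x [pool]`."""
    text = status_x_output.strip()
    if text == "all pools are healthy":
        return True
    return text.startswith("pool '") and text.endswith("' is healthy")


def preflight(pool: str) -> bool:
    """Run the check on the NAS itself; requires the zpool binary on PATH."""
    result = subprocess.run(
        ["zpool", "status", "-x", pool],
        capture_output=True, text=True, check=True,
    )
    return pool_is_healthy(result.stdout)
```

Calling `preflight("tank")` with the real pool name should return `True` before the expansion is started; anything else means fixing the pool comes first.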
Degraded pool — what to do
If TrueNAS shows DEGRADED or a drive is FAULTED:
Immediate steps
- Stop heavy writes — pause qBittorrent imports, large photo uploads if possible
- Identify the failed drive — Storage → pool status; note serial and slot
- Do not yank the wrong disk — confirm by slot LED or serial in UI
- Replace the drive — same or larger capacity; TrueNAS will resilver
During resilver
- Pool remains usable but slow; avoid unnecessary load
- Resilver time scales with how much data the pool holds, not with raw drive size; it may take a day or more on 7×16 TB
- Monitor progress in the UI; do not reboot unless instructed
After resilver
- Run a manual scrub
- Check SMART on the replacement drive
- Review any checksum errors logged during the incident
If two drives fail
RAIDZ1 cannot recover. Restore from off-site backup if it exists. Otherwise this is a data-loss event — TODO: confirm backup strategy and document restore steps here.
Windows and SMB health
A degraded pool shows up on the Windows PC as a slow A: drive or failed copies to the NAS @ 192.168.2.203. If SMB is fine on the LAN but apps fail, the problem may be Docker; see Common Issues.
Monitoring
External uptime checks: status.saxobroko.com — Monitoring. That tells me if services respond; it does not replace pool health checks inside TrueNAS.
Related
- TrueNAS — admin access and app host
- arr-stack — media paths on the pool
- Photos — photo storage
- Common Issues — service-level problems
- Monitoring — external uptime