Backups and Pool Health

Storage for the whole homelab lives on a RAIDZ1 pool on the Ugreen DXP 8800 Plus. If the pool is unhealthy, everything stops — Jellyfin, photos, *arr imports, SMB shares. This page is the runbook for keeping it alive and knowing what to do when it is not.

Pool layout

| Setting | Value |
| --- | --- |
| Platform | Ugreen DXP 8800 Plus @ 192.168.2.203 |
| OS | TrueNAS |
| Pool type | RAIDZ1 (single parity) |
| Drives today | 7× 16 TB |
| Planned | 8× 16 TB (8th drive ordered / pending install) |
| Admin UI | dsm.saxobroko.com → Storage |

RAIDZ1 can survive one drive failure without data loss. A second drive failing before the resilver completes means data loss, so treat a DEGRADED status as urgent.

Usable capacity is less than raw drive total (parity + metadata overhead). Check the TrueNAS dashboard for live numbers; do not rely on back-of-envelope maths.
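To show why the dashboard figure is always lower than the label on the drives, here is a rough upper-bound sketch. It assumes a single RAIDZ1 vdev and accounts only for one parity drive and the TB-to-TiB conversion; the function name is made up for this page, and metadata, padding, and reserved space are ignored, so real numbers come out lower.

```python
TB = 10**12   # drive vendors quote decimal terabytes
TIB = 2**40   # ZFS tools report binary tebibytes


def raidz1_upper_bound_tib(drive_count: int, drive_tb: float) -> float:
    """Rough ceiling on usable space for one RAIDZ1 vdev, in TiB.

    Only subtracts one drive's worth of parity. Metadata, padding and
    reserved space are NOT modelled, so the real figure is lower.
    """
    if drive_count < 2:
        raise ValueError("need at least one data drive plus one parity drive")
    data_drives = drive_count - 1
    return data_drives * drive_tb * TB / TIB


if __name__ == "__main__":
    for count in (7, 8):
        ceiling = raidz1_upper_bound_tib(count, 16)
        print(f"{count} x 16 TB -> at most {ceiling:.1f} TiB")
```

Treat the result as a sanity check only: if the dashboard ever shows more than this ceiling, something is being misread.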

What lives on the pool

| Dataset / use | Consumers |
| --- | --- |
| Media (Shows/, Movies/, music) | Jellyfin, Navidrome, *arr stack (see arr-stack) |
| Photos | photos.saxobroko.com → Photos |
| App configs | Docker volumes for Sonarr, Radarr, qBittorrent, etc. |
| SMB shares | Windows A: drive (see Windows network drive) |
| Backups / snapshots | ZFS snapshots (see below) |

Scrubs

ZFS scrubs read every block and verify checksums. They catch silent bit rot before it becomes unrecoverable.

| Task | Guidance |
| --- | --- |
| Schedule | Monthly scrub on the pool (TrueNAS default or custom schedule) |
| Duration | Hours on multi-TB RAIDZ1; run off-peak if performance matters |
| During scrub | Pool stays online; may be slower |
| Check status | TrueNAS → Storage → pool → Scrub tab |

If a scrub reports checksum errors, note the drive and run SMART tests on suspect disks before replacing.
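For checking scrub results from a shell instead of the UI, the sketch below pulls the last completed scrub out of `zpool status` text. The pool name `tank` and the sample output are placeholders, not this pool's real output, and the `scan:` line format is assumed to match recent OpenZFS releases; it may differ on other versions.

```python
import re

# Illustrative sample only; paste real `zpool status <pool>` output instead.
SAMPLE = """\
  pool: tank
 state: ONLINE
  scan: scrub repaired 0B in 05:23:11 with 0 errors on Sun Mar  3 05:23:12 2024
errors: No known data errors
"""

# Assumed wording of the completed-scrub line in recent OpenZFS.
SCRUB_DONE = re.compile(
    r"scan:\s+scrub repaired (?P<repaired>\S+) in (?P<duration>.+?) "
    r"with (?P<errors>\d+) errors on (?P<finished>.+)"
)


def parse_last_scrub(status_text: str):
    """Return details of the last completed scrub, or None if not found."""
    for line in status_text.splitlines():
        match = SCRUB_DONE.search(line)
        if match:
            info = match.groupdict()
            info["errors"] = int(info["errors"])
            info["finished"] = info["finished"].strip()
            return info
    return None


if __name__ == "__main__":
    print(parse_last_scrub(SAMPLE))
```

A `None` result means no completed scrub line was found, for example because a scrub is still running or has never run.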

SMART monitoring

TrueNAS surfaces drive SMART data:

  1. Storage → Disks — review temperature and SMART status
  2. Enable S.M.A.R.T. Tests (short monthly, long quarterly) if not already scheduled
  3. Replace a drive showing reallocated sectors, pending sectors, or critical warnings — do not wait for total failure

Email or notification hooks for SMART failures: TODO: document alert destination if configured.
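The replace-early rule in step 3 can be sketched as a check over smartctl's JSON output (`smartctl -A -j <device>`). This assumes SATA drives that report the ATA attribute table; SAS and NVMe drives expose different fields. The sample report is invented for illustration, and attribute IDs 5 and 197 are the conventional IDs for reallocated and pending sectors.

```python
import json

# Conventional ATA attribute IDs for the counters named in step 3.
WATCHED = {5: "Reallocated_Sector_Ct", 197: "Current_Pending_Sector"}

# Invented sample, trimmed to the fields this sketch reads.
SAMPLE_JSON = """
{"ata_smart_attributes": {"table": [
  {"id": 5,   "name": "Reallocated_Sector_Ct",  "raw": {"value": 0}},
  {"id": 194, "name": "Temperature_Celsius",    "raw": {"value": 38}},
  {"id": 197, "name": "Current_Pending_Sector", "raw": {"value": 8}}
]}}
"""


def flag_smart_attributes(report: dict) -> dict:
    """Return {attribute name: raw count} for watched attributes above zero."""
    table = report.get("ata_smart_attributes", {}).get("table", [])
    flagged = {}
    for row in table:
        attr_id = row.get("id")
        if attr_id in WATCHED:
            raw = row.get("raw", {}).get("value", 0)
            if raw > 0:
                flagged[row.get("name", WATCHED[attr_id])] = raw
    return flagged


if __name__ == "__main__":
    print(flag_smart_attributes(json.loads(SAMPLE_JSON)))
```

An empty result only means these two counters are zero; it does not cover the other critical warnings TrueNAS may raise.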

Snapshots mindset

ZFS snapshots are cheap point-in-time copies on the same pool. They help with:

  • Accidental deletes (rollback a dataset or recover files from snapshot browser)
  • Before risky changes (app upgrades, bulk renames)
  • Quick recovery without a second copy of the data

Snapshots are not a substitute for off-site backup — fire, theft, or multiple drive loss still kills the pool.

| Practice | Notes |
| --- | --- |
| Snapshot important datasets | Media, photos, app config volumes |
| Retention | e.g. daily × 7, weekly × 4; tune to pool space |
| Off-site | TODO: document if cloud or external USB backup exists |
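To make the "daily × 7, weekly × 4" example concrete, this sketch works out which snapshot dates would survive such a policy. It is one possible reading of the policy (keep everything from the last 7 days, plus the newest snapshot in each of the 4 most recent ISO weeks) and is not how TrueNAS implements snapshot lifetimes; the function name is hypothetical.

```python
from datetime import date, timedelta


def snapshots_to_keep(snapshot_days, today, daily=7, weekly=4):
    """Return the sorted snapshot dates an illustrative daily/weekly policy keeps."""
    days = sorted({d for d in snapshot_days if d <= today}, reverse=True)
    keep = set()

    # Daily tier: every snapshot taken within the last `daily` days.
    for day in days:
        if (today - day).days < daily:
            keep.add(day)

    # Weekly tier: newest snapshot in each of the most recent `weekly` ISO weeks.
    seen_weeks = []
    for day in days:
        week = tuple(day.isocalendar()[:2])  # (ISO year, ISO week number)
        if week in seen_weeks:
            continue
        if len(seen_weeks) == weekly:
            break
        seen_weeks.append(week)
        keep.add(day)

    return sorted(keep)


if __name__ == "__main__":
    today = date(2024, 3, 31)
    taken = [today - timedelta(days=n) for n in range(60)]  # one per day
    for day in snapshots_to_keep(taken, today):
        print(day.isoformat())
```

Because the two tiers overlap for the current week, the number of retained snapshots is usually a little lower than daily + weekly.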

8th drive expansion

Plan when the 8th 16 TB drive arrives:

  1. Physically install the drive in the DXP 8800 Plus (power down if hot-swap is not confirmed for that bay)
  2. TrueNAS → Storage → pool → Expand (or the replace/expand wizard, depending on UI version)
  3. The RAIDZ1 vdev widens from 7 to 8 disks: pool capacity increases while parity stays at one disk
  4. Verify scrub passes after expansion
  5. Update SaxDocs → TrueNAS and the Server overview hardware section

Before expanding, ensure no drive is already degraded. Fix or replace failing disks first.

Degraded pool — what to do

If TrueNAS shows DEGRADED or a drive is FAULTED:

Immediate steps

  1. Stop heavy writes — pause qBittorrent imports, large photo uploads if possible
  2. Identify the failed drive — Storage → pool status; note serial and slot
  3. Do not yank the wrong disk — confirm by slot LED or serial in UI
  4. Replace the drive — same or larger capacity; TrueNAS will resilver

During resilver

  • Pool remains usable but slow; avoid unnecessary load
  • Resilver time scales with the amount of data stored on the pool and may take a day or more on 7×16 TB
  • Monitor progress in the UI; do not reboot unless instructed
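If the UI is unreachable, resilver progress can also be read from `zpool status` text. This sketch assumes the progress line looks like the sample below, which is made up and follows the wording of recent OpenZFS releases; other versions may format it differently.

```python
import re

# Illustrative sample only; the numbers are invented.
SAMPLE = """\
  scan: resilver in progress since Sun Mar  3 10:00:00 2024
    1.20T scanned at 1.10G/s, 512G issued at 450M/s, 60.1T total
    85.3G resilvered, 0.83% done, 1 days 12:34:56 to go
"""

# Assumed wording of the progress line in recent OpenZFS.
PROGRESS = re.compile(
    r"(?P<done>\S+) resilvered, (?P<pct>\d+(?:\.\d+)?)% done, (?P<eta>.+)"
)


def resilver_progress(status_text: str):
    """Return resilvered amount, percent done and ETA text, or None if absent."""
    match = PROGRESS.search(status_text)
    if match is None:
        return None
    return {
        "resilvered": match.group("done"),
        "percent": float(match.group("pct")),
        "eta": match.group("eta").strip(),
    }


if __name__ == "__main__":
    print(resilver_progress(SAMPLE))
```

The ETA ZFS reports tends to swing early in a resilver, so the percentage is the more useful number to watch.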

After resilver

  1. Run a manual scrub
  2. Check SMART on the replacement drive
  3. Review any checksum errors logged during the incident

If two drives fail

RAIDZ1 cannot recover. Restore from off-site backup if it exists. Otherwise this is a data-loss event — TODO: confirm backup strategy and document restore steps here.

Windows and SMB health

Degraded pool performance shows up on the Windows PC as a slow A: drive or failed copies to and from the NAS at 192.168.2.203. If SMB works on the LAN but apps fail, the problem may be Docker; see Common Issues.

Monitoring

External uptime checks: status.saxobroko.com → Monitoring. That tells me whether services respond; it does not replace pool health checks inside TrueNAS.