I got 4 20TB drives from Amazon around Black Friday that I want to get setup for network storage. I’ve got 3 descent Ryzen 5000 series desktops that I was thinking about setting up so that I could build my own mini-Kubernetes cluster, but I don’t know if I have enough motivation. I’m pretty OCD so small projects often turn into big projects.
I don’t have an ECC motherboard though, so I want to get some input if BTRFS, ZFS, TrueNAS, or some other solution should be relatively safe without it? I guess it is a risk-factor but I haven’t had any issues yet (fingers crossed). I’ve been out of the CNCF space for a while but Rook used to be the way to go for Ceph on Kubernetes. Has there been any new projects worth checking out or should I just do RAID and get it over with? Does Ceph offer the same level of redundancy or performance? The boards have a single M.2 slot so I could add in some SSD caching.
If I go with RAID, should I do RAID 5 or 6? I’m also a bit worried because the drives are all the same so if there is an issue it could hit multiple drives at once, but I plan to try to have an online backup somewhere and if I order more drives I’ll balance it out with a different manufacturer.
Based on the hardware you have I would go with ZFS (using TrueNAS would probably be easiest). Generally with such large disks I would suggest using at least 2 parity disks, but seeing as you only have 4 that means you would lose half your storage to be able to survive 2 disk failures. Reason for the (at least) 2 parity disks is (especially with identical disks) the risk of failure during a rebuild after one failure is pretty high since there is so much data to rebuild and write to your new disk (like, it will probably take more than a day).
Can’t talk much about backup as I just have very little data that I care enough about to backup, and just throw that into cloud object storage as well as onto my local high-reliability storage.
I have tried many different solutions, so will give you a quick overview of my experiences, thoughts, and things I have heard/seen:
Single Machine
Do this unless you have to scale beyond one machine
ZFS (on TrueNAS)
FreeNASTrueNAS GUI.MDADM
BTRFS
UnRaid
Raid card
JBOD
Multi-machine
Ceph
SeaweedFS
Tahoe LAFS
MooseFS/LizardFS
Gluster
Kubernetes-native
Consider these if you’re using k8s. You can use any of the single-machine options (or most of the multi-machine options) and find a way to use them in k8s (natively for gluster and some others, or via NFS). I had a lot of luck just using NFS from my TrueNAS storage server in my k8s cluster.
Rook
Longhorn
OpenEBS