Yes, it’s a mouthful, but in my opinion, mhddfs is far and away the most beautiful and elegant solution for large data storage. It has taken me 10+ years of searching and trying, but now I’m finally at peace with my home-cooked NAS setup. In this article, I will explain how you too can have large amounts of easily expandable and redundant storage available on your network for the cheapest price in the simplest way possible.
1. The Beginning: RAID
When I realised I had enough data and devices to justify a server, the natural option for storage was of course RAID. As I was a cheapskate (and I didn’t want to risk hardware failure), I used software RAID level 5 on Gentoo with 5 drives. Although this worked, it was a pain and I didn’t sleep well:
- If any drive died (which several did), I would have to re-learn the commands to remove the drive from the RAID, shut down the server, install a replacement drive, re-learn the commands to add the drive back to the RAID and then wait nervously for the re-sync to complete, which usually took several days due to the size of my data. This was a horrible process because it happened just infrequently enough that I never got enough practice doing it, so every time it was a matter of googling and praying.
- Expanding the array when space ran out was a similarly infrequent task that also required some re-learning each time and hence was a nerve-racking exercise. In addition, drive size and type ideally had to match the existing drives, and sourcing matching drives was a risk in itself.
- Since I was using RAID 5, if 2 drives died, BAM, all data was gone. This was always on my mind, and made the first point even more stressful. Yes, I could have used other RAID levels, but 5 was the right balance between speed and redundancy each time I weighed up the options.
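For reference, the drive-replacement dance from the first point looks roughly like the sketch below. The array name /dev/md0 and the partition /dev/sdc1 are hypothetical placeholders, not my actual devices, and the script only prints the steps rather than running them, since running them against the wrong device is exactly the kind of mistake that kept me awake.

```shell
#!/bin/sh
# Sketch only: prints the usual mdadm steps for swapping a failed RAID member.
# ARRAY and FAILED are hypothetical names; substitute your own devices.
ARRAY="/dev/md0"
FAILED="/dev/sdc1"

replace_steps() {
    # Mark the dying member as failed, then detach it from the array.
    echo "mdadm --manage $ARRAY --fail $FAILED"
    echo "mdadm --manage $ARRAY --remove $FAILED"
    echo "# shut down, swap the physical drive, partition the new one"
    # Add the replacement; the re-sync starts automatically.
    echo "mdadm --manage $ARRAY --add $FAILED"
    # Watch the re-sync progress.
    echo "cat /proc/mdstat"
}

replace_steps
```

Each printed mdadm line would need to be run as root, one at a time, checking the array state in between.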
2. The Middle: Greyhole
When building a home server, Linux is usually the best choice, but getting your network set up right does require some Linux know-how, and when you start trying to configure a firewall, you had better hope you read the right blog or forum posts, or who knows whether you got it right. A few years ago I found out about Amahi, which is kind of a pre-packaged home server (based on Fedora, and now Ubuntu) that automatically sets up a computer with everything a typical home server would normally need, out of the box. It also gives you a great web-based dashboard / control panel that allows you to further configure and monitor your system. But mostly it just works, and I’m still using it today.
What especially interested me is that it is bundled with a thing called Greyhole, which is used to provide data storage via Samba for network clients. Greyhole is great in concept in that it allows you to take a bunch of disks, of ANY size and format and location (local or network), and logically combine all their storage capacity so that clients see a single larger volume. Unfortunately, the implementation appears to be severely flawed, as I found out the hard way after using Greyhole for about 6 months. Greyhole works by subscribing to writes/renames/deletes on the Samba share, which it then records in a SQL database. Later, it ‘processes’ those actions by spreading files out across the different physical drives that are part of the storage pool you have created. Depending on how you configured redundancy in your pool, your files might end up on one, two, three or all physical drives. This is great in that you get quite good redundancy, you can easily expand the storage pool with any new disk you have lying around, and if any drive dies, you only lose the files on that drive, since individual files are not split across multiple drives.
The problem comes when you have a large number of small files and/or you perform a lot of operations on your file system, which Greyhole just can’t keep up with. Greyhole then falls behind on its tasks, so your files stop getting copied / moved to the right places and, in the worst case, actually go missing (yes, this happened to me). Finally, Greyhole filled up my entire dropzone with millions of tiny log files, which killed my server completely after I ran out of inodes. At that point I was done with Greyhole.
3. The End: MHDDFS
Finally, after googling again, I saw mention of a small Linux utility, mhddfs, that seemed like it might just fit the bill. It is not heavily advertised, which is risky when dealing with file systems, but I’ve been using it for 2+ years and it has performed beautifully (zero data loss). There is only one blog post that explains how it works, and I will not repeat that here, so you should read this: Intro to MHDDFS.
Once you’ve read that, you’ll see it’s a simple matter of running a single Linux command (or editing your fstab) to create your storage pool when your server boots. Once created, you can simply share out your pool as a Samba share for your network, and MHDDFS ensures that when one drive in the pool is full, it seamlessly starts writing to the next drive. So clients just see one huge volume with lots of available space. Adding drives is as simple as editing your fstab, and you can pull out a drive at any time and access all the files on it directly (since you can choose your own file system). Files are not split across drives.
Since MHDDFS is a FUSE-based file system (i.e. it runs in user space), you may question its performance. I tested read/write speeds over a Gigabit network to locations both in the storage pool and outside it, and can confirm we are talking about a very small performance degradation, something like 5% slower, which for me was more than acceptable.
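As a rough sketch of that single command, the script below builds a manual mhddfs mount for three member drives. The mount points match the ones I use later in this article, but the script itself, including its RUN switch and the build_mount_cmd helper, is just an illustration and not part of mhddfs. By default it only prints the command, so you can check it before running anything as root.

```shell
#!/bin/sh
# Sketch: compose the mhddfs command that pools several mounted drives.
# MEMBERS and POOL are example paths; RUN is this script's own dry-run switch.
MEMBERS="/mnt/mediaA,/mnt/mediaB,/mnt/mediaC"
POOL="/mnt/media"

# mhddfs takes a comma-separated list of member directories, then the pool
# mount point. allow_other is the FUSE option that lets users other than
# the one who mounted the pool (e.g. the Samba daemon) access it.
build_mount_cmd() {
    printf 'mhddfs %s %s -o allow_other\n' "$1" "$2"
}

MOUNT_CMD=$(build_mount_cmd "$MEMBERS" "$POOL")

if [ "${RUN:-0}" = "1" ]; then
    # Really mount the pool (needs root and mhddfs installed).
    $MOUNT_CMD
else
    # Default: just show what would be run.
    echo "$MOUNT_CMD"
fi
```

To take the pool down again you would use fusermount -u /mnt/media, as with any other FUSE mount. The member drives stay mounted and readable on their own.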
MHDDFS does not provide any redundancy feature, which is actually nice, since it does one job and does it well. This leaves you with lots of options to choose your own redundancy solution. Mine was simply to have a backup computer with the same storage capacity, and use MHDDFS on that computer to create a ‘backup’ mirror storage pool. Then I simply use rsync as a nightly scheduled task to keep the two pools in sync.
Here are my relevant fstab entries:
UUID=60933834-6e2e-snip /mnt/mediaA ext4 defaults 1 2
UUID=a21d2e76-e58b-snip /mnt/mediaB ext4 defaults 1 2
UUID=e53b4fef-600e-snip /mnt/mediaC ext4 defaults 1 2
UUID=b94100c4-2926-snip /mnt/mediaD ext4 defaults 1 2
UUID=a10c3249-ae19-snip /mnt/mediaE ext4 defaults 1 2
UUID=4309390b-399f-snip /mnt/mediaF ext4 defaults 1 2
mhddfs#/mnt/mediaA,/mnt/mediaB,/mnt/mediaC,/mnt/mediaD,/mnt/mediaE,/mnt/mediaF /mnt/media fuse nonempty,allow_other 0 0
So /mnt/media becomes my storage pool share, which you can see easily using df -h:
$ df -h
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 1.8T 1.7T 18G 99% /mnt/mediaA
/dev/sdb1 1.8T 1.7T 18G 99% /mnt/mediaB
/dev/sdc1 1.8T 1.7T 11G 100% /mnt/mediaC
/dev/sdd1 1.8T 1.7T 12G 100% /mnt/mediaD
/dev/sde1 1.8T 342G 1.4T 20% /mnt/mediaE
/dev/sdf1 1.8T 196M 1.7T 1% /mnt/mediaF
11T 7.1T 3.2T 70% /mnt/media
My rsync command runs as a scheduled task (cron job) at 5:30am every day:
rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/coding user@backup:/mnt/media
rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/projects user@backup:/mnt/media
rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/graphics user@backup:/mnt/media
Note that the rsync command is executed for each individual root folder in the share, so I can choose which folders I want to make redundant. Also, I do not include the --delete option, so that if I accidentally delete something, I can recover it from the backup server at any time. Then periodically I can use Beyond Compare to compare the two storage pools and remove anything I truly don’t need. The first time you set up the backup storage pool, it will take quite a while for the rsync to complete (like several days), but thereafter it is amazingly quick at finding just the diffs and replicating only those. Yes, everything you heard about rsync is true: it is awesome.
So that’s my data storage problem solved. If you are looking for something that is scalable, powerful, flexible and, most importantly, simple, I recommend mhddfs. And then for redundancy, rsync is about as simple as it gets.
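If you would rather keep the folder list in one place, the per-folder commands can be wrapped in a small script along these lines. This is a sketch and not the exact script I run: the script path in the crontab comment, the RUN switch and the backup_cmds helper are all made up for illustration, and it assumes folder names without spaces.

```shell
#!/bin/sh
# Sketch of a nightly backup wrapper: one rsync per top-level folder.
# Host, port and folder names are the examples from this article.
SRC_ROOT="/mnt/media"
DEST="user@backup:/mnt/media"
FOLDERS="coding projects graphics"

# Print one rsync command per folder. There is deliberately no --delete,
# so files removed from the source survive on the backup server.
backup_cmds() {
    for folder in $FOLDERS; do
        printf 'rsync -r -t -v --progress -s -e "ssh -p 1234" %s/%s %s\n' \
            "$SRC_ROOT" "$folder" "$DEST"
    done
}

if [ "${RUN:-0}" = "1" ]; then
    # Execute the generated commands one after another.
    backup_cmds | sh
else
    # Default: dry run, just list what would be executed.
    backup_cmds
fi

# Hypothetical crontab entry to run this at 5:30am every day:
# 30 5 * * * RUN=1 /usr/local/bin/nightly-backup.sh
```

Adding another folder to the backup is then a matter of appending its name to FOLDERS.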