Why I ditched RAID and Greyhole for MHDDFS

Yes, it's a mouthful, but in my opinion mhddfs is far and away the most beautiful and elegant solution for large data storage. It has taken me 10+ years of searching and trying, but now I'm finally at peace with my home-cooked NAS setup. In this article, I will explain how you too can have large amounts of easily expandable, redundant storage available on your network, for the cheapest price and in the simplest way possible.

1. The Beginning: RAID

When I realised I had enough data and devices to justify a server, the natural option for storage was of course RAID. As I was a cheapskate (and I didn't want to risk hardware failure), I used software RAID level 5 on Gentoo with 5 drives. Although this worked, it was a pain and I didn't sleep well:

  • If any drive died (which several did), I would have to re-learn the commands to remove the drive from the array, shut down the server, install a replacement drive, re-learn the commands to add the new drive back into the array, and then wait nervously for the re-sync to complete, which usually took several days due to the size of my data (roughly the mdadm dance sketched after this list).  This was a horrible process because it happened just infrequently enough that I never got enough practice doing it, so every time it was a matter of googling and praying.
  • Expanding the array when space ran out was a similarly infrequent task that also required some re-learning each time, and hence was a nerve-racking exercise.  In addition, the drive size and type ideally had to match the existing drives, which made sourcing a replacement a risk.
  • Since I was using RAID 5, if 2 drives died, BAM, all data was gone.  This was always on my mind, and made point (1) even more stressful.  Yes, I could have used other RAID levels, but 5 was the right balance between speed and redundancy each time I weighed up the options.
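
For reference, this is roughly the mdadm dance I mean; the array and device names below are illustrative, not my actual setup:

# mark the dead drive as failed and pull it from the array
mdadm --manage /dev/md0 --fail /dev/sdc1
mdadm --manage /dev/md0 --remove /dev/sdc1
# power down, swap in the replacement drive, partition it, then add it back
mdadm --manage /dev/md0 --add /dev/sdc1
# watch the multi-day re-sync crawl along
cat /proc/mdstat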

2. The Middle: Greyhole

When building a home server, Linux is usually the best choice, but getting your network set up right does require some Linux know-how, and when you start trying to configure the firewall, you had better hope you read the right blog or forum posts, or who knows whether you got it right.  A few years ago I found out about Amahi, which is kind of a pre-packaged home server (based on Fedora, and now Ubuntu) that automatically sets up a computer with everything a typical home server would normally need, out of the box.  It also gives you a great web-based dashboard / control panel that allows you to further configure and monitor your system.  But mostly it just works, and I'm still using it today.

What especially interested me is that it is bundled with a tool called Greyhole, which is used to provide data storage via Samba for network clients.  Greyhole is great in concept: it allows you to take a bunch of disks, of ANY size, format and location (local or network), and logically combine their storage capacity into a single larger pool which clients see as a single volume.  Unfortunately, the implementation appears to be severely flawed, as I found out the hard way after using Greyhole for about 6 months.

Greyhole works by subscribing to writes/renames/deletes on the Samba share, which it records in a SQL database.  Later, it 'processes' those actions by spreading files out across the physical drives that are part of the storage pool you have created.  Depending on how you configured redundancy in your pool, your files might end up on one, two, three or all physical drives.  This is great in that you get quite good redundancy, you can easily expand the storage pool with any new disk you have lying around, and if any drive dies you only lose the files on that drive, since individual files are not split across multiple drives.

The problem comes when you have a large number of small files and/or you perform a lot of operations on your file system, which Greyhole just can't keep up with.  Greyhole falls behind on its tasks, your files stop getting copied/moved to the right places, and in the worst case they actually go missing (yes, this happened to me).  Finally, Greyhole filled up my entire dropzone with millions of tiny log files, which killed my server completely after I ran out of inodes.  At that point I was done with Greyhole.
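
For reference, the pool membership and per-share redundancy live in /etc/greyhole.conf, with entries roughly like the ones below (I'm writing this from memory, so treat the exact syntax as an assumption and check the Greyhole documentation):

# each physical drive that contributes space to the pool (paths are examples)
storage_pool_drive = /mnt/hdd1/gh, min_free: 10gb
storage_pool_drive = /mnt/hdd2/gh, min_free: 10gb
# how many copies of each file to keep, per Samba share
num_copies[Movies] = 1
num_copies[Documents] = 2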

3. The End: MHDDFS

Finally, after more googling, I saw mention of a small Linux utility, mhddfs, that seemed like it might just fit the bill.  It is not heavily advertised, which is risky when dealing with file systems, but I've been using it for 2+ years and it has performed beautifully (zero data loss).  There is really only one blog post that explains how it works, and I won't repeat it here, so you should read this first: Intro to MHDDFS.

Once you've read that, you'll see it's a simple matter of running a single Linux command (or editing your fstab) to create your storage pool when your server boots.  Once created, you can simply share out your pool as a Samba share for your network, and MHDDFS will take care of seamlessly writing to the next drive whenever one drive in the pool fills up.  So clients just see one huge volume with lots of available space.  Adding drives is as simple as editing your fstab, and you can pull out a drive at any time and access all the files on it directly (since you can choose your own file system).  Files are not split across drives.
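
As a quick illustration, the one-off command form plus a minimal Samba share on top look something like this (the paths and share name are just examples; my actual fstab entries are shown further down):

# pool two existing mounts into /mnt/media (requires FUSE; run as root or via sudo)
mhddfs /mnt/mediaA,/mnt/mediaB /mnt/media -o allow_other

# then expose the pool to the network in /etc/samba/smb.conf
[media]
   path = /mnt/media
   read only = no
   browseable = yes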

Performance

Since MHDDFS is a FUSE-based file system (i.e. it runs in user space), you may question its performance.  I tested read/write speeds over a Gigabit network to locations both inside the storage pool and outside it, and can confirm we are talking about a very small performance degradation, something like 5% slower, which for me was more than acceptable.
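
If you want to sanity-check this on your own hardware, a rough local comparison is easy with dd, writing once through the pool and once directly to a member drive (paths are examples; over the network your numbers will also depend on Samba and your NIC):

# write 1GB through the mhddfs pool, forcing data to disk before dd reports a speed
dd if=/dev/zero of=/mnt/media/dd_pool_test bs=1M count=1024 conv=fdatasync
# same test directly on one of the underlying drives for comparison
dd if=/dev/zero of=/mnt/mediaA/dd_raw_test bs=1M count=1024 conv=fdatasync
rm /mnt/media/dd_pool_test /mnt/mediaA/dd_raw_test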

Redundancy

MHDDFS does not provide any redundancy feature, which is actually nice, since it does one job and does it well.  This leaves you with lots of options to choose your own redundancy solution.  Mine was simply to have a backup computer with the same storage capacity, and use MHDDFS on that computer to create a ‘backup’ mirror storage pool.  Then I simply use rsync as a nightly scheduled task to keep the two pools in sync.

Configuration

Here are my relevant fstab entries:

UUID=60933834-6e2e-snip /mnt/mediaA ext4    defaults        1 2
UUID=a21d2e76-e58b-snip /mnt/mediaB ext4    defaults        1 2
UUID=e53b4fef-600e-snip /mnt/mediaC ext4    defaults        1 2
UUID=b94100c4-2926-snip /mnt/mediaD ext4    defaults        1 2
UUID=a10c3249-ae19-snip /mnt/mediaE ext4    defaults        1 2
UUID=4309390b-399f-snip /mnt/mediaF ext4    defaults        1 2
mhddfs#/mnt/mediaA,/mnt/mediaB,/mnt/mediaC,/mnt/mediaD,/mnt/mediaE,/mnt/mediaF /mnt/media fuse nonempty,allow_other 0 0

So /mnt/media becomes my storage pool share, which you can see easily using df -h:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             1.8T  1.7T   18G  99% /mnt/mediaA
/dev/sdb1             1.8T  1.7T   18G  99% /mnt/mediaB
/dev/sdc1             1.8T  1.7T   11G 100% /mnt/mediaC
/dev/sdd1             1.8T  1.7T   12G 100% /mnt/mediaD
/dev/sde1             1.8T  342G  1.4T  20% /mnt/mediaE
/dev/sdf1             1.8T  196M  1.7T   1% /mnt/mediaF
/mnt/mediaA;/mnt/mediaB;/mnt/mediaC;/mnt/mediaD;/mnt/mediaE;/mnt/mediaF
11T  7.1T  3.2T  70% /mnt/media

My rsync commands run as a scheduled task (cron job) at 5:30am every day:

rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/coding user@backup:/mnt/media
rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/projects user@backup:/mnt/media
rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/graphics user@backup:/mnt/media

Note that the rsync command is executed for each individual root folder in the share, so I can choose which folders I want to make redundant.  Also, I do not include the --delete option, so that if I accidentally delete something, I can recover it from the backup server at any time.  Then periodically I use Beyond Compare to compare the two storage pools and remove anything I truly don't need.  The first time you set up the backup storage pool, it will take quite a while for the rsync to complete (like several days), but thereafter it is amazingly quick at finding just the diffs and replicating only those.  Yes, everything you heard about rsync is true: it is awesome.
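
The scheduling itself is just a normal cron entry, something along these lines, where backup-media.sh is a hypothetical script containing the three rsync commands above:

# crontab entry (via 'crontab -e'): run the backup script at 5:30 every morning
30 5 * * * /usr/local/bin/backup-media.sh >> /var/log/backup-media.log 2>&1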

So that's my data storage problem solved.  If you are looking for something that is scalable, powerful, flexible and, most importantly, simple, I recommend mhddfs.  And for redundancy, rsync is about as simple as it gets.

UPDATE:
If you are seeing a 'transport endpoint not connected' error randomly with your mhddfs storage pool, you'll want to install this forked version:

https://github.com/vdudouyt/mhddfs-nosegfault
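
The fork builds the same way as stock mhddfs; roughly the following, though double-check the repo's README for the exact steps:

git clone https://github.com/vdudouyt/mhddfs-nosegfault.git
cd mhddfs-nosegfault
make                      # needs the FUSE development headers installed
sudo cp mhddfs /usr/bin/  # replace the packaged binary (back the original up first)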

Hopefully this is fixed in the maintainer’s version soon!

3 thoughts on “Why I ditched RAID and Greyhole for MHDDFS”

  1. JD says:

    To simplify your rsync commands, learn about the ~/.ssh/config file. You'll thank me, and every tool that uses ssh will pick it up. No more specifying user IDs or ports, or remembering crap IPs or odd AWS hostnames.
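
    Something like this in ~/.ssh/config would do it; the hostname/IP here is just a placeholder:

    Host backup
        HostName 192.168.1.50
        User user
        Port 1234

    With that in place, the rsync commands above can drop the -e "ssh -p 1234" part and simply target backup:/mnt/media.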

  2. Linus says:

    Hi!

    Sorry for posting an unrelated comment, but the original post had closed its comment section. I read this post: http://zornsoftware.codenature.info/blog/why-i-ditched-raid-and-greyhole-for-mhddfs.html

    Very informative, but you mention that one of the reasons you moved away from RAID was the complexity in case a disk failed. However, I didn't see anything about what commands you'd run if your mhddfs setup lost a physical disk.

    Is it simply a matter of inserting a new disk, and then running an “inverse” rsync to fill in whatever files are missing, because the file structure is intact?

    It seems that without rsync it could be a bit of a mess to lose a disk, because even though data is not striped across disks (leaving you with “complete” files), the directory structure could still be spread any number of ways, and you’d never know without rsync where in your hierarchy things disappeared when a physical disk goes down, right?

    Thanks for a great blog!

    • admin says:

      Hi Linus, I've moved your comment to the correct post now. In answer to your question, you are right: rsync is a great way to handle a disk failure. The command would actually be the same, just swap source and destination (and run the command on your backup machine). I also use Beyond Compare to do a directory comparison, which handles binary comparison of huge file systems (mine is 11TB+) and lets you sync between source and destination. The thing I like about this approach is you can choose which tool you want to use based on your level of experience, and aren't forced to deal with learning RAID recovery, which imho is not much fun.
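
      In other words, after swapping in a new blank disk and re-mounting the pool, it's just the commands from the post with source and destination reversed, run from the backup machine (here "primary" stands in for whatever your main server is called):

      rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/coding user@primary:/mnt/media
      rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/projects user@primary:/mnt/media
      rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/graphics user@primary:/mnt/media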
