Mime types with Gnome Commander

One of the things that mime types are used for is to specify the default program to associate with a file extension. Unfortunately, this is currently broken in Gnome Commander (as at v1.2.8.17). To fix it, you’ll need to add a line to the following file:

~/.local/share/applications/defaults.list

Create that file if it doesn’t exist and add [Default Applications] on the first line. Then add the mime type (as shown in the Gnome Commander error dialog when you try to open your file) and specify the program to use. For example, for the PNG file extension you might want to use Eye of Gnome (eog), so you’d add:

image/x-apple-ios-png=eog.desktop
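
Putting it together, the complete file would look like this (assuming it didn’t exist before and this is the only association you need):

[Default Applications]
image/x-apple-ios-png=eog.desktop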


Save the file and you’re done. If it still doesn’t work, you should check this thread on the Ubuntu forums. Tested with Zorin OS 9. Hopefully this is fixed in a future version.

Opening files from Gnome Commander in foreground

Gnome Commander is my preferred file commander on Linux; it’s simple and it works. However, I usually map F3 to my preferred editor (gedit) rather than use the internal viewer, and when you do that, gedit is not brought into the foreground if it is already open. Which is totally lame. Luckily there’s an easy way to fix it. Create a script named e.g. gedit-foreground.sh with the following content:


#!/bin/sh
# Open the file in gedit (an already-running instance will be reused)
gedit "$1"
# Bring the gedit window to the foreground
wmctrl -a gedit

Obviously you can modify this to open any app if you prefer. Save this in your home directory somewhere and then make it executable using:

chmod a+x gedit-foreground.sh

Then install wmctrl using:

sudo apt-get install wmctrl

Now in Gnome Commander go to Settings->Options->Programs and set the viewer to:

/path/to/your/scripts/gedit-foreground.sh %s

Now when you press F3 or whatever your key for ‘open in external viewer’ is, your editor will be launched and be brought to the foreground instantly!

Why I ditched RAID and Greyhole for MHDDFS

Yes it’s a mouthful, but in my opinion, mhddfs is far and away the most beautiful and elegant solution for large data storage. It has taken me 10+ years of searching and trying, but now I’m finally at peace with my home-cooked NAS setup. In this article, I will explain how you too can have large amounts of easily expandable and redundant storage available on your network for the cheapest price in the simplest way possible.

1. The Beginning: RAID

When I realised I had enough data and devices to justify a server, the natural option for storage was of course RAID. As I was a cheapskate (and I didn’t want to risk hardware failure), I used software RAID level 5 on Gentoo with 5 drives. Although this worked, it was a pain and I didn’t sleep well:

  • If any drive died (which several did), I would have to re-learn the commands to remove the drive from the array, shut down the server, install a replacement drive, re-learn the commands to add the new drive back to the array, and then wait nervously for the re-sync to complete, which usually took several days due to the size of my data.  This was a horrible process because it happened just infrequently enough that I never got enough practice doing it, so every time it was a matter of googling and praying.
  • Expanding the array when space ran out was a similarly infrequent task that also required some re-learning each time and hence was a nerve-racking exercise.  In addition, drive size and type ideally had to match the existing drives, which made sourcing replacements a risk.
  • Since I was using RAID 5, if 2 drives died, BAM, all data was gone.  This was always on my mind, and made point (1) even more stressful.  Yes I could have used other RAID levels, but 5 was the right balance between speed and redundancy each time I weighed up the options.

2. The Middle: Greyhole

When building a home server, Linux is usually the best choice, but getting your network set up right does require some Linux know-how, and when you start trying to configure a firewall, you’d better hope you read the right blog or forum posts, or who knows whether you got it right.  A few years ago I found out about Amahi, which is kind of a pre-packaged home server (based on Fedora, and now Ubuntu) that automatically sets up a computer with everything a typical home server would normally need, out-of-the-box.  It also gives you a great web-based dashboard / control panel that allows you to further configure and monitor your system.  But mostly it just works, and I’m still using it today.

What especially interested me is that it is bundled with a thing called Greyhole, which is used to provide data storage via Samba for network clients.  Greyhole is great in concept in that it allows you to take a bunch of disks, of ANY size and format and location (local or network), and logically combine all their storage capacity to create a single larger store which clients see as a single volume.  Unfortunately, the implementation appears to be severely flawed, as I found out the hard way after using Greyhole for about 6 months.  Greyhole works by subscribing to writes/renames/deletes on the Samba share, which it then records in a SQL database.  Later, it ‘processes’ those actions by spreading files out across the different physical drives that are part of the storage pool you have created.  Depending on how you configured redundancy in your pool, your files might end up on one, two, three or all physical drives.  This is great in that you get quite good redundancy, you can easily expand the storage pool with any new disk you have lying around, and if any drive dies, you only lose the files on that drive, since individual files are not split across multiple drives.

The problem comes when you have a large number of small files and/or you perform a lot of operations on your file system which Greyhole just can’t keep up with.  This results in Greyhole falling behind on its tasks, which means your files stop getting copied / moved to the right places and, in the worst case, actually go missing (yes, this happened to me).  Finally, Greyhole filled up my entire dropzone with millions of tiny log files, which killed my server completely after I ran out of inodes.  At that point I was done with Greyhole.

3. The End: MHDDFS

Finally, after googling again, I saw mention of a small Linux utility, mhddfs, that seemed like it might just fit the bill.  It is not heavily advertised, which is risky when dealing with file systems, but I’ve been using it for 2+ years and it has performed beautifully (zero data loss).  There is only one blog post that explains how it works, and I will not repeat it here, so you should read this: Intro to MHDDFS.

Once you’ve read that, you’ll see it’s a simple matter of running a single Linux command (or editing your fstab) to create your storage pool at boot.  Once created, you can simply share out the pool as a Samba share for your network, and MHDDFS will take care of ensuring that when one drive in the pool is full, it seamlessly starts writing to the next drive.  So clients just see one huge volume with lots of available space.  Adding drives is as simple as editing your fstab, and you can pull out a drive at any time and access all the files on it directly (since you can choose your own file system).  Files are not split across drives.
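
As a rough sketch (using the same mount points as my setup in the Configuration section below), creating a pool by hand looks something like this:

mhddfs /mnt/mediaA,/mnt/mediaB /mnt/media -o allow_other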

Performance

Since MHDDFS is a FUSE-based file system (i.e. it is running in user-space), you may question its performance.  I tested read/write speeds over a Gigabit network to locations inside the storage pool and outside it, and can confirm we are talking about a very small performance degradation, something like 5% slower, which for me was more than acceptable.
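
If you want to run a similar sanity check yourself, a crude local comparison (the test file paths are just examples) might look like this:

dd if=/dev/zero of=/mnt/media/ddtest bs=1M count=1024 conv=fdatasync
dd if=/dev/zero of=/mnt/mediaA/ddtest bs=1M count=1024 conv=fdatasync

The first write goes through the mhddfs layer and the second straight to one of the member drives, so the difference between the two reported speeds gives a feel for the FUSE overhead (remember to delete the test files afterwards).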

Redundancy

MHDDFS does not provide any redundancy feature, which is actually nice, since it does one job and does it well.  This leaves you with lots of options to choose your own redundancy solution.  Mine was simply to have a backup computer with the same storage capacity, and use MHDDFS on that computer to create a ‘backup’ mirror storage pool.  Then I simply use rsync as a nightly scheduled task to keep the two pools in sync.

Configuration

Here are my relevant fstab entries:

UUID=60933834-6e2e-snip /mnt/mediaA ext4    defaults        1 2
UUID=a21d2e76-e58b-snip /mnt/mediaB ext4    defaults        1 2
UUID=e53b4fef-600e-snip /mnt/mediaC ext4    defaults        1 2
UUID=b94100c4-2926-snip /mnt/mediaD ext4    defaults        1 2
UUID=a10c3249-ae19-snip /mnt/mediaE ext4    defaults        1 2
UUID=4309390b-399f-snip /mnt/mediaF ext4    defaults        1 2
mhddfs#/mnt/mediaA,/mnt/mediaB,/mnt/mediaC,/mnt/mediaD,/mnt/mediaE,/mnt/mediaF /mnt/media fuse nonempty,allow_other 0 0
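
If you’d rather not reboot to test the new entry, creating the mount point and mounting it by hand should do the trick (a sketch, assuming the member drives are already mounted):

sudo mkdir -p /mnt/media
sudo mount /mnt/media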

So /mnt/media becomes my storage pool share, which you can see easily using df -h:

$ df -h
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             1.8T  1.7T   18G  99% /mnt/mediaA
/dev/sdb1             1.8T  1.7T   18G  99% /mnt/mediaB
/dev/sdc1             1.8T  1.7T   11G 100% /mnt/mediaC
/dev/sdd1             1.8T  1.7T   12G 100% /mnt/mediaD
/dev/sde1             1.8T  342G  1.4T  20% /mnt/mediaE
/dev/sdf1             1.8T  196M  1.7T   1% /mnt/mediaF
/mnt/mediaA;/mnt/mediaB;/mnt/mediaC;/mnt/mediaD;/mnt/mediaE;/mnt/mediaF
11T  7.1T  3.2T  70% /mnt/media

My rsync command runs as a scheduled task (cron job) at 5:30am every day:

rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/coding user@backup:/mnt/media
rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/projects user@backup:/mnt/media
rsync -r -t -v --progress -s -e "ssh -p 1234" /mnt/media/graphics user@backup:/mnt/media
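
For reference, the cron side of this is just an ordinary crontab entry; mine wraps the rsync commands above in a small script, so the entry looks something like this (the script path is a made-up example):

30 5 * * * /home/user/scripts/sync-media-backup.sh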

Note that the rsync command is executed for each individual root folder in the share so I can choose which folders I want to make redundant.  Also, I do not include the --delete option, so that if I accidentally delete something, I can recover it from the backup server at any time.  Then periodically I can use Beyond Compare to compare the two storage pools and remove anything I truly don’t need.  The first time you set up the backup storage pool, it will take quite a while for the rsync to complete (like several days), but thereafter it is amazingly quick at finding just the diffs and replicating only those.  Yes, everything you heard about rsync is true, it is awesome.

So that’s my data storage problem solved. If you are looking for something that is scalable, powerful, flexible and, most importantly, simple, I recommend mhddfs.  And then for redundancy, rsync is about as simple as it gets.

UPDATE:
If you are seeing a ‘transport endpoint not connected’ error randomly with your mhddfs storage pool, you’ll want to install this forked version:

https://github.com/vdudouyt/mhddfs-nosegfault

Hopefully this is fixed in the maintainer’s version soon!

Netgear GS605 v2 is faulty!

I have two of these switches, and I just got stung by a nasty hardware fault that is present in version 2 of this device: when you have two of these devices connected to each other (or just on the same LAN), the throughput for all connections on one of them will drop to a few hundred Kb/s e.g. 400Kb/s! This effectively cripples the LAN which should normally have file transfer speeds of > 25Mb/s.

This has been reported on the Netgear forums here and Netgear have acknowledged the fault. Unfortunately there is no solution other than to replace one of the devices with something else (apparently v3 of the GS605 works OK), but I learned a few tricks for how to diagnose network bandwidth issues in the process:

  • Use iperf. This is the standard command line app for testing network bandwidth, and it works on both Linux and Windows. On Linux just sudo apt-get install iperf; on Windows, download it from here.

    It’s easy to use: on one machine, start a server using:

    iperf -s

    On another start a client duplex test to the server using:

    iperf -c [server ip address] -d

    The results will be printed after a few seconds.

  • Use a good Cat5e or Cat6 ethernet cable. Cables can easily be faulty; check for green lights on your NIC & switch port to indicate a 1000Mb/s (gigabit) connection speed.
  • Avoid PCI gigabit cards (use the onboard Gb NIC). PCI bus bandwidth is limited, especially if you have any other devices on the bus.
  • Use a decent file-copy utility, for example SuperCopier, which instantly shows you your copy speed.

GVFS folder moved from ~/.gvfs

Network shares were previously mounted under a hidden folder in your home directory called .gvfs. In recent distros of Ubuntu and those based on it, such as Mint, this directory has moved to:

/run/user/[username]/gvfs/

and the name of the folder under there for each share is really complicated, like:

smb-share:domain=yourdomain.com,server=hostname,share=files,user=username

which is really not very nice. So you should make a symlink to it like this:

ln -s "/run/user/[username]/gfvs/smb-share:domain=yourdomain.com,server=hostname,share=files,user=username" ~/files_on_hostname

or something similar.

HDD reported as full when it’s not – inodes

Yes, this can happen, as I just found out – there is something called inodes in the file system (in my case ext4), with each file on the partition assigned to an inode. And each partition has a fixed limit on the number of inodes.

For example, my 50GB root partition has 3.2 million inodes, meaning it can have up to 3.2 million files. Sounds like a lot, but due to one particular program (Greyhole, subject of a future post), I ended up with ~3 million files in my /var/spool folder. Once the inode count reached the maximum, no new files could be created, with programs reporting ‘Out of disk space’. But df showed me I had plenty of disk space. After I figured out it was inodes I had run out of, rather than bytes, by using:

df -ih

I knew to look for something that was creating a large number of small files, and once I found the folder with ~3 million files, I just had to delete them. Unfortunately, you can’t just do rm * when there are that many files, so I used this solution from Stack Overflow:

find . -maxdepth 1 -name "*" -print0 | xargs -0 rm

And all was well 🙂
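
If you’re not sure which directory is hoarding the inodes in the first place, a rough one-liner along these lines (using /var as an example starting point) can point you at the culprit:

for d in /var/*/ ; do printf '%8d %s\n' "$(find "$d" | wc -l)" "$d"; done | sort -rn | head

This just counts the entries under each top-level directory and lists the biggest offenders first.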

Side note: If your root partition fills up, you may be wondering how to actually perform the clean-up, since your system probably can’t boot. I’m running Fedora and found that I needed to press ‘e’ repeatedly during the boot sequence, which would bring me to the GRUB bootloader. Escape to cancel any changes, then select your kernel and press ‘e’ again to edit. Then add a space and the word ‘single’ (no quotes) to the end of the line and press Enter. Then press ‘b’ to boot that kernel into ‘single user’ mode, which is runlevel 1 (file systems mounted but no network). This will drop you at a command prompt with access to inspect and modify your file system to either free up space or inodes, whichever you need to do.

BSODs with Crucial M4 SSD

If you are getting frequent random blue screens with error KERNEL_DATA_INPAGE_ERROR 0x7A and you have a Crucial M4 SSD, you might want to update the firmware for the drive. Crucial have acknowledged the issue and released a firmware update which fixes it. After applying the update, my machine has been running for 6 or 7 hours whereas previously it could not get past 1 hour before crashing… so far so good. Get the update from Crucial here.

HDDs you shouldn’t buy

My server had 7x2TB WD Green (EARS/EARX) HDDs. I selected them because they were cheap and supposedly power saving. However, this was a mistake, as I have found out the hard way, now that 4+ have died in the past year. Apparently these drives have a firmware ‘feature’ that parks the heads after 8 seconds of inactivity. EIGHT SECONDS. Which means these drives are certainly NOT suited to a NAS / storage pool setup or an OS partition, since wear on the drives will be high. They are really only useful as backup drives that don’t get used much. Supposedly you can use wdidle3.exe from Western Digital to increase or disable this 8 second timeout, but I haven’t tried it yet. Instead I’ve been replacing failing drives with Hitachi Deskstar drives, which, from my recent research, may last 3+ years as opposed to a 1.5 year max lifespan for WD Green drives. I also bought 3x2TB Seagate Barracuda HDDs as backup drives, and 1 of those 3 failed within the first year.
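
For what it’s worth, the usage I’ve seen reported (untested by me, so treat it as hearsay rather than a verified procedure) is to boot the machine into DOS from a USB stick and run one of:

wdidle3 /R
wdidle3 /S300
wdidle3 /D

where /R reports the current idle timer, /S300 sets it to 300 seconds and /D disables it entirely.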

So, here’s my advice: If you want your drive to last more than 1.5 years before failing, don’t buy Seagate Barracuda or WD green drives.

UPDATE: I just noticed that the WD-EAR drives are running hotter than the Hitachis, a good indicator for Hitachi so far.

Streaming media from Samba share to Android

Like many, I have a server/NAS that contains all my media (movies, music, photos, documents etc.), which is accessible via a Samba share to PCs in my house via LAN/wifi. XBMC works great to view all the media on-demand on the TV; however, I was looking for a way to stream the media to an Android tablet (Samsung Galaxy Tab 10.1). And I found a REALLY simple, free solution:

  1. Install ES File Explorer (free) from the Android Market.
  2. Install RockPlayer Lite (free) from the Android Market.
  3. Open ES File Explorer and change view to LAN, then hit ‘New’ then ‘Server’ (from menu).
  4. Enter your server/NAS IP address, then a forward slash, then your share name, and any credentials if necessary.
  5. Hit OK, now you should be able to browse all the files on your Samba share with ease.

Now, when you tap on a movie file (avi/mpg/wmv etc.), you will get a prompt to choose which video player to use. Select RockPlayer Lite, and your movie will start playing within a few seconds. I’ve tested this for a variety of movie file types and sizes and RockPlayer Lite works really well, including skipping forwards and backwards through movies. For HD movies, expect to wait a little longer for the movie to buffer before it begins to play (e.g. > 10 seconds), but once it gets going it seems pretty stable (no buffering). I’ve also tested this approach on an Android smartphone (LG P500 Optimus One, as well as the Samsung Galaxy S, both running Android 2.2) and it works just the same on the phone as on the tablet. Awesome! And of course, you can also use this approach to play music and view photos from your server anywhere in range of your wifi signal 🙂

Can iPad do that!? :p

Fastest way to delete a directory tree in Windows

Our nightly build was taking > 30 minutes to delete our source tree before getting latest (Windows is not quick at deleting a large number of small files), which was comparable to the time taken to actually compile the source. There are Stack Overflow and Super User articles that discuss the fastest way to do deletes, however there’s no magic solution. If you find you regularly need to delete large numbers of files (e.g. as part of the nightly build process), your best option is to create a new partition on your HDD (or install a new HDD) and store all the files you will be deleting on there. Then you can erase them all in about 2 seconds by doing a quick format of the drive! The only trick is how to programmatically format a drive, since there is no (documented) API for this in Windows. There’s Win32_Volume.Format (WMI), but it’s only available in server OSs, and then there’s SHFormatDrive, which shows a dialog.

However, provided your partition (in this case Z:) has no label, you can do this:

echo Y | format Z: /FS:NTFS /X /Q

An inferior alternative to this is to use a VBScript that uses SendKeys, like this:

set WshShell = CreateObject("WScript.Shell")
' Kick off format.com on the Z: drive (path separators restored)
wshShell.run "c:\windows\system32\format.com Z: /FS:NTFS /V:QuickWipeDrive /X /Q"
wscript.sleep 1000
' Answer the prompt for the current volume label
wshshell.sendkeys "QuickWipeDrive" & "{ENTER}"
wscript.sleep 1000
' Confirm the "Proceed with Format" prompt
wshshell.sendkeys "Y" & "{ENTER}"
wscript.sleep 5000

The downside of this script is that it will only work when the computer is not locked, since SendKeys requires a console session that is logged in and active.