curator: a file server

Intro

This machine was constructed in early 2014.

Goals:

  • Runs Linux.
  • Uses SnapRAID for backups.
  • Headless: no keyboard or monitor required.
  • 8 SATA devices: 1 parity and 7 data (LATER: 6 data and 2 parity).
  • No optical drives.
  • Discs spin down when not in use.
  • Easy disc replacement; a hot-pluggable backplane would be convenient, although I don't have to have the machine up while swapping discs.
  • Will be a file source for XBMC.
  • Quiet.
  • Low power.

Non-goals:

  • Bleeding edge.
  • Fast CPU.
  • Transcoding or any sort of heavy processing.
  • Saving money compared to a prebuilt system.
  • Cool appearance.
  • PCI expansion slots.

Hardware

Motherboard

GIGABYTE GA-F2A88XM-D3H

  • FM2+ / FM2 AMD A88X (Bolton D4) HDMI SATA 6Gb/s USB 3.0 Micro ATX AMD Motherboard
  • from: NewEgg
  • price: $77.99 + $5.67 shipping

I wanted a board with 8 SATA ports. AMD board + CPU combinations were about half the cost of Intel.

The newer FM2 socket boards are cheaper than the older AM3 boards.

This is my first AMD build, and my first Gigabyte board.

I like the micro-ATX size because it leaves more room in the case and allows for better cooling.

CPU

AMD A4-5300

  • Trinity 3.4GHz (3.6GHz Turbo) Socket FM2 65W Dual-Core Desktop APU (CPU + GPU) with DirectX 11 Graphic AMD Radeon HD 7480D AD5300OKHJBOX
  • from: Amazon
  • price: $49.99 + free shipping

I just selected a 65W processor to fit the FM2 socket of the motherboard.

Lower-power AMD processors have been due for a while, but I did not want to wait. I regret that nothing like the 45W Sempron 145 was available for this motherboard at the time; it seemed like a good file server choice.

Memory

Crucial 4GB kit (2GBx2)

  • DDR3 PC3-12800 • CL=11 • Unbuffered • NON-ECC • DDR3-1600 • 1.5V • 256Meg x 64 • Part #: CT2KIT25664BA160B
  • from: Crucial
  • price: $49.99 + $3.50 tax, free shipping

Found by using the Crucial memory configurator: just select the motherboard.

Case

ZALMAN MS800

  • from: Amazon
  • price: $89.99 + free shipping

There are cases with internal space for 8 3.5" drives, but I could not find one with a hot-pluggable SATA backplane. Also: it would be good to have externally visible indicator lights for each drive.

An approach often used by server builders is to use 5-in-3, 4-in-3, etc. hard drive cages that fit into external 5.25" bays.

For eight drives, 2 4-in-3 cages would require 6 external bays. I picked a well-reviewed one with a plain external appearance:

Hard Drive Cages

ICY DOCK FlexCage MB974SP-2B

  • Tray-less 4 x 3.5 Inch HDD in 3 x 5.25 Inch Bay SATA Cage - Front USB 3.0 Hub
  • from: Amazon
  • price: $119.62 each (2) + free shipping

Note that each cage is more expensive than the case itself.

The fan on one cage was pleasantly quiet, but the other was noisier than anything else in the case. I replaced it with the Thermaltake fan below.

The LEDs are brilliant blue, way too bright for a dark bedroom.

Power supply

Corsair CX430M

  • CX Series 430 Watt ATX/EPS Modular 80 PLUS Bronze ATX12V/EPS12V 384 Power Supply
  • from: Amazon
  • price: $46.47

Online calculators said I needed something over 300W for this design, including 8 3.5" 7200RPM SATA discs running at once. The NewEgg calculator had 321W.

I picked a modular 430W PSU. Single-rail, as recommended in the UnRaid forums.

Hard drives

Toshiba PH3300U-1I72

  • Toshiba Desktop 7200 3.0TB 7200RPM SATA 6Gb/s NCQ 64MB Cache 3.5-Inch Internal Bare Drive PH3300U-1I72
  • from: Amazon
  • price: $120.00 + free shipping

This will be a combined system and data disk.

3TB disks are the sweet spot for TB/$ right now.

Toshiba DT01ACA300

  • Toshiba 3.5-Inch 3TB 7200 RPM SATA3/SATA 6.0 GB/s 64MB Hard Drive DT01ACA300
  • from: Amazon
  • price: $109.99 + free shipping

This will be the parity disk.

I have a collection of 1TB Seagate drives, already populated, which will be the remaining data disks. I will replace these with larger capacity drives as I need the space or they begin to fail.

LATER:

Toshiba PH3400U-1I72

  • Toshiba 4TB SATA 6Gb/s 7200rpm, 128MB Cache 3.5-Inch Internal Hard Drive (PH3400U-1I72)
  • from: Amazon
  • price: $129.99 + free shipping
  • quantity: 3

Added these for the First expansion.

Replacement hard drive cage fan

Thermaltake ISGC Fan 8

  • from: Amazon
  • price: $12.99

The IcyDock cage uses an 80x80x25mm fan, either 2- or 3-pin. Replacing it without removing the cage was not difficult.

The new fan is nicely quiet. Of the remaining noise it is hard to tell what part comes from the fans and what from the drives.

Accessories

USB sticks, for boot devices and temporary storage.

  • SanDisk Cruzer Fit 8 GB USB Flash Drive (5 Pack) SDCZ33-008G-B35-5PK
    • from: Amazon
    • price: $38.95 for 5 + free shipping

Assembly notes

Case

Both case sides come off without tools.

I removed and stored away:

  • the VGA guide and optional 92mm fan
  • 6 of the 10 5.25" drive bay covers
  • the 4 hard drive adapter trays
  • the drive bay locking knobs

Hard drive cages

The IcyDock hard drive cages are an easy fit if you slightly bend out the spring tabs on one side of the case. Otherwise the fit is tight and the tabs scratch the metal sides of the cage.

Each cage is held by 4 screws on one side and 2 on the other.

If you want to place the bay in the top-most position of the case, you will need to shift the bundled cables from the top panel up just slightly. That means cutting the cable ties holding them in place.

I decided to leave empty spaces above and below each cage for better cooling. From top to bottom:

  • (empty)
  • 3 spaces containing a cage
  • (empty)
  • (empty)
  • 3 spaces containing a cage
  • (empty)

Each empty space has a bay cover.

The IcyDock docs say the SATA connectors are numbered 1 through 4 from top to bottom; the cage itself has them labeled from bottom to top.

Both cages have heavy USB 3 cables coming out the back which I'm not using, as the motherboard has only one USB 3 header and that goes to the top panel. I tucked the cables above each cage and used cable ties threaded through the case to fix them.

A tip: some drives have to be pushed into place a bit more firmly than the latching door alone will manage. The door can be closed and the power light will come on, but the drive still does not have a data connection. After opening the door and giving the drive a little extra push to seat it, I have not yet had one come loose.

Power supply

The power supply fan blows out through the bottom of the case.

Two modular cables have two SATA connectors each. Since each drive cage needs two power connectors, that works out neatly.

Motherboard

The microATX board and the case match in all 8 standoff positions. I don't think it's going anywhere.

One USB 2 header is attached to the top panel, leaving one header unused.

Ethernet hardware address:

74:d4:35:01:b5:64 (assigned by the router to 192.168.0.2)

Smoke test

All well on first power on.

The TinyCore USB stick booted without having to modify the BIOS boot order.

CPU fan, the two case fans, and one hard drive cage fan all spun up. (A hard drive must be inserted into a cage for its fan to run.)

Hardware summary results

  • Power consumption

    Idling, with one disk spinning, the system uses 40W.

    With all disks working, it runs about 100W, occasionally spiking higher.

  • Temperature

    I need to get the AMD sensors working to be sure, but it seems to run cool.

    SMART reports disk temperatures, but I don't know how accurate that is.

  • Noise

    It is a quiet box by office standards, but may be distracting in a small home theater room. Currently I have it in the basement and use powerline ethernet, which is slow, but perhaps fast enough to serve media files.

  • Appearance

    Although "good looking" was not a goal, I think it looks nice anyway. But those lights!

Linux

My original intention was to use a Linux distribution that booted from a USB stick and ran entirely from RAM, meaning we would not need a spinning system disk. Fast, low-power, silent. See List of Linux distributions that run from RAM for candidates.

I experimented with Tiny Core Linux and Slax but became impatient with getting everything running just right, and so fell back to a hard disk install of openSUSE, which has been my desktop OS for many years.

This means one disk must always be spinning for swap files and logs, etc. The installation includes vastly more software than I need for a file server. Time and interest permitting, I may investigate a RAM-based system again in the future.

openSUSE

I downloaded the 13.1 Network Install ISO.

Copied that to a USB stick with:

dd if=openSUSE-13.1-NET-x86_64.iso of=/dev/sdc   # MAKE SURE THAT IS THE USB STICK!!!

Booted the new machine from the stick.

Did a standard installation with LXDE for a desktop. I like its speed and minimalism compared to the other choices.

The installation suggested a separate partition for /home, but in retrospect I should have allocated the majority of the system disk to a top-level directory for data. I fixed that up by hand later.

After the setup and configuration period, I changed the default run level to console login. No point in having X or a window manager running in a headless setup.

Operations

Disk layout

The system disk is a new 3TB unit, formatted with ext4. A generous Linux installation takes less than 80GB. The remaining space is in a partition mounted at:

/mnt/tera00

I have six old pre-filled 1TB discs, formatted with ext3, mounted at:

/mnt/tera01 through /mnt/tera06

A long fsck ran on each disk the first time I mounted them. I had been hot-plugging them on my desktop system and apparently fsck never runs that way.

The last SATA slot is the parity disk, a new 3TB unit formatted with ext4, mounted on:

/mnt/parity

Disk Maintenance

I run three utilities on a regular schedule with cron in the middle of the night, on different days. Results are emailed to me so I don't forget them.
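
As an illustration only, the schedule might look something like this in root's crontab (the script names, times, and mail address are placeholders, not my actual entries):

# illustrative crontab entries -- script names and times are placeholders
# SMART long tests: 2am on the 1st of each month
0 2 1  * *  /root/bin/smart-tests.sh       2>&1 | mail -s "curator SMART report" wmcclain
# e2fsck of the data disks: 2am on the 15th of each month
0 2 15 * *  /root/bin/fsck-disks.sh        2>&1 | mail -s "curator e2fsck report" wmcclain
# SnapRAID scrub: 3am every Sunday
0 3 *  * 0  /usr/local/bin/snapraid scrub  2>&1 | mail -s "curator scrub report" wmcclain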

SMART

Monthly.

Run smartctl --test=long on each disk. For a 3TB drive this can take 4 or 5 hours. The tests run in parallel for multiple drives and you can continue to use them during the test.

This runs in the drive's firmware and the OS knows nothing about it. You can poll the drive with smartctl -a to see if it's done. I have the cron script wait 6 hours and then capture the smartctl -a results for each drive to date-stamped output files, which I save for historical analysis.
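
A minimal sketch of that cron script, assuming the drives show up as /dev/sda through /dev/sdh and that the reports go to a directory such as /root/smart (both assumptions, not my actual paths):

#!/bin/bash
# Sketch: start long self-tests on every drive, wait, then capture the reports.
OUTDIR=/root/smart                      # assumed report directory
DATE=$(date +%Y%m%d)
mkdir -p "$OUTDIR"

for dev in /dev/sd[a-h]; do             # assumed: the 8 SATA drives
    smartctl --test=long "$dev"         # runs in the drive firmware, returns immediately
done

sleep 6h                                # a long test on a 3TB drive takes 4-5 hours

for dev in /dev/sd[a-h]; do
    smartctl -a "$dev" > "$OUTDIR/smart-$(basename "$dev")-$DATE.txt"
done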

This is not guaranteed to find disks that are about to fail. Disks sometimes go bad without a SMART warning. But if counts appear for these attributes:

  • Reallocated_Sector_Ct
  • Current_Pending_Sector
  • Offline_Uncorrectable

...then the odds of the disk failing soon are said to be high.

One of my 1TB disks shows 1 Reallocated_Sector_Ct. I'm keeping an eye on it.

e2fsck

Monthly.

This checks and repairs ext2/ext3/ext4 filesystems.

Normally this is run at boot time when a disk has not been checked after so many mounts or after a given time period. Since the file server is not going to be rebooted very often, I want to do this on a running system.

You are not supposed to run e2fsck on mounted disks, so the job will dismount each disk, run the check, and remount it.

I cannot dismount the system disk while running, so that will need some other method.

Tip: I have to un-export each volume from NFS first, else umount fails with "device busy", even though lsof and fuser show no process accessing the drive.

I do:

exportfs -ua

before, and:

exportfs -a

after.
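
Putting that together, here is a sketch of the monthly job. It assumes each data filesystem carries a label matching its mount point (tera01 and so on) and that /etc/fstab has entries for those mount points; it is illustrative, not the actual script.

#!/bin/bash
# Sketch: un-export the shares, then dismount, check, and remount each data disk.
exportfs -ua                                  # stop NFS exports so umount can succeed

for label in tera01 tera02 tera03 tera04 tera05 tera06 parity; do
    mnt=/mnt/$label
    dev=/dev/disk/by-label/$label
    if umount "$mnt"; then
        e2fsck -f -p "$dev"                   # force a check, fix whatever is safe to fix
        mount "$mnt"                          # remount via the /etc/fstab entry
    else
        echo "skipping $label: still busy"
    fi
done

exportfs -a                                   # re-export the shares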

SnapRAID scrub

Weekly.

For the data array: this checks the data and parity files for errors. If errors are found they are marked and can be corrected with snapraid -e fix.

By default snapraid scrub checks the oldest 12% of the array, "oldest" in the sense of "since last scrubbed", not filesystem date. Run weekly, that covers the whole array roughly every 2 months (100% / 12% ≈ 8.3 weeks).

With my initial data set, the default scrub takes about 30 minutes.

Run snapraid status to get a histogram of the scrub history.
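
The weekly cron job itself just runs the scrub command; a bigger slice or a different age threshold can be requested explicitly (the numbers below are only examples, not my settings):

snapraid scrub              # default: the oldest 12% of the array
snapraid scrub -p 25 -o 30  # example: 25% of the array, only blocks not scrubbed in the last 30 days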

hdparm

Since a given drive is sometimes not used for a week at a time, I wanted to spin them down after an interval. I know that whether to do this is a long-running debate, but I have not seen an authoritative opinion one way or the other. I've never used the feature before, so I'm going to try it now.

For each drive I have a line in /etc/rc.d/after.local, for example:

/sbin/hdparm -S 180 /dev/disk/by-label/tera01

...which will cause that disk to spin down after 15 minutes of inactivity (hdparm -S values from 1 through 240 count in units of 5 seconds, so 180 × 5s = 15 minutes).

You can check the status of a drive with:

/sbin/hdparm -C /dev/disk/by-label/tera01

...or force it to spin down immediately with:

/sbin/hdparm -y /dev/disk/by-label/tera01

The system disk is always spinning.

nfs

My desktop systems are Linux, so it is natural to use NFS to mount the file server shares.

I do not currently use any pooling software to combine the separate disks into one view. I am used to managing them separately and will continue to do so for now.

On the client machines, each disk is mounted in /etc/fstab like so:

curator:/mnt/tera01 /mnt/curator/tera01 nfs defaults,bg,hard,intr 0 0

bg: background the mount so the system doesn't hang at boot time if the server is down.
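
On the server side the matching /etc/exports entries look roughly like this (the subnet and the export options here are assumptions, not my exact file):

# /etc/exports on curator -- one line per data disk (illustrative)
/mnt/tera00  192.168.0.0/24(rw,root_squash,sync)
/mnt/tera01  192.168.0.0/24(rw,root_squash,sync)
# ...and so on through /mnt/tera06; run "exportfs -a" after editing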

samba

I do keep a read-only pool. This is mounted at /pool on the system disk and contains symbolic links to the rest of the array.

I export this with samba because nfs + links to multiple drives = headache. SMB handles it better, if you use the configuration parameters shown in the SnapRAID FAQ:

# In the global section of smb.conf
unix extensions = no

# In the share section of smb.conf
[pool]
comment = Pool
path = /pool
read only = yes
guest ok = yes
wide links = yes

SnapRAID has a pool command to create links for the entire array, but I use a script of my own invention. I wanted to customize the structure of the pool so I could rename links as necessary and provide subset views, for example when using XBMC profiles.

I don't understand name mangling in samba very well. To be compatible with ancient Windows requirements, some characters in a file name will cause it to be presented in uppercase 8.3 format. When creating links I edit these characters out of the name: "? : /".
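
Purely as an illustration, a stripped-down pool builder along those lines could look like the following; the directory names and the set of shared top-level directories are assumptions, and the real script does renaming and subset views that are not shown here.

#!/bin/bash
# Illustrative pool builder: recreate /pool as symbolic links into the array,
# dropping the characters that trigger samba name mangling.
POOL=/pool
find "$POOL" -mindepth 1 -type l -delete      # clear out the old links

for disk in /mnt/tera0[0-6]; do
    for dir in video audiobook; do            # assumed: the directories worth pooling
        [ -d "$disk/$dir" ] || continue
        mkdir -p "$POOL/$dir"
        find "$disk/$dir" -mindepth 1 -maxdepth 1 | while read -r src; do
            name=$(basename "$src" | tr -d '?:')   # strip the mangling-prone characters
            ln -sfn "$src" "$POOL/$dir/$name"
        done
    done
done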

On the Linux clients, the line in /etc/fstab for the pool:

//curator/pool /mnt/curator/pool cifs guest,ro,uid=wmcclain,gid=users,iocharset=utf8,mapchars 0 0

SnapRAID

SnapRAID (which is not a standard RAID solution) is a backup program that saves hash and parity information and allows correction of disk errors and recovery from bad disks.

It is targeted towards collections of large files that are not often modified, like a media file library.

You can see its many virtues on the web pages, and here is what seems like a fair comparison with similar software.

I selected SnapRAID because:

  • it covers what I need in a lean, non-obtrusive way
  • the simple command-line interface appeals to my unix biases
  • it is open source and written in C, and I could maintain it if I had to
  • it is a simple, non-privileged application and runs only while you execute its commands: no daemon and no kernel patches
  • you can begin with already filled disks; it is agnostic as to what file system is on the data disks
  • you are not locked in; just stop using it

things to know

The data is protected after you do the sync command. If you make changes thereafter you may risk data loss until the next sync. Recovery requires the participation of all disks in the array, so you need to be careful about what you change.

Adding new files to the array is not a problem. They are not protected until the next sync, but neither do they cause recovery problems for the existing array. (I keep a copy of new files outside of the server until they are synced).

Deleting a file changes the array and may cause problems if you need to recover from a bad disk. A safer approach is to move the file to a delete directory outside of the array until after the next sync, when it will be safe to delete it permanently. The fix command even has an option to specify files that have been moved out of the array but are still needed for recovery.

Modifying a file also changes the array and could be a risk if you need to run recovery. This is relatively rare in a media file library, but when it happens it would be handy to stage the new version in a modify directory outside of the array. Move that into place just before the sync. Maybe move the old version to the delete folder first just in case?

To do: a script or cron job to do the staged delete and modify operations with sync would be handy. Each disk would need delete and modify folders, but the whole array is handled with one sync.
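
A possible shape for that job, strictly as a sketch: the .delete and .modify directory names are my own invention, chosen so they sit outside the array (only /backup, /video, and /audiobook are included in snapraid.conf).

#!/bin/bash
# Sketch: move staged replacement files into the array, sync once,
# then permanently remove anything that was parked for deletion.

for d in /mnt/tera0*; do
    if [ -d "$d/.modify" ]; then
        cp -va "$d/.modify/." "$d/"          # staged tree mirrors the array layout
        rm -rf "$d/.modify/"*
    fi
done

snapraid sync                                # one sync covers the whole array

for d in /mnt/tera0*; do
    if [ -d "$d/.delete" ]; then
        rm -rf "$d/.delete/"*                # the array no longer depends on these files
    fi
done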

SnapRAID requires one or more dedicated parity disks, each at least as large as the largest data disk:

  • 1 parity disk will save you from 1 disk failure (either data or parity)
  • 2 parity disks will save you from 2 disk failures (any combination of data and parity)
  • etc...

The SnapRAID FAQ has recommendations on how many parity disks you should have for a given number of data disks. For my 7 data disks I have only 1 parity disk and it is recommended I have 2.

To do: add a second 3TB parity disk and merge 2 of the 1TB data disks onto a new larger drive.

installation

This is a very quick compilation from source without special dependencies.

Untar the downloaded .tar.gz file and:

./configure
make
sudo make install

startup

I started with 6 1TB drives already populated with data, all using ext3. I added the first three to the array and synced them one at a time. Since all the disks are read when calculating parity, I did the last three together. The first sync took 2.5 hours, with each additional single disk taking a little longer than the one before. The batch of three took about 8 hours.

Using default parameters, scrub does not operate on files less than 10 days old (in scrub-time: since last scrubbed or newly added to the array). After that, I found scrub took about an hour or less to check the default 12% of the array.

configuration

Here is my /etc/snapraid.conf configuration file, with comments removed:

parity /mnt/parity/parity

content /var/snapraid/content
content /mnt/parity/content

disk d0 /mnt/tera00/
disk d1 /mnt/tera01/
disk d2 /mnt/tera02/
disk d3 /mnt/tera03/
disk d4 /mnt/tera04/
disk d5 /mnt/tera05/
disk d6 /mnt/tera06/

exclude *.unrecoverable
exclude /lost+found/
include /backup/
include /video/
include /audiobook/

On the data disks only the top-level "backup", "video", and "audiobook" directories are part of the array. Everything else on the system is invisible to SnapRAID. So the server can be used for other types of backup that are not covered by the sync command.

typical tasks

Print a report of array status, including a histogram of scrub history:

snapraid status

List the contents of the array:

snapraid list

Show adds, changes, and deletes since the last sync:

snapraid diff

Sync the array:

snapraid sync

tweaking

Per a forum post, I did this to speed up syncs:

echo 512 > /sys/block/sda/queue/read_ahead_kb
echo 512 > /sys/block/sdb/queue/read_ahead_kb
echo 512 > /sys/block/sdc/queue/read_ahead_kb
echo 512 > /sys/block/sdd/queue/read_ahead_kb
echo 512 > /sys/block/sde/queue/read_ahead_kb
echo 512 > /sys/block/sdf/queue/read_ahead_kb
echo 512 > /sys/block/sdg/queue/read_ahead_kb
echo 512 > /sys/block/sdh/queue/read_ahead_kb
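
These sysfs settings do not survive a reboot, so if the tweak proves worthwhile the same thing can go into /etc/rc.d/after.local as a loop, something like:

# possible after.local addition: raise read-ahead on all eight drives at boot
for dev in /sys/block/sd[a-h]; do
    echo 512 > "$dev/queue/read_ahead_kb"
done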

Expansions

First expansion

Goals:

  • Add another parity disc. Two are recommended for an array with this many discs. Since the machine is full up on disc slots, this meant consolidating two data discs into one.
  • Add additional storage space.
  • Replace 1TB discs tera03 and tera04, which were each showing SMART Reallocated_Sector_Ct = 1. Perhaps not a problem, but since I was replacing discs anyway, these were the ones to go.

Steps:

  • Copy the 3TB parity disc to the new 4TB parity disc.
  • Old 3TB disc becomes new tera04. Copy all contents and retire old disc.
  • Merge 1TB tera03 onto tera04. Copy all contents and retire old disc. Net gain: 1TB storage. The array will no longer have a tera03 data disc.
  • Install new 4TB parity2 disc into old tera03 slot.
  • Replace old 1TB tera01 disc (the oldest in the array) with new 4TB disc. Copy all contents and retire disc. Net gain: 3TB.

Results:

  • Upgrade from 1 to 2 parity discs.
  • Net gain of 4TB data storage.
  • 3 1TB spare discs: old tera01, tera03, tera04.

label    size  notes
-------  ----  ----------------------------------
tera00   3TB   includes system and non-raid areas
tera01   4TB
tera02   1TB
parity2  4TB
tera04   3TB   old tera03 + old tera04
tera05   1TB
tera06   1TB
parity   4TB

Second expansion

Goals:

  • Convert from SuSE to Arch Linux to match the rest of the household.
  • Use an SSD as the system disc. Try keeping tera00 as a normal data disc. Since all 8 SATA ports are in use, see if we can boot and run from a USB 3.0 port.
  • The SSD will be the "always on" device, so it should be sized to allow non-array backups and large temporary files in the staging area.
  • All discs except for the SSD will normally be spun down.
  • Update software and reboot once a week, just before the weekly snapraid job, so as to use the same spin-up cycle.
  • Replace the 3 remaining 1TB drives with 4TB volumes. Use proper partitioning this time.
  • Establish daily incremental backups allowing rollback to previous dates.

Steps:

  • Upgrade the 1TB drives to 4TB:

    • The drives already had GPT partitioning, but no partitions. Create one:

      # parted /dev/sdb

      (parted) mkpart primary ext4 513MiB 100%

      (parted) align-check optimal 1

      (parted) quit

    • Create filesystem and label:

      # mkfs.ext4 /dev/sdb1 # takes a while!

      # e2label /dev/sdb1 new02

    • Mount the new drive on the server, using a powered USB popup-dock so I don't have to swap out any other disc:

      # mount /dev/disk/by-label/new02 /mnt/temp

    • Copy everything, retaining all attributes and sub-second timestamps:

      # cd /mnt/tera02; cp -va . /mnt/temp

    • Swap the new disk for the old and update the label:

      # umount /mnt/tera02

      # umount /mnt/temp

      (remove the old disc and insert the new in its slot)

      # e2label /dev/disk/by-label/new02 tera02

      (edit /etc/fstab and make sure filesystem is ext4)

      (make the disc spin down)

      # /sbin/hdparm -S 180 /dev/disk/by-label/tera02

      # mount -a

      (verify the disc is online, contents look ok, and ownership under /mnt looks the same as the others)

    • See if snapraid is happy:

      snapraid diff # will show disc uuid change

      snapraid check -a -d d2 # takes 90 minutes for 1TB

      snapraid sync # runs fast

    • Repeat the above for tera05 and tera06.

Results:

  • Net gain of 9TB data storage.
  • 3 1TB spare discs: old tera02, tera05, tera06.

label    size  notes
-------  ----  ----------------------------------
tera00   3TB   includes system and non-raid areas
tera01   4TB
tera02   4TB
parity2  4TB
tera04   3TB
tera05   4TB
tera06   4TB
parity   4TB

Issues

History


This document was generated on December 08, 2016 at 13:36 CST with docutils.