McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich
Here's my little contribution to this thread:

Build-Your-Own Portable RAID Appliance!

While NAS is great, even over gigabit Ethernet you're never going to get SATA/eSATA speeds from a NAS box. One of the nicest things about NAS right now is the wide variety of portable offerings available, but why doesn't direct-attached RAID get the same portability love? Well, I wondered that too, and after enough digging, I decided to roll my own. What follows is the buildout I did for a friend that was looking for such a portable RAID solution.

First, the advantages of this approach are considerable:


* DAS blows NAS out of the water for raw speed, any day of the week.

* No computers to configure! This isn't a shuttle PC running Linux; the RAID system is completely self-contained. If you don't like fiddling with software RAID implementations, this might just be the way to go. Once you've configured your RAID set, it's just plug-and-play simplicity.

* Separating the network function from the storage function lets you format your RAID set as you require. For me, I maintain a few classic Macs, which requires that I use HFS+ formatted disks if I want to store any old Mac files with resource forks. Very few NAS appliances out there will let you do this, but with a RAID appliance hooked up to my trusty PowerMac / OS X combo, I can have an HFS+ disk set being shared over AFP and SMB simultaneously. Total mixed-environment compatibility = win!

* You don't even have to give up network attachment! As I pointed out above, you can connect your RAID appliance to any networked computer and share to your heart's content. The best part about this is flexibility -- you aren't locked into the network filesystem protocols built into your NAS box.

* You don't even need a computer to add NAS capability! While my particular scenario suits hooking the RAID up to a computer to be shared, there are several bridgeboards available that bring NAS capability to DAS devices as well.



So that's all well and good, but do keep in mind some of the drawbacks and limitations of this approach, as well. Some of these, like the inherent rules and limitations of RAID, apply to both DAS and NAS boxes that offer those RAID levels; teamdest has provided a competent breakdown of RAID in the OP.


* Remember that unlike some "smarter" storage devices, you cannot grow the array size simply by swapping disks in the same set for larger ones down the road. While the RAID hardware considered for building this box is quite capable, even it does not offer this capability. Though that could theoretically change with later firmware, don't quote me on it, and certainly don't embark on building something like this expecting the capability to be added down the line.

* This thing isn't cheap. Entry-level NAS boxes can be had for around $300-$400; competent SOHO/prosumer ones, like the Infrant/Netgear ReadyNAS NV+, are about $800-$850 at the time of this writing. This RAID costs a bit more than even that. Whether or not the benefits of portable DAS RAID are worth it to you should be carefully considered.

Anyway, suffice it to say that I think the portable RAID appliance is a winning proposition. So without further ado, here's how to build this identical-but-overpriced $1600 box:



...for less than 60% of the price. Separately buying the same parts and putting them together yourself comes out to around $940 (shipping not included) at the time of this writing.


Breaking It Down

Believe it or not, the RAID system pictured above can be exactly replicated, part-for-part. It's just a matter of knowing what it's made of, and finding the individual components for the right price. Here is what that sexy portable RAID consists of:


* Areca Technologies ARC-5030 RAID Subsystem -- Lowest price: $662.35 (Froogle, 18 Mar 2008)



This device is GREAT, and comprises the heart of our little system. Areca is one of the big players in hardware RAID (the other at this price point being 3Ware), and the feature set on this subsystem (and the price) proves it. Check the website for full details, but in summary this is a five-drive-bay enclosure on a Marvell SATA II backplane, with dual SATA and IDE outputs for host connection. The system supports RAID levels 0, 1, 3, 5, 6, and JBOD, and has a dedicated ASIC for parity calculations. Too many RAID features to list here.

Note that if SATA isn't your thing, Areca offers other similar enclosures for SCSI drives too, but don't expect that to be as cost-effective. Other enclosures are outside the scope of this writeup.


* Areca Technologies ARC-1000 Control Panel -- Lowest price: $79.00 (Froogle, 18 Mar 2008)



The control panel for this system is actually an optional purchase, as the RAID subsystem itself supports management over the network. However, its geek value is high. If cost is a concern, skip this and save a few bucks.


* Addonics Technologies Storage Tower - Base Model -- Lowest price: $119 (Addonics, 18 Mar 2008)



You have no idea how hard it actually was to google for this enclosure type, but here it is. This enclosure will support up to four 5.25" devices, and comes with a ~200W PSU integrated. In this case the Areca subsystem takes up three of the bays, and the control panel takes up the last one. The subsystem is a tight fit in the box, but it works. I recommend going with the base model, and adding the appropriate back panel and/or cables separately.

Specifically, the "Port-Multiplier" back panel will give you space to mount two bridgeboards, and a punchout for a SATA to eSATA connector. This is probably the most practical configuration for most people.


This is the other part where personal choice will affect the overall price of this little RAID appliance, as this is where we choose the bridgeboard best suited to your needs. Here's what I ended up going with for the buildout.

* FireWire 800 / USB2.0 bridgeboard (Dual SATA) -- Price: $69 (DatOptic, 18 Mar 2008)



This bridgeboard contains dual SATA links and provides a bridge to FireWire 800 (backwards-compatible with FW400) and USB 2.0. The controller is an Oxford 924 chipset, which I highly recommend. There are competing bridgeboards out there based on the Initio chipset, but given my past experiences with them, I'd say avoid them at all costs.

The nice thing about this bridgeboard is that it appears to offer a SATA mode that will simply make the two ports nodal -- that is to say, you can take the SATA out from the Areca controller, connect it to the bridgeboard, and then take a second SATA to eSATA cable and hook it to the remaining port. This lets you avoid having to use the IDE output on the controller for your bridgeboard. Disclaimer: I have not tested this feature, and can't guarantee that this bridgeboard will function the way I've described this setup, though the documentation suggests it should. Don't blame me if the implementation I described doesn't actually work. If you want to play it safe, either don't use eSATA in conjunction with a SATA bridgeboard, or get an IDE bridgeboard to handle the FireWire/USB function.

Other options for boards here include NAS bridgeboards (also available on the DatOptic site).



Whew! Outside of assembly, which should be pretty straightforward, you now have everything you need to build one of these boxes! Like I said, the cost for these components comes to roughly $940. If you want to save money, you can remove the control panel, and even the bridgeboard if you want to run eSATA only; dropping those two items will save you almost $150 more.

I'll update this post as needed, including any clarifying remarks you might have for me. Thanks to teamdest for taking on this megathread!

McRib Sandwich fucked around with this message at 04:11 on Mar 19, 2008


McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

stephenm00 posted:

Why aren't ZFS and RAID-Z more common options? There must be some disadvantage for home users, right?

Well, for starters, in the case of RAID-Z you're basically required to run an operating system that can support it. The strengths of RAID-Z come from the fact that it's so tightly integrated with the OS and the filesystem layer.
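
To give a flavor of what that OS intimacy looks like in practice, here's roughly what creating a RAID-Z pool looks like on a ZFS-capable OS (the disk names below are just placeholders -- yours will differ):

code:
# Create a single-parity RAID-Z pool from three whole disks
# (c0t0d0 etc. are placeholder Solaris-style device names)
zpool create tank raidz c0t0d0 c0t1d0 c0t2d0

# The pool is immediately mounted at /tank -- no separate volume manager
# or mkfs step, because ZFS is both the RAID layer and the filesystem
zpool status tank
zfs list tank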

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

teamdest posted:

That is seriously amazing! I'm actually somewhat amused that they just used off-the-shelf parts and are packaging it at such a premium, I'm betting most people don't even consider that you could put something like that together on your own.

:D I was as amazed as you are. That was a buildout I did for a friend, and I was impressed with how well it came together, given that I didn't get any hands-on time with any of the components before they all arrived. I just did a shitload of research, and it paid off!

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

cycleback posted:

I have been thinking about buying one of these Addonics storage towers to use as a DAS and a portable offsite backup.

Can you comment on the quality of the enclosure and the power supply?

How noisy is the power supply?

Is there enough airflow over the drives to keep them cool? I am concerned that there might be airflow problems because of the mesh looking sides.

Is it possible to change the backpanel once it is purchased?

Well, in a lot of respects the enclosure feels like a shuttle case; the construction is pretty sturdy (enough to lug a 5-disk RAID around in, anyway). The PSU I really can't vouch for, unfortunately; it seemed to be of average quality. I think I remember the PSU itself containing two fans, one of which was noisier than the other. We ended up unplugging one of them to decrease the noise somewhat. This wasn't a problem in our case because the Areca subsystem has a dedicated fan of its own. I don't recall the PSU itself being terribly noisy on its own, though... the RAID was substantially louder.

If you put your own drives in there, as opposed to a dedicated disk enclosure unit, you may want to add some active cooling. As I remember, the bare box is just that; the only fans I recall coming with it were attached to the PSU. The website pictures are actually pretty good at giving you an idea of what the bare case looks like.

As far as the backpanel, you can order any of a few different ones from Addonics. It's literally just a plate that you screw into place, with prepunched holes for whatever config you ordered from them. If you have the right metalworking tools, you could make your own custom panel pretty easily.

Sorry for the underwhelming details; for all the research I did on this thing, I didn't have it in my own hands for very long, as it was for someone else. Hope that helps. All in all, I couldn't find anything similar on the market, and it's hard to beat the modularity and portability of the thing.

McRib Sandwich fucked around with this message at 07:14 on Mar 19, 2008

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich
It's been a while since I've checked in on this thread, so I spent the better part of the day catching up on the last 50 or so pages. drat.

I noticed that there was a lot of talk of Nexenta / OpenIndiana / Illumos back around the page 40-50 area, but almost nothing at all recently. Anyone have any modern opinions on the Nexenta Core Platform and/or NexentaStor? I've started playing around with NexentaStor CE in a VM and it looks really great so far; my only misgiving with it is the lack of AFP support out of the box. I don't know if their OS strategy will play out (looking for opinions here), but marrying an Ubuntu LTS userland with the OpenSolaris kernel seems like a fairly practical way to go. I have not been following what Oracle's done with the platform -- is any distro based on Solaris (even partially so) going to be viable in more than a couple of years? Is FreeBSD + ZFS sufficiently mature now?

On the other side, I know there were also misgivings about the OpenIndiana team getting reliable / timely releases out the door, but that was also the better part of a year ago. Nexenta's gone a different route and the product itself seems pretty polished, but at this point I haven't heard any opinions on the ZFS OS landscape that aren't a year old, and would love to hear how people feel about the way things have played out with Oracle.

If the NexentaStor CE demo works out well, I'm really tempted to set a pair of boxes up. Very curious to hear more about the ZFS landscape before I do this, though. Give me your opinions!

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

FISHMANPET posted:

After Oracle gave OpenSolaris the boot, Illumos took the kernel code and OpenIndiana took the distro code. Neither has done much of anything. OpenIndiana released a binary of the b148 code base; meanwhile, Oracle released Solaris 11 Express, based on b151.

Nexenta has switched to Illumos, but with no development on that front, it's only a matter of time before Nexenta dies unless it starts pouring money into Illumos (but can it really compete with Oracle or the whole Linux kernel?). Meanwhile ZFS is getting ported to Linux, Oracle isn't smart enough to merge BTRFS and ZFS, and Linux is catching up with Solaris in general. Oracle is jerking the chain of their customers, with no long term roadmap (at least not in the storage arena, and I doubt anywhere else either).

Well god dammit (and gently caress you, Oracle). No wonder folks have been quiet about it lately. So is there a "preferred" platform for ZFS nerds these days, then, given the huge uncertainty surrounding the Solaris ecosystem? For the most part, Nexenta seems to Just Work™, and it actually looks easy enough to administer that I wouldn't feel apprehensive handing the box off to the next guy (our volunteer staff rotates frequently). If I knew that I could at least take the raw drives out of a NexentaStor install over to another platform that supported ZFS, that would ease my fears somewhat.

So, to ask an unfair question, is it even worth seriously considering NexentaStor at this point, or are they just going to become Oracle collateral damage within a few months / years? If I dive in anyway and they go under, what are the chances of pulling my ZFS drives out of the NexentaStor box and having them import cleanly to a FreeBSD / Linux distro running ZFS?
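
(For reference, my understanding is that moving a pool between ZFS platforms is "just" an export and an import, provided the destination supports the same or a newer zpool version -- something like the following, with tank1 as a placeholder pool name. I'd love confirmation from anyone who's actually done it across OSes.)

code:
# On the old (NexentaStor) box: cleanly release the pool
zpool export tank1

# Move the physical disks, then on the new FreeBSD/Linux+ZFS box:
zpool import            # scans attached disks and lists importable pools
zpool import tank1      # imports the pool if its version is supported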

Storage :argh:

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich
This question sort of straddles the Enterprise storage thread and this one, but this seems the slightly more appropriate place to ask given the continuing discussion of ZFS.

Does anyone have experience or recommendations for external self-contained disk shelves that can present disks as JBOD to an OpenSolaris/Indiana or Nexenta installation? The Nexenta HCL for hardware is thin as hell -- I've never heard of their storage hardware partners outside of Supermicro, and Supermicro seems to only offer integrated disk-shelf-plus-motherboard units, which is not the route I'd like to go.

Dell's PowerVault line doesn't even seem to mention the word JBOD anywhere anymore, LSI is out of the disk shelf business... so what are SMB folks using to hook large disksets into ZFS boxes nowadays?

edit: Any discussion of physical interface (SAS, iSCSI over Ethernet, etc.) would be helpful too.

McRib Sandwich fucked around with this message at 22:46 on Jul 11, 2011

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

FISHMANPET posted:

We're going through this at work right now, and someone found that HP makes a SAS JBOD, though I haven't verified it myself.

In a pinch the PowerVaults can do it, you just have to use the onboard controller to turn each disk into a single-disk array, but that's a big kludgy hack and I wouldn't recommend it except as a last resort.

As for HCL, it doesn't matter, as long as you plug it into a supported SAS card, which there are many of.

I guess for my needs, the kicker is that I'd like to find a JBOD that NexentaStor knows about, so that it can flash an indicator light when a disk on the unit fails. Like I said though, their HCL is extremely thin.

Of course it occurred to me as soon as I hit submit on that last post, but it looks like the Promise VTrak line is another well-known vendor (though I know little of them in the way of reliability or reputation). Wonder if anyone here has experience with throwing a VTrak at a ZFS installation? Nexenta specifically, maybe?

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

Gwaihir posted:

I've set up a Nexenta box on a Dell R710 with a Perc5/e card running a PowerVault MD1000 without issue. Dell's controllers are indeed weird like that (I guess it's an LSI thing?) but I was able to pool up all the drives across the internal Perc6/i and the Perc5/e external SAS controller no problem. Like Fishmanpet posted, you just have to make individual "RAID-0" arrays out of each disk, which is how those controllers end up doing JBOD mode. I also tested it with Vertex2 SSDs for L2ARC, which also worked fine. The rest of my drives are 300 gig 15k rpm SAS, internally and in the MD1000. I don't have that machine up at the moment (It's my generalized "random poo poo" test box, so it gets reformatted pretty frequently), but I could probably get it back up and running to bench from the MD1000.

Holy crap, this is almost exactly the setup I'm trying to get off the ground -- don't go anywhere! Seriously though, there are apparently remarkably few people running Nexenta on server hardware like this, as opposed to ZFS directly on OS/OI. I'd love to hear as much as you can tell us about what you love and hate about the platform.

Specifically, one of the touted features of Nexenta is its ability to map disks to physical slots so that Nexenta can flash an LED on a failed drive when it goes bad -- this is crucial for my intended deployment. Have you had any luck getting this mapping to work on the MD1000 (especially with the RAID-0 kludge)?

Are you worried about drive portability in case your MD1000 fails? With the RAID-0 workaround, I could see that potentially being an issue if you needed to move the physical disks to another machine for recovery purposes.

Are you doing anything like presenting the Nexenta shares to VMware? Impressions of iSCSI / NFS support in Nexenta?

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

Gwaihir posted:

Resurrecting that box is next on my list of things to do (It's currently testing our server2008 image and Hyper-V setup), because I do intend to use it to run VMs off of. Unfortunately (for you at least), for now they'll be Hyper-V based VMs, since I'm stuck with using whatever comes down to us in the "Official" image- I work for a state-level SSA office, so we're somewhat constrained in what stuff we can try and pull with respect to our server infrastructure and such. On the other hand, I'm not worried about drive portability at all- We're using uniform hardware across all of our windows servers, so it's always going to be MD1000s with the same controllers. They can swap drives around between them and recover the config that the drive was using without much issue. I thiiiink (Not certain about this, but I'm going to add it to my things to test now) that in single-drive RAID-0 mode, the controller really is just acting as a JBOD HBA- I should be able to just pop a single drive in to another machine without a RAID controller at all and read it.

I'll have to let you know about the drive notification issue when I get the box back up and running. Last time I was testing with it, I was mainly checking to see how good I was able to get AD integration and authentication working, for serving shares to users on windows boxes (It worked fine after a few hiccups- Case sensitivity apparently DOES matter when putting in things like domain controller names or LDAP strings in this case!). It also has super easy to configure email alerting about all sorts of conditions relating to server and drive health, however I did not get to checking the actual drive LED notifications. I'm 95% certain that the drives were listed off in their enclosure connection order, due to the way they were passed to ZFS by the raid controller, but I'd have to check for sure.

Aside from the bone headed AD thing though, the entire platform worked really really well for the time I was messing with it. In terms of setup, pool config, nic teaming, etc, it's a nice tool to deal with. I'll have an update about how it works as a VM parking place for the hyper-v cluster I'm working on at some point down the road.

Wow, this is great, thanks for the details. Definitely looking forward to what your findings are as you resurrect this machine!

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

jeeves posted:

I've been using shares via SMB to my girlfriend's Mac to use as a ghetto HTPC for years now; it's only recently I have been trying to make a NAS to consolidate media. But yeah, the knee-jerk response from someone on FreeNAS's own forum was 'oh it's Apple's fault with their unreleased untested Mac 10.7!'

So I am going to go home and see if her mac can gently caress with my Win2003 box and I haven't noticed in the 2 weeks I've had 10.7 installed on her machine (this would be easy to recognize-- I can just see if her Mac poo poo up my server with tons of hidden mac .DS_Store files), or if the FreeNAS people are full of poo poo.

I'm guessing the latter. I'm pretty close to just going with Win2008 instead of this poo poo, but I keep banging my head against this wall as I really like the benefits of ZFS over RAID-5 (and not having to drop more money on another HD to install Win2008 to).

Edit - Remoted in, and yeah my Win2003 box doesn't have any of the .DS_Store files that mac machines poo poo up any network space they have write access to (my Temp share that the mac DOES have write access to has this ever present hidden file-- last modified earlier today when I tested last, so it is not like they ever fixed that with 10.7 anyhow). This is definitely a problem with FreeNAS not properly asking for authentication from her Mac, and not just a blanket loving bug with Mac 10.7 being Neo (laff) on all SMB shares.

In summation: ugh gently caress FreeNAS

Been following your issues here, just wanted to mention a couple things:

The .DS_Store files that OS X leaves on remote shares can actually be disabled in the OS. No idea why this isn't the default for non-AFP volumes, but hey, Apple. Here's the article on disabling that:

http://support.apple.com/kb/HT1629
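
If I'm remembering that KB article right, the gist of it is a single defaults command run as her user (then log out and back in, or relaunch the Finder, for it to stick):

code:
# Tell OS X not to write .DS_Store files to network volumes
defaults write com.apple.desktopservices DSDontWriteNetworkStores true

# Relaunch Finder so the setting takes effect
killall Finder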

Also, if you want to run ZFS but don't want to put up with the bullshit that you're getting from FreeNAS, you may want to consider the NexentaStor (or Nexenta Core Platform) product that I've been trying to gather info on. The NexentaStor Community Edition is free and allows stores of up to 18TB raw disk, comes with a web GUI, and is pretty darn nice for what it does.

http://www.nexentastor.org/projects/site/wiki/CommunityEdition

If you do end up checking it out, share your experiences here, it'd be nice to hear some more folks talking about the product (last mentions before recently were dozens of pages ago in this thread).

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

Galler posted:

:effort: post Incoming!

ITT I build an 8TB NAS with drives & UPS for ~$750.

Just wanted to say that this is awesome, and the price was low enough that I was compelled to make the jump.

I am running into a snag with getting this server to boot from USB though; I've had literally zero luck with it so far. In my case I'm trying to make a NexentaStor ISO into a bootable USB drive. I don't suppose anyone else has tried to make a NexentaStor image bootable from USB storage? If I can't get the server to boot / install from USB, then I am going to be down a drive slot, which is rather frustrating.

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich
Well, this is roundly disappointing. Finally got NexentaStor running on my ProLiant (broke down and used a SATA CD drive because I was tired of messing with USB sticks). For some weird reason, even in a RAID 0-equivalent configuration across 4x500GB drives, I can't seem to coax more than about 30MB/s out of the system. This is over CIFS, gigabit, from the server to a Mac OS X client. Maxed out the server with the 8GB of ECC RAM that Galler linked to in his howto, so I don't think memory is a constraint here.

Something I noticed during the file transfers is that the disks are rarely getting hit -- the HDD activity light comes on only in spurts. Some of this is expected due to the way ZFS works with the disks, but it makes me wonder where the bottleneck is occurring here. It didn't matter if I configured my disks as RAID-Z, RAID-Z2, or a RAID 0 equivalent; I was always stuck between 20-30MB/s read or write. I know almost no one else here is running NexentaStor, but maybe you've got an idea about what might be going wrong here? This has been a pretty frustrating foray so far -- I had comparable Nexenta performance on my VM, with 4 virtual disks being read from a laptop drive...
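
Next thing I plan to try is leaving a pool-level monitor running while a copy is in flight, to see whether the disks or the network side is the choke point -- something along these lines from the Nexenta console (tank1 is my pool name):

code:
# Per-vdev bandwidth and IOPS, refreshed every second, while a CIFS copy runs
zpool iostat -v tank1 1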

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

Galler posted:

I didn't test performance at all with nexenta but I would think it would be similar to solaris. The only real difference in our setups should be fairly minor differences in software (I think).

How big are the files you're transferring? Mine slows down a lot when moving a ton of little files and starts hauling rear end when moving fewer large files.

The 500GB of assorted files I sent to mine averaged around 50MB/s write, and anything I pull off it maxes out my gigabit ethernet. It was very slow when it started transferring chat logs (shittons of them and many only a few kb each), but as soon as those were out of the way it got much faster.

I was running the AJA System Test utility, which appears to create a single monolithic file of whatever size you choose (I tested with 128MB and 512MB). No idea if it's moving small "subfiles" during the test or not. I used a Finder copy to drag over some large TV episodes to the drive and that was less conclusive. There seemed to be some zippy periods and some sags during those transfers. I don't know how high-quality the OS X SMB/CIFS implementation is, but even at that, 20MB/s sounds slow to me. Guess I'll have to do some more digging.

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

teamdest posted:

Hard drive benchmarking is actually kind of a complicated thing to do, and there's a lot of factors involved. Sending a file from your laptop to your file server is really a pretty poor way to go about it, as anything from the network stack at either end to the files themselves can cause huge variances in performance.

Edit: if you don't know the specifics of whatever system you're using, your test is kind of worthless, since there's no way to know the difference between you setting the test up wrong, running it wrong, or outright running the wrong test, versus an actual problem with the setup you're describing.

Edit 2: First test, get on the system directly at a console and test the direct write speed with something like dd if=/dev/zero of=/filename.blah and see what you get. Remove the issue of CIFS, Gigabit, OS X and just see if the vdev and filesystem are up to speed.

I agree. Console commands are my next step, but the Nexenta Management Console seems pretty (intentionally) limited in scope. I need to find a way to drop into a privileged CLI in this thing, but I'm still learning the ropes.

On the plus side, I did get iSCSI working, the hope being that connecting with a block-level device presentation will also be a little closer to "real" than network filesystem abstractions. I'll definitely report back when I have more info in hand, appreciate the feedback.
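
For my own notes: the Nexenta docs seem to suggest the escape hatch out of NMC is an expert-mode option followed by a bash escape. I haven't actually tried this yet, so take the exact syntax as a guess to be verified:

code:
# From the NMC prompt (untested by me; syntax as I recall it from the docs)
option expert_mode=1
!bash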

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

McRib Sandwich posted:

I agree. Console commands are my next step, but the Nexenta Management Console seems pretty (intentionally) limited in scope. I need to find a way to drop into a privileged CLI in this thing, but I'm still learning the ropes.

On the plus side, I did get iSCSI working, the hope being that connecting with a block-level device presentation will also be a little closer to "real" than network filesystem abstractions. I'll definitely report back when I have more info in hand, appreciate the feedback.

A bit more info. Ran this on the ProLiant again, on the Nexenta command line, against a 4x500GB drive zpool configured as a RAID 0 equivalent. Compression was on, and sync (synchronous requests written to stable storage) was enabled. Results:

code:
$ dd if=/dev/zero of=/volumes/tank1/file.out count=1M

1048576+0 records in
1048576+0 records out
536870912 bytes (537MB) copied, 15.8415 seconds, 33.9MB/s
compared against a rough test of raw CPU throughput:

code:
$ dd if=/dev/zero of=/dev/null count=1M

1048576+0 records in
1048576+0 records out
536870912 bytes (537MB) copied, 2.50262 seconds, 215MB/s
34MB/s on the native filesystem still seems really slow to me. These are old drives; I can't imagine they need to be realigned like the 4K-sector drives do. Any thoughts?

edit: The 34MB/s speeds are also in line with my timed tests copying large files over to an iSCSI-mounted zvol that I created on top of the same pool.

McRib Sandwich fucked around with this message at 00:13 on Aug 8, 2011

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

teamdest posted:

Excellent, so it seems that the issue is somewhere in ZFS or in the drives themselves. Are they already built and have data on them, or could you break the array to test the drives individually?

Actually, I ended up doing that last night after posting that update. I found that running the same dd command on a pool made up of a single drive delivered about the same write speed -- 34MB/s. That said, I would've expected increased performance from a RAID 10 zpool of those four drives, but I didn't see any increase in write performance in that configuration.

Anyway, I have free rein over these drives and can break them out as needed. What other tests should I run against them?
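
In the meantime, my plan for testing a single drive outside of ZFS is a raw sequential read straight off the device with dd -- something like the following. The device path is a placeholder from memory; `format` will list the real names on the box.

code:
# List the disks the OS sees (Ctrl-C out of format's menu after noting the names)
format

# Raw sequential read of one whole disk, bypassing ZFS entirely
# (c0t0d0p0 is a placeholder -- substitute the disk under test)
dd if=/dev/rdsk/c0t0d0p0 of=/dev/null bs=1M count=1024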

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

teamdest posted:

Well, I would expect your write speed and read speed to pick up on a striped array; that's kind of strange. A mirror, not so much, since it has to write the same data twice. Can you try it locally on a striped array instead of a mirrored stripe? Just trying to eliminate variables. And could I see the settings of the pools you're making? Something like `zfs get all <poolname>` should output them, though I don't know if that's the exact syntax.

Sure thing, output is below. I sliced up the pool a few ways this evening, including RAID 10 again, RAID-Z2, and RAID 0. I always hit a ceiling of about 34-35MB/s write using dd, without any other I/O hitting the disks in the pool. Enabling or disabling compression didn't seem to make much difference, either.

I pulled a couple of Samsung drives out of the unit and replaced them with a third WD to see if vendor commonality would make a difference; it doesn't seem to so far. Here's the output from that 3-disk WD array in RAID 0.

code:

root@nexenta:/export/home/admin# zpool status
  pool: syspool
 state: ONLINE
 scan: scrub repaired 0 in 0h2m with 0 errors on Mon Aug  8 23:36:25 2011
config:

        NAME        STATE     READ WRITE CKSUM
        syspool     ONLINE       0     0     0
          c0t5d0s0  ONLINE       0     0     0

errors: No known data errors

  pool: tank1
 state: ONLINE
 scan: none requested
config:

        NAME        STATE     READ WRITE CKSUM
        tank1       ONLINE       0     0     0
          c0t0d0    ONLINE       0     0     0
          c0t2d0    ONLINE       0     0     0
          c0t3d0    ONLINE       0     0     0

errors: No known data errors

root@nexenta:/export/home/admin# zfs get all tank1

NAME   PROPERTY              VALUE                  SOURCE
tank1  type                  filesystem             -
tank1  creation              Mon Aug  8 23:32 2011  -
tank1  used                  512M                   -
tank1  available             1.34T                  -
tank1  referenced            512M                   -
tank1  compressratio         1.00x                  -
tank1  mounted               yes                    -
tank1  quota                 none                   default
tank1  reservation           none                   default
tank1  recordsize            128K                   default
tank1  mountpoint            /volumes/tank1         local
tank1  sharenfs              off                    default
tank1  checksum              on                     default
tank1  compression           off                    local
tank1  atime                 on                     default
tank1  devices               on                     default
tank1  exec                  on                     default
tank1  setuid                on                     default
tank1  readonly              off                    default
tank1  zoned                 off                    default
tank1  snapdir               hidden                 default
tank1  aclmode               discard                default
tank1  aclinherit            restricted             default
tank1  canmount              on                     default
tank1  xattr                 on                     default
tank1  copies                1                      default
tank1  version               5                      -
tank1  utf8only              off                    -
tank1  normalization         none                   -
tank1  casesensitivity       sensitive              -
tank1  vscan                 off                    default
tank1  nbmand                off                    default
tank1  sharesmb              off                    default
tank1  refquota              none                   default
tank1  refreservation        none                   default
tank1  primarycache          all                    default
tank1  secondarycache        all                    default
tank1  usedbysnapshots       0                      -
tank1  usedbydataset         512M                   -
tank1  usedbychildren        76.5K                  -
tank1  usedbyrefreservation  0                      -
tank1  logbias               latency                default
tank1  dedup                 off                    default
tank1  mlslabel              none                   default
tank1  sync                  standard               default

root@nexenta:/export/home/admin# dd if=/dev/zero of=/volumes/tank1/file.out count=1M

1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB) copied, 15.5217 seconds, 34.6 MB/s

Running prstat while committing a huge amount of /dev/zero to disk using dd, I was able to get the load average as high as 1.25 over a few minutes' time:

code:

   PID USERNAME  SIZE   RSS STATE  PRI NICE      TIME  CPU PROCESS/NLWP       
  3386 root     3620K 1636K cpu0    10    0   0:00:32  39% dd/1
  2802 root        0K    0K sleep   99  -20   0:00:12 1.7% zpool-tank1/138
  1552 root       22M   20M sleep   59    0   0:00:16 0.2% hosts-check/1
   830 root       42M   40M sleep   59    0   0:00:25 0.2% python2.5/55
   813 root       67M   29M sleep   59    0   0:00:22 0.1% nms/1
  3008 root       22M   20M sleep   44    5   0:00:04 0.1% volume-check/1
  2431 admin    4304K 2952K cpu1    59    0   0:00:00 0.0% prstat/1
   409 root     3620K 2132K sleep   59    0   0:00:03 0.0% dbus-daemon/1
     5 root        0K    0K sleep   99  -20   0:00:19 0.0% zpool-syspool/138
   559 root       35M 4580K sleep   59    0   0:00:00 0.0% nmdtrace/1
  1700 root     6064K 4368K sleep   59    0   0:00:00 0.0% nfsstat.pl/1
  1595 root       22M   20M sleep   59    0   0:00:03 0.0% ses-check/1
   551 www-data   17M 7512K sleep   59    0   0:00:00 0.0% apache2/28
    11 root       12M   11M sleep   59    0   0:00:14 0.0% svc.configd/14
   441 root       11M 7000K sleep   59    0   0:00:07 0.0% smbd/15
   540 root       23M   14M sleep   59    0   0:00:01 0.0% fmd/22
   357 root     2600K 1576K sleep  100    -   0:00:00 0.0% xntpd/1
   240 root     6860K 4408K sleep   59    0   0:00:00 0.0% nscd/34
   370 messageb 3492K 1940K sleep   59    0   0:00:00 0.0% dbus-daemon/1
   913 root       48M 4812K sleep   59    0   0:00:00 0.0% nmc/1
   251 root     3604K 2620K sleep   59    0   0:00:00 0.0% picld/4
Total: 77 processes, 619 lwps, load averages: 1.21, 0.82, 0.46

And for good measure, the output from a few more dd sessions. 3 WD drives in RAID-0:

code:

root@nexenta:/export/home/admin# dd if=/dev/zero of=/volumes/tank1/file.out count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB) copied, 15.5217 seconds, 34.6 MB/s

root@nexenta:/export/home/admin# dd if=/dev/zero of=/volumes/tank1/file.out count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB) copied, 15.5664 seconds, 34.5 MB/s

root@nexenta:/export/home/admin# dd if=/dev/zero of=/volumes/tank1/file.out count=5M
5242880+0 records in
5242880+0 records out
2684354560 bytes (2.7 GB) copied, 76.7602 seconds, 35.0 MB/s

root@nexenta:/export/home/admin# dd if=/dev/zero of=/volumes/tank1/file.out count=5M
5242880+0 records in
5242880+0 records out
2684354560 bytes (2.7 GB) copied, 76.8237 seconds, 34.9 MB/s

root@nexenta:/export/home/admin# dd if=/dev/zero of=/volumes/tank1/file.out count=5M
5242880+0 records in
5242880+0 records out
2684354560 bytes (2.7 GB) copied, 76.2735 seconds, 35.2 MB/s

If anyone else has ProLiant ZFS benchmarks, it would be great to have some more data points to compare against here.

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

Galler posted:

I'll be happy to but I'm really not understanding the dd command/benchmarking process.

If you can give me some basic step by step instructions on doing whatever it is I need to do I will.

e: I go all retarded whenever I get in front of a unix terminal. It's really obnoxious.

All the dd command is telling the machine to do is to write binary zeros (from the /dev/zero "device") to the output file (hence "of") called file.out on my zpool named tank1. The "count" attribute just tells the OS how many blocks worth of zeros to write.

If you're in a place to provide some benchmarks that would be awesome, but the last thing I want to do is encourage you to dive into the command line with a command as powerful as dd if you're not comfortable doing it. If you're not careful, you can nuke things pretty quickly.

Does napp-it provide any sort of benchmarking you could run? If I'm remembering correctly, bonnie++ is part of the tool suite that gets installed with the platform, for example.
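
If bonnie++ is in there, the invocation is pretty tame -- something like the line below, with the test size set to at least twice your RAM so caching doesn't flatter the numbers. (The path is a placeholder for wherever your pool is mounted; double-check the flags against the man page.)

code:
# Run bonnie++ as root against the pool's mountpoint;
# -d is the target directory, -s the test file size in MB (>= 2x RAM)
bonnie++ -u root -d /volumes/tank1 -s 16384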

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

doctor_god posted:

Don't forget to take the blocksize into account - by default dd transfers in 512 byte increments, and ZFS doesn't really like that, since writes that are less than a full block require the block to be read, modified, and written back to disk. Your filesystem appears to be using the default 128K recordsize, so you should get much better throughput transferring in increments that are a multiple of 128KB:

code:
ein ~ # dd if=/dev/zero of=/pool/test count=1M
1048576+0 records in
1048576+0 records out
536870912 bytes (537 MB) copied, 98.9068 s, 5.4 MB/s

ein ~ # zfs get recordsize pool
NAME  PROPERTY    VALUE    SOURCE
pool  recordsize  128K     default

ein ~ # dd if=/dev/zero of=/pool/test bs=128K count=4K
4096+0 records in
4096+0 records out
536870912 bytes (537 MB) copied, 1.62324 s, 331 MB/s

ein ~ # dd if=/dev/zero of=/pool/test bs=512K count=1K
1024+0 records in
1024+0 records out
536870912 bytes (537 MB) copied, 1.28789 s, 417 MB/s
Edit: This is a 10x2TB RAIDZ-2 pool running on Linux via ZFS-FUSE.

Interesting. Is 128K a reasonable default record size for general use? The end goal here is to present a zvol over iSCSI to my Mac, since Nexenta doesn't come with AFP / netatalk rolled in. As far as I can tell, the default allocation block size on HFS+ is 4K for disks larger than 1GB, and I can't find a straightforward way to change that in Disk Utility. Should I specify a 4K block size for my zpool, then? That seems way too small for ZFS stripes, but 128K seems extremely wasteful as an HFS+ block size. Admittedly I'm out of my element when it comes to this sort of tuning, but something about this block size discrepancy doesn't sound quite right to me.
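
For what it's worth, my current understanding is that for the iSCSI case the knob that matters isn't the filesystem recordsize at all, but the zvol's volblocksize, which can only be set when the zvol is created -- something like this (tank1/macvol is just a placeholder name, and 8K is a guess pending better advice):

code:
# Create a 200GB zvol with an 8K volume block size to back the iSCSI LUN
# (volblocksize must be a power of two between 512 bytes and 128K,
#  and can only be set at creation time)
zfs create -V 200g -o volblocksize=8K tank1/macvol

# Confirm the property took
zfs get volblocksize tank1/macvol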

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich
Feeling a bit guilty here, feels like I killed this thread with all the talk of benchmarks.

Anyway, I have done still more testing -- installed Solaris and Nexenta Core on different disks, put napp-it on top of each and ran bonnie++ on the system. In both Solaris and Nexenta, bonnie++ said that I did better than 135MB/s sequential read, and 121MB/s sequential write. I'm still seeing way crappier performance when I actually go to *use* this thing, though, instead of just benchmarking it.

I've noticed a particular behavior when doing copy tests that I thought was normal for ZFS, but now I'm beginning to wonder. When I copy a large file over, I'll see lots of progress in the Mac OS X copy window all at once, then it'll suddenly stall. This is exactly concurrent with the disks making write noises. I had just figured that ZFS was caching to RAM and then flushing to disk during these periods. Is this expected behavior? The finder copy progress pretty much halts any time the disks themselves are getting hit, which seems out of sorts to me. :sigh:
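
Next time I run one of these copies I'm going to leave a per-disk iostat going on the box, to see whether the drives are actually saturated during those bursts or mostly idle in between:

code:
# Extended per-device stats (-x) with logical device names (-n), every 5 seconds
iostat -xn 5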

McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich
Anyone in here running ZFS with iSCSI? A while ago I was playing with different installs on my microserver (Nexenta, NCP+napp-it, Solaris, etc.). I did a lot of testing, got frustrated, and set it down for awhile.

Recently I've had a bit of time to take another crack at it, and I am now realizing that I'm seeing a *huge* discrepancy in transfer speeds under NCP+napp-it, depending on whether I'm using iSCSI or AFP to my Mac. I have also had similarly lackluster iSCSI performance using NexentaStor as my platform instead of NCP. I'm hoping to get a couple of data points from folks running similar setups, or maybe even some theories as to why iSCSI throughput is sucking.

Basically, over iSCSI I'm hearing bursts of drive activity when things are being written to disk. These are roughly periodic, semi-frequent, and they seem to completely stop up data transfer between the NAS and the computer when they occur. When there isn't drive noise, the transfer is indicated by the OS / application as progressing reasonably, though not as fast as AFP. Read performance seems to similarly suffer from dropouts.

Here's a graph of a read / write test I did to a thin-provisioned iSCSI LUN that illustrates copy speed and the jagged dropouts:

All tests used the free AJA System Test Utility for OS X
2.0GB Disk Read / Write test
Using huge video frames (2048x1556 10-bit RGB)




I also tested iSCSI performance with a 4k block volume LUN:




Now compare those with a couple transfers I did to an AFP share on the same box, with ZFS compression turned on for the share (turning it on or off didn't make much difference). Higher throughput, much more stable writes (where "stable" means not appearing to block I/O), and extremely consistent reads as compared with iSCSI:





I'm truly at a loss here. What could be causing iSCSI to behave so much worse than netatalk's AFP implementation? I'm wide open to any suggestions. I intentionally only have 1GB of RAM in the microserver right now, so as to minimize the amount of data ZFS would be able to cache in ARC; therefore I do not believe caching to be a substantial factor in these tests.

If anyone else is running iSCSI and has data to share, that would be really great too.
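
One thing I still want to rule out is synchronous write handling on the zvol itself -- my hunch is the iSCSI writes may be getting treated as sync where the AFP path isn't. Purely as a diagnostic (not something I'd leave enabled), something like this should show whether that's where the time is going (tank1/macvol is a placeholder for the LUN's backing zvol):

code:
# Check the current sync policy and block size on the zvol backing the iSCSI LUN
zfs get sync,volblocksize tank1/macvol

# Temporarily disable sync semantics for ONE test run -- this trades
# crash safety for speed, so set it back to standard immediately afterward
zfs set sync=disabled tank1/macvol
zfs set sync=standard tank1/macvol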


McRib Sandwich
Aug 4, 2006
I am a McRib Sandwich

crm posted:

Can those HP Micro Servers use 3TB drives?

They look... amazing.

Also, does the 4/5 sata spots include the one that comes with it?

There is at least one report in the OP of the ProLiant thread on OCAU that indicates 3TB drive support -- pretty sure the server was running Windows in that instance. Seems to suggest that the integrated hardware / HBA can read the drives, at least.

edit: 4 SATA available in the drive sled configuration, and a fifth port on the motherboard that you can snake up to the 5.25" drive bay. SATA cable not included. Speaking from experience, you'll want a SATA cable with the lowest-profile connector possible. I tried using left- and right-angle cables but there isn't enough space between the motherboard and the drive bay chassis for this to work -- a low-profile straight ahead cable is your best bet, 18-24" should be good for most peripherals.

McRib Sandwich fucked around with this message at 01:25 on Oct 15, 2011
