movax
Aug 30, 2008

teamdest posted:

Sorry, I didn't mean to say avoid using ZFS, I simply meant that supposedly it has RAID-Z2, which is a RAID-6 implementation, and you might consider using that.

I can confirm that RAID-Z2 is sex, and works fine as a storage server for me. (Infrequent writes, very frequent reads, easily limited by GigE).

I believe the ZFS guys recommend no more than 8 (maybe 10) drives per array, so I get 1397GB * 6 = 8382GB of storage and can survive up to two disks dying.


movax
Aug 30, 2008

Nam Taf posted:

Yeh, it appears there's no capacity to add disks to a pool. That's kinda disappointing but understandable.

Errr, well, you can't add disks to an existing RAID-Z vdev, but you could make a new RAID-Z vdev out of another set of drives and add that to your pool.
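For reference, adding a second RAID-Z2 vdev to an existing pool is a one-liner; a rough sketch, assuming a pool named 'tank' and made-up device names:
code:
# add four more disks as a second RAID-Z2 vdev, striped into the existing pool
zpool add tank raidz2 c2t0d0 c2t1d0 c2t2d0 c2t3d0
# confirm the new vdev shows up
zpool status tank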

Combat Pretzel posted:

The maximum stripe size is 128KB. The more disks there are, the thinner it'll be stretched across the disks, possibly ending up in lots of small reads across the array.

This makes much more sense now. Thanks for explaining.

movax
Aug 30, 2008

KennyG posted:

Seagate is going to rue the day they decided to be 'first' in the 1.5TB club. That drive has the potential to do to them what the IBM 75GXP (Deathstar) did to IBM: make them sell off the business or rebrand it because their name is gone.

I have one and it's the worst drive I ever bought. I am only using it to archive TV and porn and other random stuff so I wouldn't feel that bad if it died, but I will never buy another Seagate product again.

Also, I don't know about the rest of you, but this Cyber-Eco-Green bullshit has just got to go. I hate having to pay almost double to get a good drive that can be used in a RAID array.

I stuck 8 of those damned Seagates into a RAID-Z2 array; 4 have died/been RMA'd so far. In other news, I'm awesome at handling degraded arrays now :haw:. Every time I go Seagate I get burned. Coincidentally, any monetary savings achieved going Seagate have gone towards postage for RMAs. :(

The coolest thing by far about RAID-Z/2 (and software RAID in general) is that I can pretty much ignore drive/controller compatibility issues and green vs. NUCLEAR-PERFORMANCE version issues (and not shell out for RE2 drives).

movax
Aug 30, 2008

xobofni posted:

Once everything is set up and the server is online, it won't really be reading/writing the CF disk, apart from logs, which you could really just either turn off or mount /var/log on the RAID somewhere.

I think you could set up logrotate.conf a la ipcop/m0n0wall/pF embedded installs to hold logs in RAM and commit them once a week or so. (Or of course, like you said, just mount /var/log to some random drive).
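A rough sketch of the RAM-backed approach on Solaris, via an /etc/vfstab entry (the 256MB cap and layout here are just illustrative, not from my setup):
code:
# /etc/vfstab: mount /var/log as RAM-backed tmpfs, capped at 256MB
# device-to-mount  device-to-fsck  mount-point  FS-type  fsck-pass  mount-at-boot  options
swap               -               /var/log     tmpfs    -          yes            size=256m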

movax
Aug 30, 2008

DLCinferno posted:

Basically, what I'd like to know is what people think about MTBF and its relationship to disk spin up/down. Right now I've got the 4 drives in my array set to spin down after 4 hours of inactivity, which reduces the power consumption from 82 watts to 54 watts. However, I've also been reading that each spinup event is roughly equivalent to 10 hours of activity on the disk. The electricity gains are negligible (about $22 a year if all disks were always spun down), so I guess I'm asking if it is harder on the disk to leave them spinning 24x7 or have them spin up at least once a day for at least 4 hours (I usually turn on music/video when I get home from work).

How do you have your drives set to spin down? I'm running OpenSolaris Nevada (the new/beta branch thing), and I haven't figured that out yet...
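From what I've read, the usual knob on (Open)Solaris is /etc/power.conf; a sketch, with an illustrative device path (power.conf generally wants the physical device path, so double-check yours):
code:
# /etc/power.conf: spin the disk down after 30 minutes of inactivity
device-thresholds   /dev/dsk/c1t0d0   30m
Then run pmconfig to apply the new settings.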

movax
Aug 30, 2008

DLCinferno posted:

I haven't been able to get ahold of my friend, but I should have been more clear...I didn't install the patches from the guide, although I did use the newest versions of the packages.

Thanks for the power-man info, I'll have to set that up once I'm back home.

I got some terribly old version of rTorrent running on my Solaris box (Nevada b114 I think) off some packages I found googling rTorrent + Solaris. Manually walked all its dependencies (thanks sunfreeware.org), did some naughty ln'ing of libs, and it runs!

movax
Aug 30, 2008

loosewire posted:

Just a quick question, hope someone can help as I have had no luck with searching - when copying files to a CIFS share running on OpenSolaris, how do you set the default Unix file-creation permissions for those files in the Unix-side file system?

Ahahahaha. Hahahahahah.

ACLs. Specifically, NFSv4 ACLs. I got my rear end kicked by them this winter break, but I think I understand them now, though I'm certain my configuration would make a real Solaris admin cringe. Without ACLs, the group keeps getting set to ephemeral IDs and it's terrible. Living outside of a domain is oh-so-tough.

If you sudo to root, you will be using the ls located in /bin. This matters for one very important reason: -V, which shows you the ACL permissions as well. Likewise, the chmod located in /bin gains some extra abilities. Abilities like so:
code:
movax@megatron:/tank/video/tv# ls -Vd .
drwxr-xr-x+ 37 movax    staff         40 Jan  5 01:41 .
                 owner@:rwxpdDaARWcCos:-d-----:allow
                 owner@:rw-pdDaARWcCos:f------:allow
                 group@:r-x---a-R-c--s:-d-----:allow
                 group@:r-----a-R-c--s:f------:allow
              everyone@:r-x---a-R-c--s:-d-----:allow
              everyone@:r-----a-R-c--s:f------:allow
Here is a basic overview (I printed out the Sun ZFS reference as well and studied it): http://breden.org.uk/2009/05/10/home-fileserver-zfs-file-systems

This is what I did:
code:
chmod A0=owner@:rwxpdDaARWcCos:d:allow .
chmod A1=owner@:rwpdDaARWcCos:f:allow .
chmod A2=group@:rxaRcs:d:allow .
chmod A3=group@:raRcs:f:allow .
chmod A4=everyone@:rxaRcs:d:allow .
chmod A5=everyone@:raRcs:f:allow .
(applied recursively, etc. as needed)

Flaw: Executable permissions can get weird...
A0 sets the permissions to basically allow the owner to do whatever the gently caress he wants. That ':d:' makes it apply to directories only, plus all subsequently created directories. If you look at A1, it lacks 'x', the execute property, because A1 only applies owner properties to files (':f:'). 'x' is what you need to be able to list directories.
Likewise, A2/A3 apply to your user group and A4/A5 apply to everyone. What I have done is allow my user movax to do whatever the gently caress I want to my files because hey, it's my poo poo. I also made a generic user, 'media', with no shell for CIFS purposes. He falls under 'everyone' and can read and list dirs to his heart's content, perfect for sharing files. Using the magic of ACLs,
code:
chmod A4=everyone@:rwxpaRcs:d:allow .
chmod A5=everyone@:rwapRcs:f:allow .
I set that in a public folder, allowing anyone to upload poo poo to my machine. Slapped a 500G ZFS quota on it so no one can fill me with more horse porn than I would like. You can use ACLs to set arbitrary permissions on whatever. Want user joebob to have write/append capability on a certain file? Just chmod A+user:joebob:<perms>::allow <file>!
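For completeness, a sketch of the quota and the joebob example (the dataset name, file name and exact permission set are illustrative):
code:
# cap the public dataset at 500G
zfs set quota=500G tank/public
# give a hypothetical user 'joebob' write/append on one file
chmod A+user:joebob:write_data/append_data:allow somefile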

Be warned, Windows applies all the DENY rules first, followed by ALLOWs.

If there's interest I can post more details on my setup.

:siren: AMD Family 15 and below CANNOT FREQUENCY SCALE AS OF 1/1/2010. Get a newer AMD CPU if you don't want it running full-tilt all the time. It will go between C0 and C1, but no intermediate P-states are supported. :siren:

:siren: ZFS de-duplication is cool, but owns the poo poo out of your CPU. I saw load averages of 14 with a 4450e. Admittedly, it was underclocked/undervolted, but if you want dedup, get a beefy CPU backbone going :siren:
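Dedup is a per-dataset property, for anyone curious; a sketch with an illustrative dataset name:
code:
# turn dedup on for one filesystem only
zfs set dedup=on tank/backups
# zpool list shows a DEDUP column with the ratio you're getting
zpool list tank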

:siren: LSI 1068E running in non-RAID mode is the poo poo. Supermicro AOC-USAS-L8i, I got two of 'em running in an Asus M4A78-E. mpt driver works perfectly with it :siren:

I have enough confidence in ZFS + RAID-Z2 that, to prove a point at a convention panel, I pulled 3 drives at random out of a 16-drive array that was 14TB full. RAID-Z only rebuilds data that was actually present. Additionally, RAID-Z3 is available now for the insanely paranoid.

movax fucked around with this message at 09:18 on Jan 6, 2010

movax
Aug 30, 2008

FISHMANPET, they come stock in "IT" mode, no RAID firmware present. I got mine from eBay.

phorge/FISHMANPET, I have 'em hanging free for now, but a buddy of mine just used nylon spacers between the board and bracket, and used the next slot over to secure them. There are a bunch of creative ways to mount it.

Pics (my flickr): [images]

Speed: [benchmark screenshot]
Machine has since gained another L8i and 4x2GB sticks of RAM.

movax
Aug 30, 2008

FISHMANPET posted:

For anybody interested in this card, particularly movax, who already has one, I found a bracket online that will work for it. SuperMicro was stupid enough to use standard hole spacing, so all you need is a standard PCI bracket with tabs. Keystone Electronics makes one. You can get it from Digikey for about 5 dollars shipped:
http://search.digikey.com/scripts/DkSearch/dksus.dll?site=us&lang=en&mpart=9203

There are other places that have it cheaper per unit, but there are order minimums and they ship via UPS, so it costs way more for 1 of them. Digikey can ship via the postal service. But if you're doing something with a bunch of these, you can search all of Keystone's vendors here:
http://www.keyelco.com/order.asp
The part number to search for is 9203. If you dig around the Keystone site you can find mechanical drawings of the bracket. I compared their measurements to my card and it looks like it should work. I'll post a trip report when I get it.

That bracket is sweet, I may jump on those just to make my server innards a little bit neater.

As for fans and the Norco 4020/4220, I replaced all the 80mms with Yate Loon 80mms, taped off any other holes in the fan bracket (wind tunnel please), and my drives do pretty well, 40C at load, and it's pretty quiet to boot. Only 8 drives though...

movax
Aug 30, 2008

FISHMANPET posted:

I've never had a Samsung drive fail, and I've been using 8 of them for nearly 2 days :smug:
On the other hand, I've had various Seagates of various sizes for 4.5 years, and I've had one entire drive fail!

Seagates are poo poo, Samsung superiority! :smug:
:spergin:

I've never had WD drives fail, clearly WD is the superior species of drive :smugbert:

Seriously though, I need to get 8 more 1.5TB drives, and am debating getting 7200rpm drives to match the current Seagates, or dropping to 5400rpm drives to save power. Limiting factor is GigE I/O.

movax
Aug 30, 2008

Farmer Crack-rear end posted:

I've heard the drive sleds can be flimsy, but honestly for that price that's not a bad tradeoff in the slightest, considering the next higher level of quality will jump the price by several hundred dollars.

Yeah, the drive sleds for my -4020 bend pretty easily, but seriously, for $250.00 you get a 20-bay 4U case w/ backplane and pretty decent(tm) build quality; no real major complaints about it. You get what you pay for.

movax
Aug 30, 2008

Methylethylaldehyde posted:

I have my current box set up as a media server, 8x 1.5 TB Western Digital Green drives in a RAIDZ2 (raid6) array. The filebench media benchmark showed ~220MB/sec reads and ~180MB/sec writes. You could get better performance by making two 4 disk RAID 5s and striping them together, but I went with increased reliability over increased speed.

Hmm, my writes with 8 1.5TB Seagates over CIFS only seem to peak around 70MB/s or so, and slow down from there...my CPU is also a pretty "weak" undervolted Athlon though, and the load average spikes, so maybe upgrading that could help.

quote:

disk took a poo poo, so I reinstalled solaris, ran the command, and all my data was back in about 3 seconds.

Same here, but all the customization/software I had installed/random .conf fixes that took hours to find all died with the original drive. =[ No backups either, naturally...

(Boot pool is now a 30GB Vertex SSD!)

movax
Aug 30, 2008

Farmer Crack-rear end posted:

Have you turned on jumbo frames on your network cards? (And/or, does your network switch support them?)

Hmm, they may not be on at the moment...I recently switched switches to a PowerConnect 5324, which I know does jumbo frames. My set-top boxes have issues with them though...will turning jumbo frames on in the e1000 driver on Solaris screw with them? (Popcorn Hour & clone).
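On newer builds you can apparently bump the MTU per-link with dladm rather than editing the driver .conf; a sketch, assuming the link is named e1000g0 (check dladm show-link, and the link may need to be unplumbed before the MTU change sticks):
code:
dladm set-linkprop -p mtu=9000 e1000g0
dladm show-linkprop -p mtu e1000g0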

movax
Aug 30, 2008

So I've pretty much filled my existing RAID-Z2 array (8x1.5TB)...I want to add another 8 drives to the zpool as another vdev (pretty sure you can do that). Am I going to kill performance by using 8x2TB as opposed to matching the existing drives (7200rpm Seagate 1.5s)?

movax
Aug 30, 2008

FISHMANPET posted:

Nope. It'll write most of the new data to the second vdev anyway, by virtue of the first one being full.

Gotcha. Now to wait for bank account stabilization so I can buy 8 2TB drives.

movax
Aug 30, 2008

TDD_Shizzy posted:

Edit: I more or less figured out the answer, and I think I have a decent solution until I can afford to get 4-6 2TB drives. Set up my RAID-Z with all my existing drives, 3x1TB and 3x1.5TB. When I pick up a new 2TB drive, or can buy them, swap it with the smallest drive and let it repair?

Yep - I believe that process is called resilvering. The only thing that scares me about it is the hard drive thrashing that comes along with it.
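The swap itself should just be a zpool replace; a sketch with made-up device names:
code:
# drop in the new disk, then tell ZFS to swap it for the old one
zpool replace tank c1t2d0 c1t6d0
# watch the resilver grind away
zpool status tank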

movax
Aug 30, 2008

lilbean posted:

We've been migrating off of it this year to Linux, and only have a few machines left. ZFS was fantastic, but it's not worth everything else that comes with the OS. And gently caress SPARCs for a web shop. No idea who thought that was a good idea.

I need to find something to hold 12TB of data while I try to find an OpenSolaris replacement :ohdear:

Either that, or wait for *BSD ZFS to catch up to my pool version, and then import the sucker. God, I hate dealing with Solaris, but at least BSD is less idiosyncratic about stuff.

movax
Aug 30, 2008

So my current array (ZFS RAID-Z2) under some build of OpenSolaris (I think I installed 2009.06, switched to dev repo, and hit 'update' five months ago) is beginning to fill up. It is 8x1.5TB at the moment.

So, I'm thinking of the following:

1) Buy 8x2TB drives, make 'em a RAID-Z2 vdev, add to my existing pool, probably the easiest option
2) Buy 8x2TB drives, use them to juggle data whilst I try to find a new NAS solution. I think I don't need to do this because if/when ZFS matures on other OSes/distros, I can simply export/import a pool.

I shouldn't run into any performance issues combining an 8x1.5TB and an 8x2TB RAID-Z2 vdev, right? 1TB drives don't give me the density I want, and it seems silly to buy 1.5TB drives with 2TB prices crashing. I gave up on power consumption long ago, current drives are 7200RPM Seagates, so I'll just look for non-Green 2TB drives from Hitachi or something, since I heard you can't toggle TLER on/off on WD drives anymore.

movax
Aug 30, 2008

FISHMANPET posted:

E:
^^^^^^^^^^
There shouldn't be any problems having vdevs of different drive sizes. What kind of enclosure are you running where throwing in 8 more drives is no big deal?

Norco RPC-4020. I do need to pick up another Supermicro USAS-L8i and cables though, heh, I almost forgot about that part.

e: I realized heat generation is going to go way the gently caress up, gah. Might have to bring fans back up to full speed.

movax fucked around with this message at 04:50 on Oct 1, 2010

movax
Aug 30, 2008

FISHMANPET posted:

How many drives? I've got personal experience on OpenSolaris with 2-port and 8-port SATA cards, and can interpolate a recommendation for a 4-port card. Later I'll post my OpenSolaris build.

You'll also want to buy an Intel NIC.

Yep, I've been running OpenSolaris for about 1.5yrs or so, currently 8x1.5TB RAID-Z2 (1.93GB free :ohdear:) in a Norco RPC-4020. It's been pretty smooth, other than having to learn Solaris idiosyncrasies. My main HBA is the USAS-L8i from Supermicro; I just bought another one to power another 8 drives.

To hijack a bit...right now I have 8 7200rpm Seagate 1.5s. Looking at 2TB drives for the next 8...what brand(s)/model should I get? The limiting I/O is a single-port Intel NIC at the moment, so I guess my preference is currently leaning towards 5400rpm drives for less heat. I've heard good things about the Hitachi 2TB.

Also...I hear you get an insane boost in (write?) speed by tossing in an SSD and turning it into an L2ARC device? If that SSD doesn't need to be very big, I can just get some 30GB drive with a decent controller and toss it in.

movax
Aug 30, 2008

I think my previous post got lost in the thread somewhere, so I'll ask again here real quick:

Current: 8x1.5TB 7200rpm Seagate in RAID-Z2
Wish: Purchase another 8x2TB drives, make RAID-Z2 vdev, add to existing pool.
Interface (aka bottleneck): Single Intel GigE NIC

1: What drives should I be looking at? Green or non-Green? 5400 or 7200? Don't want to have to deal with TLER enabling/disabling fuckery.

2: I have 8GB of RAM, so my first priority would rather be a dedicated ZIL device to speed up writes rather than a large SSD for L2ARC. Confirm/deny?

movax
Aug 30, 2008

Methylethylaldehyde posted:

Avoid green/4k drives like the plague. I had a ton of issues with the WD Advanced Format drives because ZFS is way more clever than the sector emulation stuff it uses.

Grab a 60-80 Gig Intel SSD, partition it 2gb ZIL, 58gb cache, works great because you don't often have writes that need to be both low latency and high throughput. I had an OCZ Solid II in my box and once you cached folder/file metadata to it, it made browsing SMB shares way more responsive.

You can, if you have a decent switch, trunk two or more GigE connections together to get some additional bandwidth.


Never ever shut your machine down. The cache currently is non-persistent, and it can take you upwards of a week to fully populate the cache on a decent sized SSD.

Avoid green drives, got it. Intel drive...SLC or MLC? Judging from your suggested capacity, MLC? I thought I read somewhere that they murdered an MLC OCZ drive used as ZIL in only about a month of service. I guess I should just wait for some neat 7200rpm drives to come on sale then (my current drives are 7200rpm; will dropping in 5400rpm drives absolutely slaughter my performance?)

@Norco questions: I replaced all the 80mm fans in my 4020 w/ Yate Loons. I know you can also replace the middle fans with 3 x 120mm fans if you wish; there was a thread on AVSForums about that.

movax
Aug 30, 2008

Methylethylaldehyde posted:

It all depends on the configuration of your vdevs. If you have say 8 drives, and make two 4 drive raidZ sets, you get twice the IOPS of a single 8 drive raidZ2. Both have the same useful capacity, while one has a better chance of not dying during a rebuild. The difference between an 8 drive vdev made of 5400 rpm drives and a 2x4 disk vdev set made from 7200 rpm drives is maybe 5% total throughput for media streaming applications, but the faster drives will have about 2.2x the total IOPS. This is all bullshit if you have a warm SSD cache device though, because it'll serve 90% of your frequently accessed poo poo lightning fast, and stream TV shows through fine.

I would just get any of the current generation SSDs and call it a day. They have enough internal cache to batch up writes from the host to the ZIL files, and they're gonna saturate the SATA bus while still giving you 10k+ IOPS. Some of the new sandforce based OCZ drives are really awesome.

I suppose if you had a large DB being serviced by a 2nd gen OCZ SSD, you could burn them out that fast. If the wear leveling algorithm isn't robust, it can and sometimes will write itself into a corner and hammer at certain memory cells. If you de-rate the flash memory to realistic instead of 'round numbers', and you have a small cache with a crappy write balancer, you could kill an SSD within that month timeframe.

Hm, so doing 2 4x2TB RAID-Zs would give me better IOPS and a better chance of surviving a rebuild; I like it. I might just go for 5400rpm drives then (they still make 5400rpm drives without head-parking/green/etc bullshit, right?) and the SSD as a cache.

Pick up a 60-80GB Sandforce drive and partition as recommended (2GB ZIL/58GB L2ARC? A bigger ZIL won't help?). I don't have a DB or anything crazy, just files, so I guess I don't have to worry that much. Partition using gparted, then assign the resultant devices using zpool?
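For the "assign using zpool" part, my understanding is it should be roughly this once the SSD is sliced up (slice names are placeholders):
code:
# small slice as dedicated ZIL (log device)
zpool add tank log c3t0d0s0
# the rest as L2ARC (cache device)
zpool add tank cache c3t0d0s1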

And of course, I guess once I make the new vdevs...how can I "shift" my data from the old drives to the new drives so I can destroy the old vdev and remake it into 2 4x1.5TB vdevs (or replace 'em w/ 2TB drives)?

movax
Aug 30, 2008

FISHMANPET posted:

I'm not even sure what 8 port SATA card would work with OpenSolaris. I went straight to SAS for my 8port needs, never looked back.

Supermicro USAS-L8i, LSI 1068E-based. Not technically 8-SATA ports, but you can buy the appropriate breakout to go from 1xSFF-blah to 4xSATA.

movax
Aug 30, 2008

Ok, so ordered a Vertex 2 60GB SSD for ZIL/L2ARC; how should I partition this? 10GB ZIL/50GB L2ARC?

Also, shopping around for 2TB drives...might wait for a sale on certain models, but I quickly perused newegg's selection. Ignoring Caviar Green, Barracuda LP and any other green stuff (is WD's Green "AV" drive not suggested?), I pretty much found:

Hitachi DeskStar 2TB 7200rpm
Samsung SpinPoint F4 2TB 5400rpm (sold out :smith:)
Seagate Barracuda 2TB 5900rpm

I probably won't be ordering from newegg, because, well, gently caress their HDD packing, but if that SSD functions as expected ZIL/L2ARC-wise (based on what methylethylaldehyde has suggested), then the Samsung drives look good. Might get a full 16 of them and sell my old disks, depending on the answer to the question in my previous post (shifting around data in the zpool).

e: I guess the F4 is out because ZFS will not function properly with 4K-sector drives? Or is it just the WD drives, because they emulate 512b sectors, and if the F4 is allowed to appear as a true 4K drive, ZFS won't care? Or should I just buy 7K2000s and be done with it?

movax fucked around with this message at 21:21 on Nov 2, 2010

movax
Aug 30, 2008

Methylethylaldehyde posted:

The ZIL only needs to hold like 5 seconds worth of writes before they're purged to spinning disks. 5 seconds of ~240MB/sec is ~2GB. The rest you can use as regular cache, which is awesome.

And yeah, I had the intel rebrand of the LSI-1068E card, flashed them with the IT firmware and they work great.


Easiest way I found to shift the data is to go to best buy, buy two or three 2TB drives, move your data to them, break the vdev, remake it, copy the data back, and zero the drive+return them.

Hmm, I'll need to buy like 5 2TB drives to serve as a temporary scratchpad, heh. What if I made an entirely new zpool (off-topic: can you rename zpools?), added my new vdevs to that (initially 2 4x2TB RAID-Zs, maybe 4 4x2TB), copied the data from the old zpool to the new zpool, then destroyed the old zpool? I'm thinking of just replacing all disks with 2TB models and selling/getting rid of the 1.5TB drives.
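Re: the off-topic rename question, my understanding is you effectively rename a pool by exporting it and importing it under a new name:
code:
zpool export tank
zpool import tank newtank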

I assume that it would be sane to stripe all those vdevs?

movax
Aug 30, 2008

necrobobsledder posted:

Isn't NexentaStor "free" only up to about 8TB of used space though? It's not a good long term solution if you're living up to the thread title if you ask me.

Yep, after that it costs $$$. No good for me, I'm already over 8TB. I will probably move to OpenIndiana soon.

movax
Aug 30, 2008

adorai posted:

12tb USED. I believe they intend to up this over time, as drives grow.

What happens when you hit that? Or do they just limit pool creation size to a max usable capacity of 12.000000TB?

movax
Aug 30, 2008

Hm, thinking about it, wouldn't an 8-drive RAID-Z2 be "safer" than 2 4-drive RAID-Zs? If two drives die in one of those RAID-Zs, the whole pool is hosed since that vdev is hosed.

So trading off IOPS for safety?

e: Hacked up a spreadsheet to figure some poo poo out. I think since my priority is data safety, I'm going to be willing to sacrifice some IOPS (which hopefully get made up by the SSD). And I think I'd like to have hot-spares, which will avoid rebuilds, I think?

Basic things noticed so far: as the # of vdevs goes up, RAID-Z3 obviously begins to be a poor option, as it approaches the same capacity you'd get from a straight mirror, but with poo poo IOPS performance. For 20 2TB drives split into 4 RAID-Zx vdevs: mirroring gets me 20TB, -Z3 gets me 16TB (dumb), -Z2 gets me 24TB at roughly 1/3 the IOPS of a mirror, and -Z gets me 32TB, but I don't want to do -Z (maybe with a hotspare, but -Z2 just seems smart).

e2: where N is the number of drives and M the number of vdevs, with -Z2, when N/M = 4 the mirror and -Z2 capacities are identical (logically: a 4-disk -Z2 vdev yields 2 data disks, exactly what you'd get mirroring those 4 disks)...hooray storage solution finding

movax fucked around with this message at 22:17 on Nov 3, 2010

movax
Aug 30, 2008

adorai posted:

Do you have to have 4 vdevs? If it were me, I would run 2 9-disk raidz2 vdevs with 2 hot spares. 14 disks worth of usable space, immediate rebuild with hot spares. Do you have exactly 20 disks worth of controller capacity? If so, you'll need to drop 1 spare for your SSD.

I'm not sure yet, still shifting capacities around in my head. I have a 20-bay chassis, so that's the upper limit on drives. 16 can run on HBAs, the rest will be off the mobo. I *think* I want hot-spares, because if I understand how they function properly (which I probably don't), they reduce my risk of data loss even more.

So I could do for instance:
18 drives in 2x9 -Z2 vdevs + 2 hot spares for total capacity of 28TB (~1200 IOPS)
18 drives in 3x6 -Z2 vdevs + 3 hot spares (cram the odd disk into the Norco's odd-bay-out maybe) for 24TB (~2250 IOPS)
20 drives in 4x5 -Z2 vdevs w/ no hot spares for 24TB (~3300 IOPS)
20 drives mirrored, 20TB usable, 10k IOPS

For IOPS I just assigned an arbitrary guess of 500/device, so I could compare I/O performance between configurations. Need to figure out what exactly I am looking for, I suppose.

movax
Aug 30, 2008

adorai posted:

For a 7+2 raidz2 vdev running 7200rpm disks, you are probably right on with your 600 per vdev estimate. But you aren't factoring in the cache. Are you building a high-volume VM environment or something? Why are both IOPS and capacity so important to you?

Nope, not at all. I would say that this machine will spend 75% of its day idle, doing absolutely nothing. Just storing files that I'll be accessing when I'm home from work.

Right now, I have just 8 1.5TB 7200rpm disks in RAID-Z2, over GigE. I am somewhat disappointed with write performance (~20MB/s) but satisfied with reads, though I think they could be better (~80MB/s). I guess I want to strike a compromise between capacity & IOPS, but if I had to choose, it would be capacity, as 8GB RAM + 2GB ZIL/58GB L2ARC can make up for some "lost" IOPS, I think.

How exactly do hot-spares function in a -Z/2/3 environment? As soon as a disk is degraded, does the array begin rebuilding onto the hot-spare disk? Or does the hot-spare behave like it would in a mirror and seamlessly fail over, leaving you with 2-disk redundancy?

movax
Aug 30, 2008

So doing a bit more reading, I guess you can share hot-spares between vdevs, which is cool (so if I have 3 vdevs, I can share 2 hot-spares between them...I'd have to have pretty poo poo luck for 3 hot-spares to be needed in a given timeframe).
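Adding them looks like it's done at the pool level, which is presumably why they get shared; a sketch with placeholder device names:
code:
zpool add tank spare c4t0d0 c4t1d0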

What I've kinda narrowed it down to:
3x6 -Z2s, 24TB Usable (2 hotspares)
4x5 -Z2s, 24TB Usable (no hotspares)

Pretty sure I want the hot-spares and a neat fit into 20 bays, so 3x6 looks tempting. I read somewhere about not using an even # of disks in a vdev though...?


Also, if anyone wants to see the spreadsheet I've been using, here: http://dropbox.movax.org/ZFS.xlsx

e: ^^^ Googling around, WD Greens in particular give users a hard time, and the sector emulation crap on any 4k drive apparently pisses ZFS off. There's a workaround, but I'm not personally willing to risk my data to a "workaround". I'd try to track down non-Green 5400rpm drives. Pretty sure green drives that aren't from WD and are <2TB/don't have 4k sectors are OK though.

movax fucked around with this message at 16:09 on Nov 4, 2010

movax
Aug 30, 2008

adorai posted:

There is something wrong there, I get ~25MBps on a 3+1 raidz1 of green drives. Hot spares sit idle, and begin a rebuild as soon as a disk fails.

Yeah, I don't know what the gently caress, they are the first-gen Seagate 1.5TB drives w/ firmware patch. I did upgrade/stop under-volting the CPU, which helped boost write performance. Intel NIC + a PowerConnect I figure is halfway decent network infrastructure, so... :iiam:

So, if I'm settled on 3x6 -Z2 + 2 hotspares...and want to shift my data over, what can I do? Is it possible to zpool export my current pool, and cobble together a machine with 8 SATA ports across mobo + SATA HBA and successfully zpool import the pool and read all the data off it?
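My understanding of the export/import dance, for reference (should work as long as the scratch box can see all 8 disks):
code:
# on the current box
zpool export tank
# move the disks, then on the scratch box:
zpool import          # with no arguments, lists the pools it can see
zpool import tank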

movax
Aug 30, 2008

adorai posted:

You can add vdevs on the fly, so you can just put your 8 disk pool in with the first two 6 disk vdevs, copy the data over, remove your 8 disks and put in the 8 additional 2tb drives.

Err, can you copy data from vdev to vdev? I thought you'd have to do pool to pool (so add the first 2 6 disk vdevs to the machine in a different pool, copy poo poo over, kill off old 8x1.5 vdev/pool, add another 6 disk vdev at some point in the future)?

I figure it'll be cheaper for me to just add 2x6 for now, and then 1x6 at the beginning of next year or something.
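For the actual pool-to-pool copy, the usual method seems to be a recursive snapshot plus zfs send/receive; a sketch assuming an old pool 'tank' and a new pool 'newtank':
code:
zfs snapshot -r tank@migrate
zfs send -R tank@migrate | zfs receive -Fd newtank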

movax
Aug 30, 2008

necrobobsledder posted:

Umm, how many platters per disk? I've got 4x2TB 4k WDs as a (I'm feeling lucky) 5x1TB array with mixed platter configs (this one all on 512b sectors) and I don't get performance anywhere near that poor in either - they're within about 20% of each other and writes easily saturate a gigabit ethernet connection at 40MBps, in fact. I'm not even using an Intel NIC either - some crappy Realtek onboard NIC with similarly crappy NICs from the clients. I was about to order an Intel NIC but wanted to give the guy a chance. I'm kind of amazed I'm not experiencing any problems functional or performance-related.

Hm, not sure of the exact model # at the moment (server is still powered off at home, on the road atm), but I think they are the ST31500341AS (7200.11); I don't see the # of platters listed in the datasheet on Seagate's website.

How are those 4k-sector drives (with emulation on?) treating you? Do you have that gnop or whatnot workaround active?
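For anyone else reading along, the gnop trick is the FreeBSD-side workaround (not OpenSolaris): you wrap one of the drives in a fake 4k-sector provider so the pool gets created with the larger ashift. A sketch with FreeBSD-style placeholder device names:
code:
# wrap one drive in a fake 4096-byte-sector provider, then build the pool on top
gnop create -S 4096 /dev/ada0
zpool create tank raidz2 ada0.nop ada1 ada2 ada3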

movax
Aug 30, 2008

Moey posted:

Yea that's what was lingering in the back of my mind...I just couldn't bring myself to say it.

150GB, not TB? Depending on your budget, you could SSD that poo poo. If not, it would be criminal to do anything less than RAID10 with that small of a dataset (or hell, just do like a 3-way mirror using ZFS or something if you don't want to buy hardware. You can even use those super-speedy 1TB disks for room to grow)!

movax
Aug 30, 2008

Methylethylaldehyde posted:

Run iostat -xen 5 and see what your drives are doing as you pull stuff over CIFS/NFS and to /dev/null locally. Each disk should be able to do about 35-50MB/sec, and over the network, you should be able to get ~80-100MB/sec. I just checked mine and it'll do ~85ish over the network using Windows 7 CIFS and a 3Com managed gigabit switch. No jumbo packets yet.

Roger, will do when I get back home. I almost forgot to ask, what kind of performance penalty am I looking at for running 16 drives off 2 1068Es and the last 4 off the mobo SATA controller (though 2 of those 4 will be hotspares)?

And I should have 0 penalties for creating a pool w/ 2 vdevs to start and then adding a 3rd identical vdev in a few months, correct?

movax
Aug 30, 2008

wang souffle posted:

Any opinions on the Seagate Barracuda LP drives for low power when working with ZFS and primarily large files? Are they known to cause problems like the WD Green drives?

http://www.newegg.com/Product/Product.aspx?Item=N82E16822148413

I think if it isn't a 4k-sector drive, and it's at a fixed 5900rpm without any head-parking/other green crap, it's probably suitable for ZFS use.

e: DS reports 512b sectors, so I think you may be good to go...
e2: stop buying Hitachis you assholes, they keep going out of stock! :argh:

movax fucked around with this message at 17:07 on Nov 5, 2010

movax
Aug 30, 2008

wang souffle posted:

That's the thing. I've been researching these drives for a couple of days and can't find a definitive word on whether they're 4k liars or not. And no idea how to find out about the head parking. Specs on the websites are very sparse for each manufacturer.

Edit: With all major drive makers moving to 4k sectors, you'd figure OpenIndiana would handle this smoothly by now. Or do they, and the misreporting is causing all the issues?

I looked at the datasheet for your Barracuda LP drives; they are 512-byte sector drives.


movax
Aug 30, 2008

wang souffle posted:

Strange, this link has a mention of "advanced format" in the bottom right. Way to make it confusing, Samsung.

Ah, I looked at this: http://www.seagate.com/docs/pdf/datasheet/disc/ds_barracuda_lp.pdf

But it's possible that the 512 listed there is after emulation...probably the only way to be sure is to e-mail Seagate and ask them. Then post the answer here so that we may all know!
