|
Storage arrays singing actually sound more like this
|
# ? Mar 5, 2012 08:24 |
|
|
# ? May 21, 2024 15:00 |
|
evil_bunnY posted:Cause of death: random seek to the back of the head. Now that's a head crash! *crickets* error1 posted:Storage arrays singing actually sound more like this I got to hear that sound last week for similar reasons (replacing a UPS), though not quite on that scale, since my EVA is only 1 rack. Intraveinous fucked around with this message at 21:34 on Mar 5, 2012 |
# ? Mar 5, 2012 21:27 |
|
I don't know if this is enterprise enough a question, but this seems like the crowd that would be most likely to know the answer: In a Linux software RAID (md) RAID10 array, is there any way to tell which two drives are part of one striped set and which two drives are part of the second striped set that is mirroring the first? I understand that's the philosophy behind RAID10, two striped sets that mirror each other. Does this philosophy not apply to Linux software RAID? I only ask because I had one disk fail in a RAID10 setup and I'd like to know what's happening behind the scenes.

mdadm --detail /dev/md0

pre:/dev/md0:
        Version : 1.2
  Creation Time : Mon Mar  5 13:44:06 2012
     Raid Level : raid10
     Array Size : 3907021632 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 1953510816 (1863.01 GiB 2000.40 GB)
   Raid Devices : 4
  Total Devices : 3
    Persistence : Superblock is persistent

    Update Time : Mon Mar  5 23:43:22 2012
          State : clean, degraded
 Active Devices : 3
Working Devices : 3
 Failed Devices : 0
  Spare Devices : 0

         Layout : near=2
     Chunk Size : 32K

           Name : [myhost]
           UUID : fb317521:9023d975:d4559f15:6ac2341a
         Events : 47

    Number   Major   Minor   RaidDevice State
       0       8       17        0      active sync   /dev/sdb1
       1       8       33        1      active sync   /dev/sdc1
       2       0        0        2      removed
       3       8       65        3      active sync   /dev/sde1

pre:Personalities : [raid10]
md0 : active raid10 sde1[3] sdc1[1] sdb1[0]
      3907021632 blocks super 1.2 32K chunks 2 near-copies [4/3] [UU_U]

unused devices: <none>

I'm willing to concede that I understand very little about Linux software RAID, and I was hoping to learn a little more on the fly, but my disk basically failed a few hours after I took it out of the wrapping and popped it into a tray. And I guess my second question is what sort of penalty I'm paying here while the system is degraded? I don't see a significant slowdown, but there's nothing heavily IO-bound running on this machine. I'd love to know what the system is doing while it's in degraded mode.
Is it just writing to one striped set and the second set is going un-mirrored? Hopefully that doesn't sound TOO ridiculous a question. Anything you guys can shed light on would be most appreciated.
|
# ? Mar 6, 2012 05:50 |
|
I'm not too familiar with software RAID on Linux either; however, I've just had a look at the man page for mdadm and you can use the --examine switch to get details on the components in the array. What output do you get when you run the following?

pre:mdadm --examine /dev/sdb1
mdadm --examine /dev/sdc1
mdadm --examine /dev/sde1
|
# ? Mar 6, 2012 10:29 |
|
Martytoof posted:I don't know if this is enterprise enough a question, but this seems like the crowd that would be most likely to know the answer: You have it backwards. RAID 10 is x number of mirror sets striped across, not x number of stripe sets mirrored. This is a significant difference, because it means that (for a 4 disk array) there are 4 combinations of two drives failing without loss of the array, rather than just 2. I'm not familiar with linux software RAID, so I couldn't tell you if there's a software way to tell which drive belongs to which mirror set, but there's always the empirical method. Set up a test array and try pulling out the different possible combinations of two drives. If the array fails, then both drives belonged to the same mirror set. GMontag fucked around with this message at 12:14 on Mar 6, 2012 |
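GMontag's point about failure combinations can be checked by brute force. A quick sketch (illustrative Python, not anything from the thread's setup) enumerates every two-drive failure for a 4-disk array under both interpretations:

```python
from itertools import combinations

def raid10_survives(failed, mirror_pairs):
    # The array survives as long as every mirror pair still has
    # at least one working member.
    return all(not pair <= failed for pair in mirror_pairs)

def raid01_survives(failed, stripe_sets):
    # A stripe set dies if any of its members dies; the mirror of
    # stripes survives as long as at least one stripe set is intact.
    return any(not (s & failed) for s in stripe_sets)

disks = {0, 1, 2, 3}
mirror_pairs = [{0, 1}, {2, 3}]   # RAID 10: two mirror sets, striped
stripe_sets  = [{0, 1}, {2, 3}]   # RAID 0+1: two stripe sets, mirrored

two_disk_failures = [set(c) for c in combinations(disks, 2)]
ok10 = sum(raid10_survives(f, mirror_pairs) for f in two_disk_failures)
ok01 = sum(raid01_survives(f, stripe_sets) for f in two_disk_failures)
print(ok10, ok01)  # 4 2
```

RAID 10 (striped mirrors) survives 4 of the 6 possible two-drive failures; RAID 0+1 (mirrored stripes) survives only 2, which is exactly the difference GMontag describes.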
# ? Mar 6, 2012 12:04 |
|
It may have been talked about in the many pages in this thread that I haven't witnessed, but anyone got any quick advice: A machine with two major datasets (used purely for storage): Now, would it be faster to have two 4 drive RAIDZ(5) arrays, or just put all the drives (8) in RAIDZ2(6). Obviously RAIDZ2(6) would give better fault tolerance, but it's all going to be backed up so that's not super crucial. Just in case anyone has had the same dilemma, or maybe this is indeed a stupid question with an obvious answer, which is fine too. HalloKitty fucked around with this message at 13:10 on Mar 6, 2012 |
# ? Mar 6, 2012 13:06 |
|
cheese-cube posted:I'm not too familiar with software RAID on Linux either however I've just had a look at the man page for mdadm and you can use the --examine switch to get details on components in the array so what output do you get when you run the following:

Good call, I'd --examine'd the array container itself but I didn't think to run it against individual members:

pre:/dev/sdb1:
          Magic : a92b4efc
        Version : 1.2
    Feature Map : 0x0
     Array UUID : fb317521:9023d975:d4559f15:6ac2341a
           Name : [myhost]:0  (local to host [myhost])
  Creation Time : Mon Mar  5 13:44:06 2012
     Raid Level : raid10
   Raid Devices : 4

 Avail Dev Size : 3907021954 (1863.01 GiB 2000.40 GB)
     Array Size : 7814043264 (3726.03 GiB 4000.79 GB)
  Used Dev Size : 3907021632 (1863.01 GiB 2000.40 GB)
    Data Offset : 2048 sectors
   Super Offset : 8 sectors
          State : clean
    Device UUID : 2e178adc:063fa149:18eb94a5:b65f3740

    Update Time : Tue Mar  6 08:00:47 2012
       Checksum : 723e41a6 - correct
         Events : 63

         Layout : near=2
     Chunk Size : 32K

    Device Role : Active device 0
    Array State : AA.A ('A' == active, '.' == missing)

GMontag posted:You have it backwards. RAID 10 is x number of mirror sets striped across, not x number of stripe sets mirrored. This is a significant difference, because it means that (for a 4 disk array) there are 4 combinations of two drives failing without loss of the array, rather than just 2.

Oh wow, I completely misread the fundamentals of RAID10 -- it makes much more sense this way, and it's much more robust than I thought. So in my case (stripe is intact, but mirror x is missing one member) am I reading it right to say that there really ought not to be any visible performance penalty to the system? Now please don't fail, x2, until I can get your replacement in. I put the other disks through their paces with the RAID resync last night, but since all the disks are from the same batch I'm thinking about going out to buy a separate 2TB disk in addition to replacing this one, just to act as a hotspare.
e: As far as which disk does what in the array, I guess I'm not terribly concerned about that, as long as the system knows what it's doing. It would be NICE to know, and logically I could guess that the first two disks are the first mirror, the second two are the second mirror, and I'm striping across {sdb,sdc}-{sdd,sde}, but I guess I'll just do some more googling. Thanks for the advice everyone, and thanks for setting me straight on RAID10. Words can't express how dumb I feel that I missed that small detail. edit: http://en.wikipedia.org/wiki/Non-standard_RAID_levels has some interesting details on how Linux lays out RAID10, which (in my mind) explains why there is no set list of exactly what is mirroring what and striping across what. some kinda jackal fucked around with this message at 14:30 on Mar 6, 2012 |
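For what it's worth, with mdadm's near=2 layout the copies of each chunk land on adjacent device roles, so the Device Role field from --examine does tell you who mirrors whom. A sketch (assuming the default near arrangement with the device count divisible by the copy count; the helper name is mine):

```python
def near_mirror_groups(raid_devices, near_copies=2):
    """For mdadm raid10 with the 'near' layout, copies of each chunk
    are written to adjacent device roles, so roles group into mirror
    sets of `near_copies` consecutive devices (sketch; assumes the
    default arrangement with raid_devices divisible by near_copies)."""
    return [list(range(i, i + near_copies))
            for i in range(0, raid_devices, near_copies)]

# Roles from the --examine output above: sdb1=0, sdc1=1, (removed)=2, sde1=3
roles = {0: "sdb1", 1: "sdc1", 2: "(removed)", 3: "sde1"}
for group in near_mirror_groups(4):
    print([roles[r] for r in group])
# ['sdb1', 'sdc1'] and ['(removed)', 'sde1'] are the two mirror sets
```

Which matches the guess above: roles 0 and 1 mirror each other, as do roles 2 and 3.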
# ? Mar 6, 2012 14:09 |
|
Martytoof posted:So in my case (stripe is intact, but mirror x is missing one member) am I reading it right to say that there really ought to not be any visible performance penalty to the system? It depends on how good Linux software RAID is at load balancing across mirrors. RAID 1 is usually very fast for reads, since you double the number of spindles and read heads, meaning that if you're reading with two or more threads they can use one disk each, sending every other read request to a different mirror. The downside is that writes are slower, since you have to update two copies of the data, and the drives usually spin independently, so there is additional write latency as you wait for two spindles to rotate around to the right spot. I suspect that a degraded raid10 array has noticeably worse read performance but actually slightly better write performance as a result.
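A toy back-of-envelope model of that trade-off (the 150 IOPS per spindle figure and the whole model are assumptions for illustration; real md behavior depends on its load balancing):

```python
def read_iops(working_disks, per_disk_iops=150):
    # Random reads can be load-balanced across every working spindle,
    # so aggregate read IOPS scale with the number of working disks.
    return working_disks * per_disk_iops

def write_iops(mirror_sets, per_disk_iops=150):
    # Each logical write must hit every member of one mirror set; with
    # striping across sets, aggregate write IOPS scale with the number
    # of sets, not the number of disks.
    return len(mirror_sets) * per_disk_iops

healthy  = [{"sdb1", "sdc1"}, {"sdd1", "sde1"}]
degraded = [{"sdb1", "sdc1"}, {"sde1"}]   # one mirror lost a member
print(read_iops(4), read_iops(3))                 # reads lose a spindle
print(write_iops(healthy), write_iops(degraded))  # writes unchanged
```

In this model, degraded reads drop by one spindle's worth of parallelism while aggregate write capacity stays the same (and the degraded set no longer waits on a second spindle), matching the suspicion above.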
|
# ? Mar 6, 2012 15:00 |
|
I hadn't considered multithreaded reading. There would definitely be a penalty if that's the case, but I can't empirically say whether or not that's what happens. Again, thanks for all the advice everyone. At least now that this thing is done with I get to go play with my ZFS iSCSI whitebox again
|
# ? Mar 6, 2012 16:22 |
|
HalloKitty posted:It may have been talked about in the many pages in this thread that I haven't witnessed, but anyone got any quick advice: Two vdevs (each RAIDZ set is a vdev) are faster than one vdev, even when it's the same pool. The way RAIDZ works is that each vdev is as fast as its slowest member, because for every read it has to read off of every drive in the vdev. When you add two vdevs the data is striped across both (essentially a RAID 0), so it can read twice as fast. The only thing you lose is data integrity. With one RAIDZ2 you can lose any two disks and still be fine; with two RAIDZs you can lose two disks, but if they're from the same vdev you lose everything (remember, it's a stripe). So, depending on how good your backups are and how bad downtime would be, I would say make one pool with two vdevs.
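That trade-off can be counted the same brute-force way: enumerate every two-disk failure and see which ones kill the pool under each layout. A sketch (illustrative Python, not part of the thread's setup):

```python
from itertools import combinations

def pool_survives(failed, vdevs, parity):
    # A RAIDZ vdev tolerates up to `parity` failed members; losing any
    # vdev loses the whole pool, since data is striped across vdevs.
    return all(len(vdev & failed) <= parity for vdev in vdevs)

disks = set(range(8))
two_raidz1 = [set(range(0, 4)), set(range(4, 8))]  # two 4-disk RAIDZ vdevs
one_raidz2 = [disks]                               # one 8-disk RAIDZ2 vdev

fatal_z1 = sum(not pool_survives(set(f), two_raidz1, parity=1)
               for f in combinations(disks, 2))
fatal_z2 = sum(not pool_survives(set(f), one_raidz2, parity=2)
               for f in combinations(disks, 2))
print(fatal_z1, fatal_z2)  # 12 0
```

So with two RAIDZ1 vdevs, 12 of the 28 possible two-disk failures lose the pool (both failures in the same vdev), versus none for a single RAIDZ2; that's the data-integrity cost of the extra vdev speed.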
|
# ? Mar 6, 2012 17:13 |
|
This might be more of a networking question... I'm seeing a huge amount of dropped packets on the switch access ports my P4000's are connected to. The autoneg and speed are set correctly from what I can see. Flow control is enabled on those ports, but not on the trunk link between the two switches they're connected to. Will bad things happen if I turn flow-control on the trunk? Will this alleviate my dropped packets from the SAN?
|
# ? Mar 7, 2012 13:33 |
|
I'm a terrible networking guy, but flow control shouldn't be causing that. Turning it off shouldn't cause any problems as far as I know. Have you tried forcing the ports to gigabit, one at a time so you don't interrupt service? Or replacing the Cat6 cables? Do you see any errors on those ports on the switch? Or maybe offline one port, turn it back on, rinse and repeat?
|
# ? Mar 7, 2012 14:05 |
|
I'm seeing errors on the switch, they're showing up in the Tx Err column. Not sure if I can see errors on the P4000 easily (unless it's well hidden in CMC). Setting flow-control on the switch trunk didn't help. The ports are configured Auto on the switch and 1000/Auto on the SAN. Might try forcing them to fixed 1000 to see what happens.
|
# ? Mar 7, 2012 14:27 |
|
luminalflux posted:This might be more of a networking question... I'm seeing a huge amount of dropped packets on the switch access ports my P4000's are connected to. The autoneg and speed are set correctly from what I can see. Flow control is enabled on those ports, but not on the trunk link between the two switches they're connected to. If the ports are trunked, make sure the switch and array are both configured correctly for the trunk. We had a ton of dropped packets on one array because the port channel on one of our switches wasn't configured and it caused the port to flap.
|
# ? Mar 8, 2012 04:45 |
|
I had a brain fart, why did I write trunk? I meant the 10G link between the 2 switches (all ESXi hosts and P4000 nodes are connected to both switches for redundancy). As for trunking/bonding, we used to use LACP for trunking the iSCSI ports between the switch and the P4000s (and let VMware handle NIC teaming on its own), but that didn't sit well with them. We switched to using ALB, which was a lot smoother, and then moved from having all iSCSI on one switch to splitting it over two switches connected by a 10G link.
|
# ? Mar 8, 2012 11:23 |
|
Has anyone had any experience with using consumer SSDs in servers? Things started moving that way for me when HP was putting a 6 week hold on any server order with hard drives late last year and early this year due to the nastiness in Thailand. At first, I just ordered them with no HDD and booted from SAN, but for some things, it makes more sense to have DAS for your boot volume at least. I ordered a few Intel 520 series earlier this week to do some testing. I like the fact that Intel puts write statistics into their SMART status, so I should be able to keep an eye on write amplification and NAND wear. Other than this, my only SSDs in the data center are for a small but read heavy Oracle DB server. I put an array of 4 60GB Samsung SLC (HP OEM) drives in that, but they cost me like $800/each. I'm thinking that running an MLC drive that allows watching the statistics, and costs 75% less, would be a better use of the money. Even if I have to replace the consumer drives every year or two, I'd still be way ahead money wise vs the SLC or eMLC drives.
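One way to act on those SMART statistics: compute write amplification from the host-writes and NAND-writes counters and project a replacement date. A sketch (the counter values and endurance figure are placeholders, and the actual SMART attribute numbers and units vary by vendor and model, so check the drive's datasheet rather than trusting these names):

```python
def write_amplification(host_bytes_written, nand_bytes_written):
    """Write amplification = NAND writes / host writes. The two counters
    come from vendor-specific SMART attributes (some Intel drives expose
    total host and NAND writes; attribute IDs and units vary by model)."""
    return nand_bytes_written / host_bytes_written

def est_years_to_wearout(nand_bytes_per_day, nand_endurance_bytes):
    # Rough lifetime estimate from the daily NAND write rate.
    return nand_endurance_bytes / nand_bytes_per_day / 365

# Placeholder numbers: 1 TB written by the host, 1.1 TB hit the NAND
wa = write_amplification(host_bytes_written=1.0e12,
                         nand_bytes_written=1.1e12)
print(round(wa, 2))
```

With numbers like these you can set a concrete threshold (say, replace at 50% of rated endurance, as suggested below) instead of guessing.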
|
# ? Mar 8, 2012 17:46 |
|
As long as it's not in a critical server and won't cause any huge headache when it goes down, sure.
|
# ? Mar 8, 2012 17:51 |
|
I figured by doing a RAID1 and replacing them proactively as they get nearer to used up, I shouldn't have much downtime to worry about. I was thinking that at 50% life or so, replace it with a new drive, and put it in one of the Admin laptops or something.
|
# ? Mar 8, 2012 20:58 |
|
Internet Explorer posted:As long as it's not in a critical server and won't cause any huge headache when it goes down, sure.
|
# ? Mar 8, 2012 21:27 |
|
Just curious, do they fit in the 2.5" bays on DL-series servers?
|
# ? Mar 9, 2012 00:54 |
|
luminalflux posted:I'm seeing errors on the switch, they're showing up in the Tx Err column. Not sure if I can see errors on the P4000 easily (unless it's well hidden in CMC). Setting flow-control on the switch trunk didn't help. The ports are configured Auto on the switch and 1000/Auto on the SAN. Might try forcing them to fixed 1000 to see what happens. e: \/ \/ I only bring it up because "set it to 1000/full statically!" seems to be a troubleshooting step that either doesn't work or makes things even worse, at least in my experience. bort fucked around with this message at 13:14 on Mar 9, 2012 |
# ? Mar 9, 2012 01:47 |
|
bort posted:Assuming these are copper links, 1000Base-T is supposed to autonegotiate and not be set statically. It also uses all four pairs in the cable, not just two like earlier ethernet standards. If you're sure those counters are actually incrementing, I'd first suspect a cable. If you were having trouble with port aggregation and that's now fixed, those errors might be old and your counters might not be increasing. I'm well aware of autoneg and gigabit, previous gig was writing control software for ethernet network elements. What autoneg should mean vs what manufacturers think it could mean are a lot different. Counters are def increasing (I've got Munin tracking them), they took off a few weeks ago about when I upgraded SAN/iQ. Not sure if it's cable related, got good quality (hopefully) Cat6 patch cables and the cables have stayed the same throughout (and I haven't touched the hardware for a few months)
|
# ? Mar 9, 2012 09:18 |
|
KS posted:Just curious, do they fit in the 2.5" bays on DL-series servers? The standard 2.5" server drive has a 15mm z-height; most consumer-level SSDs are either 7mm or 9.5mm. If you're screwing into it from the bottom, you'll probably need to shim it up so the SATA connector is in the right place, but if you're screwing it in from the sides, things *should* line up. The Intel 520s are mostly 7mm drives with a 2.5mm plastic shim attached, as far as I can tell. I ordered some of them earlier this week and they'll be going in a DL360, so I'll let you know my experience when they arrive.
|
# ? Mar 9, 2012 16:59 |
|
Anyone have any suggestions for an affordable (preferably sub-$3k without drives) 6+ bay rackmount NAS that supports NFS? Running PHDvirtual backups in my environment but the new version of the software apparently hates the CIFS share I am forced to use with our existing Drobo, and I need to replace it with something that natively supports NFS... and as cheaply as possible. Looked at a Drobo B800i but I'd rather not use iSCSI if possible.
|
# ? Mar 9, 2012 17:44 |
|
Our Synology DS1010+ has NFS. Haven't used the NFS features on it but we've been pretty happy with it overall. Ours isn't rack mounted but I think they sell rack mountable models.
|
# ? Mar 9, 2012 17:49 |
|
I was looking at the RS2211+ model, it seems like it might fit the bill. Has anyone here used the NFS features of a Synology NAS?
|
# ? Mar 9, 2012 18:00 |
|
Wonder_Bread posted:Has anyone here used the NFS features of a Synology NAS?
|
# ? Mar 9, 2012 22:30 |
|
FISHMANPET posted:Two vdevs (each RAIDZ set is a vdev) are faster than one vdev, even when it's the same pool. The way RAIDZ works, is that each vdev is as fast as its slowest member, because for every read it has to read off of every drive in the vdev. When you add two vdevs the data is striped across both (essential a RAID 0) so it can read twice as fast. Thanks!
|
# ? Mar 12, 2012 00:37 |
|
I'm using VMware's I/O Analyzer virtual appliance (http://labs.vmware.com/flings/io-analyzer) to do some benchmarking on our new 12 disk NetApp FAS 2040. When I run Iometer on the datastore from two hosts, the total performance is less than when running Iometer from one host. Going in, I expected that the total from two hosts would be higher than that of one host, but it's around 16% less for all tests. Here's something to illustrate my results: code:
Are the results I'm getting to be expected? If not, what could be the problem?
|
# ? Mar 12, 2012 22:51 |
|
Heads have to seek. Run your streams to 2 different disk aggregates if you want your numbers to go up.
|
# ? Mar 12, 2012 23:56 |
|
So... what I received is to be expected?
|
# ? Mar 13, 2012 14:45 |
|
FlyingZygote posted:So... what I received is to be expected? What type of disks? It actually sounds very high to me, even for 15k SAS drives. I imagine you're hitting cache somehow. http://en.wikipedia.org/wiki/IOPS
10,000 rpm SATA HDD: ~125-150 IOPS [2] (SATA 3 Gb/s)
10,000 rpm SAS HDD: ~140 IOPS [2] (SAS)
15,000 rpm SAS HDD: ~175-210 IOPS [2] (SAS)

210 x 12 = 2520
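The arithmetic at the end is just spindle count times a rule-of-thumb per-disk figure; a trivial sketch:

```python
def aggregate_random_iops(per_disk_iops, disk_count):
    # Back-of-envelope: random IOPS scale with spindle count.
    # Cache hits and sequential access can blow well past this ceiling.
    return per_disk_iops * disk_count

# Rule-of-thumb figures from the table above
print(aggregate_random_iops(210, 12))   # best case for 12 x 15k SAS
print(aggregate_random_iops(150, 12))   # best case for 12 x 10k SATA
```

Anything materially above that ceiling for random I/O suggests the benchmark is hitting cache rather than spindles.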
|
# ? Mar 13, 2012 14:49 |
|
The numbers were fudged a bit to make them easier to look at. What I'm actually getting for MaxThroughput-100%Read: code:
|
# ? Mar 13, 2012 15:09 |
|
FlyingZygote posted:The numbers were fudged a bit to make them easier to look at. But but but why would you run the test again if you're seeing good performance, regardless of hitting the cache? Also typically iometer uses random data so it's not in the cache.
|
# ? Mar 13, 2012 15:19 |
|
Thin provisioning and dedupe are not performance hindrances on NetApp filers. Also, use sio from now.netapp.com; it's in the utility toolchest. It will give you the most accurate IO readings, across CIFS or NFS or iSCSI or FC etc. Those numbers you've got seem high and lead me to distrust the VMware tool you are using.
|
# ? Mar 13, 2012 17:19 |
|
NFS is thin by default (link). Since the numbers do seem high, I'll start some new benchmarks with sio (link). Thanks!
|
# ? Mar 13, 2012 18:03 |
|
madsushi posted:But but but why would you run the test again if you're seeing good performance, regardless of hitting the cache? Also typically iometer uses random data so it's not in the cache. Based on what he posted it looks like this was the Max-Throughput, 100% read test, which would make it sequential reads. That explains the high IOPS number, since he's a) getting data from cache and b) only performing a minimal number of disk seeks, so the high seek latency of 7.2K disks is minimized -- hence the ~3500 IOPS. To the poster's original question regarding performance on his 2040, he should look at the "sysstat -x" output on the filer while running his benchmarks if he wants to see his cache hit rate and how high his actual disk utilization is. ZombieReagan posted:A FAS2040 isn't going to be up to handling multiple VMs pounding the poo poo out of it with a ton of random IO and not suffer some performance degradation. A FAS3210 wouldn't have been that much more expensive, and you have the option of getting PAM cards as well. Even if you didn't need them today, it's nice to have some cards to play that don't involve buying another controller when you're up against the wall, especially since adding more disks to a raid group or aggregate won't improve read performance until WAFL has time to balance things back out. PAM isn't supported on the 3210 on 8.1 and I'm not sure it ever will be, so buying 3210s with PAM is a bad idea for future supportability. I don't think sales is even supposed to provide the option. YOLOsubmarine fucked around with this message at 04:51 on Mar 14, 2012 |
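The cache explanation is easy to sanity-check: IOPS is just throughput divided by I/O size, so a modest sequential rate posts a huge IOPS number at small block sizes. A sketch (the 32 KB I/O size and the throughput figure here are illustrative, not from the poster's run):

```python
def iops_from_throughput(mb_per_s, io_size_kb):
    # IOPS = throughput / I/O size. Large sequential reads served from
    # cache can post IOPS figures far beyond what the spindles could
    # sustain for random I/O.
    return (mb_per_s * 1024) / io_size_kb

# ~3500 IOPS at a 32 KB I/O size is only ~109 MB/s: an easy sequential
# rate, but an impossible random rate for a handful of 7.2K spindles.
print(iops_from_throughput(109.375, 32))
```

So a high IOPS number on the 100% read test says more about cache and block size than about the disks underneath.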
# ? Mar 14, 2012 04:19 |
|
NippleFloss posted:PAM isn't supported on the 3210 on 8.1 and I'm not sure it ever will be, so buying 3210s with PAM is a bad idea for future supportability. I don't think sales is even supposed to provide the option. Yeah, this PAM and 3210 issue is annoying, there are many deployed before this configuration was disallowed. Just to be clear, PAM isn't supported on 3210's at all at the moment...
|
# ? Mar 14, 2012 07:09 |
|
marketingman posted:Yeah, this PAM and 3210 issue is annoying, there are many deployed before this configuration was disallowed. evil_bunnY fucked around with this message at 09:28 on Mar 14, 2012 |
# ? Mar 14, 2012 09:25 |
|
|
# ? May 21, 2024 15:00 |
|
Hey So we just moved to ESXi 5 and I am taking this opportunity to redo all our datastores and LUNs, because they are horrible and messy. Anyone know any good resources for array/LUN sizing with VMware, or any suggestions? We have an IBM DS3524 with 8 SAS drives (2.8TB usable) and 8 SATA (4.5TB usable), with around 20 VMs on two hosts. On another note I was looking at the new Storage DRS stuff and I think I poo poo myself with excitement Alctel fucked around with this message at 13:33 on Mar 14, 2012 |
# ? Mar 14, 2012 13:28 |