grobbendonk
Apr 22, 2008
Where I work we're in the process of migrating from older EMC hardware to IBM SVC and V7000, with DS8800 for our mainframe and "crown jewel" applications. Whilst I've not exactly been wowed by the V7000 performance, or IBM's support, it's difficult to argue with the price difference to EMC. We're probably about 15-20% through the migration, and we've just had a new principal architect join the company who has pretty much stated that V7000 and DS8 are crap and XIV is the only way to go. Whilst it would be easy to tell him to butt out of operational affairs, he's a pretty forceful character, so I'd like to form my own opinion so we can have a meaningful conversation. We're going to do a proof of concept next month, and I'm sure I could ask IBM for a reference site to talk to, but that's pretty certain to be an IBM shill. I'm annoyed by this distraction, because the POC was meant to be about trying out the TMS solid state array with Easy Tier (any opinions on that too?)

I'd like to understand people's real-world experience with IBM's XIV, what type of workload it's best for (heavy read, heavy write, OLTP etc.), what its shortcomings are, and any other general opinions.

Pile Of Garbage
May 28, 2007



grobbendonk posted:

Where I work we're in the process of migrating from older EMC hardware to IBM SVC and V7000, with DS8800 for our mainframe and "crown jewel" applications. Whilst I've not exactly been wowed by the V7000 performance, or IBM's support, it's difficult to argue with the price difference to EMC. We're probably about 15-20% through the migration, and we've just had a new principal architect join the company who has pretty much stated that V7000 and DS8 are crap and XIV is the only way to go. Whilst it would be easy to tell him to butt out of operational affairs, he's a pretty forceful character, so I'd like to form my own opinion so we can have a meaningful conversation. We're going to do a proof of concept next month, and I'm sure I could ask IBM for a reference site to talk to, but that's pretty certain to be an IBM shill. I'm annoyed by this distraction, because the POC was meant to be about trying out the TMS solid state array with Easy Tier (any opinions on that too?)

I'd like to understand people's real-world experience with IBM's XIV, what type of workload it's best for (heavy read, heavy write, OLTP etc.), what its shortcomings are, and any other general opinions.

Are you running the 4x8Gb FC controllers or 4x10Gb Ethernet controllers on the V7000? Also, I feel your pain re IBM support. I'm in Australia and normally their support for System-x/System Storage is excellent; however, apparently V7000 is grouped in with System-z, so whenever you call through you end up in one of their overseas call centres and the guys out there can't do much beyond logging a case for you. I'd advise enabling the call-home feature in the V7000 as that will automatically get a case logged for you when poo poo hits the fan (it logs it straight to IBM Germany IIRC). Also leverage your IBM account manager whenever you can to expedite things. I posted about V7000 support way back in the thread: http://forums.somethingawful.com/showthread.php?threadid=2943669&userid=0&perpage=40&pagenumber=77#post404083868 (drat, re-reading my earlier posts makes me want to get back into storage).

paperchaseguy
Feb 21, 2002

THEY'RE GONNA SAY NO
I work for IBM, do you want to hear my opinion?

grobbendonk
Apr 22, 2008

paperchaseguy posted:

I work for IBM, do you want to hear my opinion?

Of course, I'd be pleased to hear from anyone, especially if you can compare and contrast the V7000 and the XIV



cheese-cube posted:

Are you running the 4x8Gb FC controllers or 4x10Gb Ethernet controllers on the V7000? Also, I feel your pain re IBM support. I'm in Australia and normally their support for System-x/System Storage is excellent; however, apparently V7000 is grouped in with System-z, so whenever you call through you end up in one of their overseas call centres and the guys out there can't do much beyond logging a case for you. I'd advise enabling the call-home feature in the V7000 as that will automatically get a case logged for you when poo poo hits the fan (it logs it straight to IBM Germany IIRC). Also leverage your IBM account manager whenever you can to expedite things. I posted about V7000 support way back in the thread: http://forums.somethingawful.com/showthread.php?threadid=2943669&userid=0&perpage=40&pagenumber=77#post404083868 (drat, re-reading my earlier posts makes me want to get back into storage).

We're running the 4x8Gb FC controllers, and we're just finishing the process of adding the extra Mondeo cards in each SVC to double the number of FC ports available. We're going to dedicate these links to replication, as we run Metro Mirror to our DR site for pretty much all our data. We do use the dial-home facility and use our account manager, but he just doesn't seem to be able to influence the support people to the right level. We're not a small company, and as we're based in the UK we have actually met the developers in Hursley and we have a client "advocate" from amongst those developers. With all this, however, whenever something major happens it seems I need to copy in about 20 people at IBM before I get the right level of attention.

On the flip side, I could empathise with the people complaining about EMC support a few months ago, but in the last couple of months it's been miles better

Pile Of Garbage
May 28, 2007



grobbendonk posted:

We're running the 4x8Gb FC controllers, and we're just finishing the process of adding the extra Mondeo cards in each SVC to double the number of FC ports available. We're going to dedicate these links to replication, as we run Metro Mirror to our DR site for pretty much all our data. We do use the dial-home facility and use our account manager, but he just doesn't seem to be able to influence the support people to the right level. We're not a small company, and as we're based in the UK we have actually met the developers in Hursley and we have a client "advocate" from amongst those developers. With all this, however, whenever something major happens it seems I need to copy in about 20 people at IBM before I get the right level of attention.

Probably should have asked this before but in what specific area do you find the performance to be lacking? Also what link(s) do you have between your primary and DR sites and how are you planning on extending the fabric across these links (I'm assuming you'll be running separate storage and replication layers with the SVC)?

Re e-mailing IBM I've had many an e-mail chain where the header was almost larger than the body due to the extreme number of people CC'd. You wouldn't happen to have met a French guy who is one of their engineers/consultants? I can't remember his name but I've encountered him twice, once in person at a V7000 workshop and again on conference call at a SONAS workshop. I got into an argument with him re the correct path-selection policy for MPIO on ESX hosts (He said MRU, I said RR cos' that's what the Redbooks say :colbert:).

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

grobbendonk posted:

Of course, I'd be pleased to hear from anyone, especially if you can compare and contrast the V7000 and the XIV


We're running the 4x8Gb FC controllers, and we're just finishing the process of adding the extra Mondeo cards in each SVC to double the number of FC ports available. We're going to dedicate these links to replication, as we run Metro Mirror to our DR site for pretty much all our data. We do use the dial-home facility and use our account manager, but he just doesn't seem to be able to influence the support people to the right level. We're not a small company, and as we're based in the UK we have actually met the developers in Hursley and we have a client "advocate" from amongst those developers. With all this, however, whenever something major happens it seems I need to copy in about 20 people at IBM before I get the right level of attention.

On the flip side, I could empathise with the people complaining about EMC support a few months ago, but in the last couple of months it's been miles better
So we've been over this before in the thread, but what you want to do on an outage is call in the ticket over the phone, log it using the phrase "sev 1", and take down the ticket number. Immediately call back whatever the UK equivalent is of 1-800-IBM-SERV and interrupt the person asking for your information and ask to speak to the national duty manager. They will stay on the phone until someone has attention on your ticket.

Getting good answers on regular support issues can be a pain point, and I feel you on those.

grobbendonk
Apr 22, 2008

cheese-cube posted:

Probably should have asked this before but in what specific area do you find the performance to be lacking? Also what link(s) do you have between your primary and DR sites and how are you planning on extending the fabric across these links (I'm assuming you'll be running separate storage and replication layers with the SVC)?

It's not really lacking; it just seems like the IBM gear is more prone to prolonged periods of high write response times under relatively light loads. For example, three or four times a day we're observing 15-20 minute periods where IOPS are around 3-5k but the write response time is around 30-50ms. This might be a false positive because of our inexperience with TPC (which is pretty poor anyway), or it might be that because we're right in the migration cycle this stuff is under an intense microscope and we're just more sensitive/paranoid. We have 24 dark fibre links between sites which are only about 7 miles apart, so we've no issues there.

To be honest this wasn't meant to be a rant against IBM support but more of a query about XIV. I've worked as a storage administrator for 10 years now so I know pretty much all the P1/escalation tricks, I was just surprised by the wall of apathy I encountered with them this time round.

Pile Of Garbage
May 28, 2007



I always got the impression that IBM aren't really sure where SVC/V7000 fits into the rest of their product line and they're having trouble consolidating their support.

Anyway, apologies for derailing your question. Can anyone weigh in on grobbendonk's queries?

grobbendonk posted:

Where I work we're in the process of migrating from older EMC hardware to IBM SVC and V7000, with DS8800 for our mainframe and "crown jewel" applications. Whilst I've not exactly been wowed by the V7000 performance, or IBM's support, it's difficult to argue with the price difference to EMC. We're probably about 15-20% through the migration, and we've just had a new principal architect join the company who has pretty much stated that V7000 and DS8 are crap and XIV is the only way to go. Whilst it would be easy to tell him to butt out of operational affairs, he's a pretty forceful character, so I'd like to form my own opinion so we can have a meaningful conversation. We're going to do a proof of concept next month, and I'm sure I could ask IBM for a reference site to talk to, but that's pretty certain to be an IBM shill. I'm annoyed by this distraction, because the POC was meant to be about trying out the TMS solid state array with Easy Tier (any opinions on that too?)

I'd like to understand people's real-world experience with IBM's XIV, what type of workload it's best for (heavy read, heavy write, OLTP etc.), what its shortcomings are, and any other general opinions.

JockstrapManthrust
Apr 30, 2013
Anyone know why the in-place 64-bit aggr migration (without adding disks) is still an unsupported, diag-mode-only option?

I could understand this when 8.1 was new, but we are on 8.1.3 and it's still not supported.

I have an old 3140 with online (low access frequency) archive data on it that I would love to get to 64-bit to use the cron compression mode with.

goobernoodles
May 28, 2011

Wayne Leonard Kirby.

Orioles Magician.
Anyone ever use the ASM-VE (Auto Snapshot Manager) software that's packaged with EqualLogic SANs for DR purposes?

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

JockstrapManthrust posted:

Anyone know why the in-place 64-bit aggr migration (without adding disks) is still an unsupported, diag-mode-only option?

I could understand this when 8.1 was new, but we are on 8.1.3 and it's still not supported.

I have an old 3140 with online (low access frequency) archive data on it that I would love to get to 64-bit to use the cron compression mode with.

I can't tell you why an in-place upgrade is not supported without adding disks, though I suspect it has something to do with the metadata increase when moving to 64-bit and how losing that space is relatively less noticeable on larger aggregates and when adding storage. However, it is very unlikely that it will ever be supported for normal customer use. If you very much need it you can talk to your account team about getting a PVR issued, as this is one of those cases where they would likely grant one. If you have additional space on a 64-bit aggregate somewhere you could VSM the volumes to that aggregate, blow away the 32-bit one, recreate it as 64-bit, and then SnapMirror back.

parid
Mar 18, 2004
I'm trying to do the full planning exercise for a couple of new NetApps to support a medium-size (~7000 mailbox) Exchange environment. My main goal for this process is to make sure all the proposed volumes fit into a reasonable aggregate configuration, so I'm going through the work to eventually get at the usable space in the aggregates.

Not that it probably matters, but it's two pairs (one per site) of FAS3220s with a 512GB Flash Cache card per controller. Clustered ONTAP 8.2 in a switchless configuration. Each pair is its own cluster.

So far I'm accounting for the following factors:

* Taking 3 drives out for a dedicated aggregate for the various vserver root volumes. (once per controller)
* RAID-DP parity drives
* At least 2 spares per disk type per controller
* Convert printed disk size to Base 2 (and checksum right-sizing)
* Aggr Snapshot Reserve
* WAFL Overhead at 10%
* Keeping at least 15% free on the aggregates

Is there anything else that I might be missing?

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

parid posted:

I'm trying to do the full planning exercise for a couple of new NetApps to support a medium-size (~7000 mailbox) Exchange environment. My main goal for this process is to make sure all the proposed volumes fit into a reasonable aggregate configuration, so I'm going through the work to eventually get at the usable space in the aggregates.

Not that it probably matters, but it's two pairs (one per site) of FAS3220s with a 512GB Flash Cache card per controller. Clustered ONTAP 8.2 in a switchless configuration. Each pair is its own cluster.

So far I'm accounting for the following factors:

* Taking 3 drives out for a dedicated aggregate for the various vserver root volumes. (once per controller)
* RAID-DP parity drives
* At least 2 spares per disk type per controller
* Convert printed disk size to Base 2 (and checksum right-sizing)
* Aggr Snapshot Reserve
* WAFL Overhead at 10%
* Keeping at least 15% free on the aggregates

Is there anything else that I might be missing?

You can find the right-sized capacities of the different (newer) disk types here: https://library.netapp.com/ecmdocs/ECMP1196821/html/GUID-5D3F3F2D-A153-49D3-9858-04C30B3C7E74.html

Aggregate snapshot reserve defaults to 0 starting in 8.1 for non-mirrored aggregates, and in general you should have aggregate snapshots and snapshot reserve disabled unless you are using MC or syncmirror.

Otherwise, everything in there is correct.
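
If it helps, here's that checklist as a rough back-of-the-envelope calculation in Python. Every input is a placeholder assumption (a hypothetical 24-disk controller and the ~268GB right-sized figure quoted later in the thread for 300GB drives), so treat it as a sanity check rather than a substitute for a proper NetApp sizing exercise:

# Rough usable-space estimate for one controller, following the checklist above.
# Every input here is a placeholder assumption -- substitute your own shelf layout
# and the right-sized capacities from the NetApp doc linked above.

right_sized_gb = 268        # right-sized capacity of a "300GB" drive (assumed)
disks = 24                  # disks owned by this controller (hypothetical)
root_aggr_disks = 3         # dedicated aggregate for vserver root volumes
spares = 2                  # per disk type, per controller
parity = 2 * 1              # RAID-DP parity drives x number of RAID groups (assumed 1)

wafl_overhead = 0.10        # WAFL reserve
aggr_snap_reserve = 0.00    # defaults to 0 in 8.1+ for non-mirrored aggregates
free_space_target = 0.15    # keep at least 15% free in the aggregate

data_disks = disks - root_aggr_disks - spares - parity
raw = data_disks * right_sized_gb
usable = raw * (1 - wafl_overhead) * (1 - aggr_snap_reserve)
plannable = usable * (1 - free_space_target)

print(f"{data_disks} data disks, {usable:.0f} GB usable, {plannable:.0f} GB to plan volumes against")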

Pile Of Garbage
May 28, 2007



Does anyone have any experience with NetApp Virtual Storage Console for VMware vSphere? Last week I found that there was a scheduled "Optimisation and Migration" task that was running at 10:00PM every Monday and causing havoc across the entire infrastructure (Due to VMs being stunned during snapshot removal). We went ahead and excluded all datastores and disabled the schedule however last Monday it ran again and broke a ton of poo poo!

We spoke to NetApp who just shrugged their shoulders, told us to disable the service itself on the vCenter server and send them the logs. The problem is that there is not much we can do but wait till next Monday to see if it will run again and break everything.

Has anyone else encountered this?

parid
Mar 18, 2004

NippleFloss posted:

You can find the right-sized capacities of the different (newer) disk types here: https://library.netapp.com/ecmdocs/ECMP1196821/html/GUID-5D3F3F2D-A153-49D3-9858-04C30B3C7E74.html

Aggregate snapshot reserve defaults to 0 starting in 8.1 for non-mirrored aggregates, and in general you should have aggregate snapshots and snapshot reserve disabled unless you are using MC or syncmirror.

Otherwise, everything in there is correct.

Thanks!

Maneki Neko
Oct 27, 2000

cheese-cube posted:

Does anyone have any experience with NetApp Virtual Storage Console for VMware vSphere? Last week I found that there was a scheduled "Optimisation and Migration" task that was running at 10:00PM every Monday and causing havoc across the entire infrastructure (Due to VMs being stunned during snapshot removal). We went ahead and excluded all datastores and disabled the schedule however last Monday it ran again and broke a ton of poo poo!

We spoke to NetApp who just shrugged their shoulders, told us to disable the service itself on the vCenter server and send them the logs. The problem is that there is not much we can do but wait till next Monday to see if it will run again and break everything.

Has anyone else encountered this?

We haven't had that issue, we have had issues with VSC causing a bunch of crazy load on our cluster mode filers though, and making them sad. We don't do a ton of snapshotting though.

madsushi
Apr 19, 2009

Baller.
#essereFerrari
The Optimization/Migration is only for VMs that might be misaligned (read: Windows 2003 or earlier). I would just turn it off altogether.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Maneki Neko posted:

We haven't had that issue, we have had issues with VSC causing a bunch of crazy load on our cluster mode filers though, and making them sad. We don't do a ton of snapshotting though.

What exactly is causing the load? The only things that VSC does on the filer, other than some lightweight API queries to get filer information, are creating and deleting volume snapshots for backups and cloning volumes or LUNs for the cloning workflow. There are some tasks that trigger VMware snapshots (as cheese-cube alluded to), but the problems with those are on the VMware side and they shouldn't cause any undue load. I can't think of a single thing VSC does routinely that would cause any appreciable load on a filer, other than creating or deleting a TON of snapshots all at once as part of a backup or backup deletion, and even then it would be abnormal for that to cause problems.

El_Matarife
Sep 28, 2002

madsushi posted:

The Optimization/Migration is only for VMs that might be misaligned (read: Windows 2003 or earlier). I would just turn it off altogether.

So wait, Windows 2008+ are all good with alignment and stripe size even including SQL DBs? I thought making sure your SQL tables are aligned and the stripe sizes are right is a whole big thing.

SAN -> VMware VMFS -> Guest OS -> SQL DB are the layers I'm picturing that all need proper alignment and stripe sizes.

El_Matarife fucked around with this message at 23:09 on Oct 2, 2013

Thrawn
Sep 10, 2004

theperminator posted:

Does anyone know how difficult it is to replace the Standby powersupply on an EMC AX4-5?

I've had an SPS failure and have to replace it, but I can't really find any documentation of the process other than a note that the SPS can be replaced without powering off the SAN

I've never dealt with an AX series array, but if they're at all similar to the CX series, it's pretty straightforward. Pulling the SPS can be a bit of a pain because they're typically at the bottom of the rack & the wiring/power cables/etc can get in the way, but that aside you just need to unplug the SPS, unscrew it, & pull it out. Just note that taking down the SPS will take the corresponding SP offline for the duration.

EMC claims that the SPS itself is not serviceable & the entire unit must be replaced, however, I've opened up 3 of them & replaced the batteries myself with zero issue. Of course, the initial one was on an EoL/EoS CX700, so I had nothing to lose. Since then though, I've done another one on a CX700 & most recently one on a CX3, both without incident. I may end up doing the second one on the CX3 here shortly...I had to move that array last week & it was complaining about the SPS batteries when I powered it back up.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

El_Matarife posted:

So wait, Windows 2008+ are all good with alignment and stripe size even including SQL DBs? I thought making sure your SQL tables are aligned and the stripe sizes are right is a whole big thing.

SAN -> VMware VMFS -> Guest OS -> SQL DB are the layers I'm picturing that all need proper alignment and stripe sizes.

Native 2008 disks (meaning not upgraded from 2003) are aligned by default for the 4K block size that WAFL uses. If you provision the right LUN type then VMFS will also be aligned properly. If all of that is properly aligned then SQL I/O will be aligned as well, due to the transaction sizes it uses being multiples of 4K.
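
If you want to sanity-check the guest layer yourself, the check is just "is the partition starting offset a multiple of 4K". A quick sketch (the two offsets below are the well-known Windows defaults, not anything measured on your systems):

# Does a partition's starting offset line up with WAFL's 4K blocks?
WAFL_BLOCK = 4096

def is_aligned(offset_bytes, block=WAFL_BLOCK):
    # Aligned means guest I/O issued on its own block boundaries stays on 4K boundaries on the filer.
    return offset_bytes % block == 0

print(is_aligned(1048576))  # 1 MiB -- Windows 2008/Vista+ default partition offset -> True
print(is_aligned(32256))    # 63 sectors * 512 bytes -- old Windows 2003 default -> False (misaligned)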

Maneki Neko
Oct 27, 2000

NippleFloss posted:

What exactly is causing the load? The only things that VSC does on the filer, other than some lightweight API queries to get filer information, are creating and deleting volume snapshots for backups and cloning volumes or LUNs for the cloning workflow. There are some tasks that trigger VMware snapshots (as cheese-cube alluded to), but the problems with those are on the VMware side and they shouldn't cause any undue load. I can't think of a single thing VSC does routinely that would cause any appreciable load on a filer, other than creating or deleting a TON of snapshots all at once as part of a backup or backup deletion, and even then it would be abnormal for that to cause problems.

It was another engineer in our company chasing it down, but from the bits and pieces I saw it appeared to be an ONTAP bug where VSC discovery was causing issues with some of the cluster functionality. If I recall correctly you work for NetApp; if you're interested in poking more and can look up details based on the case number, I'd be happy to send it to you. We ended up making it go away with an ONTAP upgrade.

theperminator
Sep 16, 2009

by Smythe
Fun Shoe

Thrawn posted:

I've never dealt with an AX series array, but if they're at all similiar to the CX series, it's pretty straightforward. Pulling the SPS can be a bit of a pain because theyre typically at the bottom of the rack & the wiring/power cables/etc can get in the way, but that aside you just need to unplug the SPS, unscrew it, & pull it out. Just note that taking down the SPS will take the corresponding SP offline for the duration.

Awesome, thanks. it should be pretty easy then.

Thankfully it's still covered by warranty so I've already got the replacement SPS, and our utilization is low enough to be handled by one SP while the work is being performed so hopefully it will all go smoothly.

Will I need to trespass the LUNs back after it's complete, or will it do that itself?

Amandyke
Nov 27, 2004

A wha?

Thrawn posted:

Just note that taking down the SPS will take the corresponding SP offline for the duration.

This is not correct. The SP *will* stay up during the replacement. SPS replacements are fully non-disruptive. Just unscrew the screws on the faceplate and the rear of the SPS, power off the SPS via the power switch, unplug the power cables, slide out the old SPS, slide in the new SPS and re-do the cables/screws.

If you're covered under maintenance still, you can have EMC send out a CE to do the replacement if you're not comfortable performing the replacement.

Thrawn
Sep 10, 2004

Amandyke posted:

This is not correct. The SP *will* stay up during the replacement.

Maybe I'm misunderstanding, but on the arrays I have dealt with the SP is directly powered by the corresponding SPS (as in, the power cable for SPA/SPB is plugged directly into the SPS, not the rack PDU), & when I flip the power switch on the SPS, the SP goes dark. Is the SP still technically up in some sense, being powered through the other SPS or something?

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

parid posted:

* Taking 3 drives out for a dedicated aggregate for the various vserver root volumes. (once per controller)
* RAID-DP parity drives
* At least 2 spares per disk type per controller
* Convert printed disk size to Base 2 (and checksum right-sizing)
* Aggr Snapshot Reserve
* WAFL Overhead at 10%
* Keeping at least 15% free on the aggregates


Don't you also need to keep at least 10% free space per LUN (if you are using LUNs)?


This exact list was the first reason why I was so mad at NetApp. We got a FAS2050 with 20x300GB drives and after all of the above best practices were applied we ended up with something less than 40% usable space of the theoretical 6TB.

theperminator
Sep 16, 2009

by Smythe
Fun Shoe

Thrawn posted:

Maybe I'm misunderstanding, but on the arrays I have dealt with the SP is directly powered by the corresponding SPS (as in, the power cable for SPA/SPB is plugged directly into the SPS, not the rack PDU), & when I flip the power switch on the SPS, the SP goes dark. Is the SP still technically up in some sense, being powered through the other SPS or something?

The information I got from Dell ProSupport was that the SP would be powered off in the process.

I should be fine to replace it and will be doing so tonight.

parid
Mar 18, 2004

Agrikk posted:

Don't you also need to keep at least 10% free space per LUN (if you are using LUNs)?


This exact list was the first reason why I was so mad at NetApp. We got a FAS2050 with 20x300GB drives and after all of the above best practices were applied we ended up with something less than 40% usable space of the theoretical 6TB.

You want some free space overhead in the volume, but you don't need that much. I have a bunch of Oracle LUNs on dedicated volumes sitting in the upper 90s for utilization.

NetApp's inability to accurately set usable-space expectations with their customers during the sales process seems to be pretty common. I can't even get my sales team to do it when I ask specifically. I think the challenge is that aggregate setup and sizing is an implementation detail and can have a large effect on what you get.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Agrikk posted:

Don't you also need to keep at least 10% free space per LUN (if you are using LUNs)?

This exact list was the first reason why I was so mad at NetApp. We got a FAS2050 with 20x300GB drives and after all of the above best practices were applied we ended up with something less than 40% usable space of the theoretical 6TB.

There's no need to maintain a certain amount of free space in volumes. Once upon a time there was a recommendation to maintain enough space in a volume for each LUN to be completely overwritten once (fractional reserve), which was there to prevent LUNs from going offline if a volume filled up with snapshot data. But that isn't a recommendation anymore. There are no performance reasons to maintain a certain level of free space within volumes, other than dedupe requiring a very small amount of free space (<2%) to function properly.

As far as how much usable space you get out of NetApp versus other vendors, it's a problem that is greatly exacerbated on systems with a relatively small number of disks. Running in an active/active mode means that each controller requires spares and each controller requires at least one dedicated raid group so you immediately lose 4 disks to parity and another 2, minimum, to spares. Active/Passive arrays don't have this problem but the trade off is that you have half of your compute power sitting idle. But when you lose six disks immediately that's very noticeable when you only have a dozen or two.

The other reserves are pretty tolerable. The 10% WAFL reserve is used for metadata and to maintain empty stripes to perform overwrites. It's a necessary adjunct to the way WAFL does writes, which also enables free, very efficient snapshots. You lose some disk space to checksums, but those checksums are a really important data integrity feature that protects against some fairly insidious types of data loss. Keeping your aggregates under 85% utilized is a good rule of thumb, but it's really dependent on what your workload looks like. Low overwrite workloads can subsist fine at higher levels of utilization. This is also something that affects every storage vendor to some extent or another. If you only have a few places left to write new data then it is impossible to optimize disk accesses for those writes to minimize latency.

A lot of sales teams do a pretty terrible job of explaining how much usable space will be available, where this reserved space goes and what you get in return. When you break down where each bit of storage goes and what features that enables, it's an easier sale. It's also important to remember that you lose relatively less and less space as you add more disk, because you only pay the spare tax once, and even medium-sized systems get around 75% usable-to-raw space. RAID-DP is also a high-performance, double-parity RAID implementation that performs as well as RAID-10 from other vendors, where you see 50% usable space or less by definition.

If you got less than 40% usable space out of your 20 disk 2050 then I would question the aggregate setup. Even on a relatively small system like that you should be at around 60% as usable space.

Amandyke
Nov 27, 2004

A wha?

Thrawn posted:

Maybe I'm misunderstanding, but on the arrays I have dealt with the SP is directly powered by the corresponding SPS (as in, the power cable for SPA/SPB is plugged directly into the SPS, not the rack PDU), & when I flip the power switch on the SPS, the SP goes dark. Is the SP still technically up in some sense, being powered through the other SPS or something?

The power supply for the peer SP should keep both SPs running, as power is shared over the backplane. Unless I'm missing something drastic in the design of the AX4 as compared to any other CLARiiON.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

NippleFloss posted:

If you got less than 40% usable space out of your 20 disk 2050 then I would question the aggregate setup. Even on a relatively small system like that you should be at around 60% as usable space.

Here's the math from my setup:

Advertised space: 20 disks x 300GB = 6TB

(Granted, no one takes 300GB seriously as the actual capacity of the disk, but our sales rep kept tossing out 6TB as the raw storage capacity, so that's what I'm going with)

Actual Space:

20 disks minus 4 parity disks and 2 spares = 14 disks
Formatted space: 268GB

3752GB useable

Then subtract 15% for aggregate overhead = 3189GB useable
Subtract 10% for WAFL reserve = 2870GB useable

Then subtract 10% for free LUN space that I was told had to exist to maintain top performance

2583GB useable

2583 / 6000 = .43

So yeah. 43% useable.

I can spot you the 10% for free lun space, but I wish I'd known about the huge overhead involved in a single FAS2050 deployment beforehand because I ended up having problems with disk space too soon after its purchase.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Agrikk posted:

Here's the math from my setup:

Advertised space: 20 disks x 300GB = 6TB

(Granted, no one takes 300GB seriously as the actual capacity of the disk, but our sales rep kept tossing out 6TB as the raw storage capacity, so that's what I'm going with)

Actual Space:

20 disks minus 4 parity disks and 2 spares = 14 disks
Formatted space: 268GB

3752GB useable

Then subtract 15% for aggregate overhead = 3189GB useable
Subtract 10% for WAFL reserve = 2870GB useable

Then subtract 10% for free LUN space that I was told had to exist to maintain top performance

2583GB useable

2583 / 6000 = .43

So yeah. 43% useable.

I can spot you the 10% for free lun space, but I wish I'd known about the huge overhead involved in a single FAS2050 deployment beforehand because I ended up having problems with disk space too soon after its purchase.
Are you randomly converting between base-2 (TiB) and base-10 (TB) in the middle of your calculations?

MrMoo
Sep 14, 2000

What's the performance like on those 20 disks? What's the price like compared to using SSD and a smaller chassis?

Dilbert As FUCK
Sep 8, 2007

by Cowcaster
Pillbug

MrMoo posted:

What's the performance like on those 20 disks? What's the price like compared to using SSD and a smaller chassis?

Basically an enterprise SSD is rated for 10k IOPS with sub-millisecond latency, while a 15K enterprise drive is usually 150-190 IOPS with seek latency.

If we assume 175 IOPS per 300GB 15K HDD, it would take at least 10 HDDs, before accounting for RAID level, to equal 1 SSD in IOPS performance.

evil_bunnY
Apr 2, 2003

Until you consider throughput.

MrMoo
Sep 14, 2000

From what I can tell 3 x spinning disks = 1 x SSD for bandwidth.

http://www.sgidepot.co.uk/diskdata.html
http://www.guru3d.com/articles_pagesprinter/corsair_force_gs_240gb_ssd_review,15.html

It's a flip of a coin depending on applications.

madsushi
Apr 19, 2009

Baller.
#essereFerrari
Anytime someone says "IOPS" I just assume the worst (4K) and remember that 10,000 IOPS @ 4K is only 40 MB/s or 320 Mbps, which is pretty awful bandwidth.
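
The conversion is just IOPS times I/O size, so it's easy to sketch (the numbers below are the same ones quoted above, nothing vendor-specific):

# IOPS-to-throughput conversion, assuming a fixed I/O size.
def iops_to_throughput(iops, io_size_kb=4):
    mb_per_s = iops * io_size_kb / 1000.0   # 1 MB = 1000 KB, matching the figures above
    mbit_per_s = mb_per_s * 8
    return mb_per_s, mbit_per_s

print(iops_to_throughput(10000, 4))   # -> (40.0, 320.0), i.e. 40 MB/s or 320 Mbps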

theperminator
Sep 16, 2009

by Smythe
Fun Shoe

Amandyke posted:

The power supply for the peer SP should keep both SPs running, as power is shared over the backplane. Unless I'm missing something drastic in the design of the AX4 as compared to any other CLARiiON.

Yeah, there's no shared backplane on the AX4, just two SPs with their own serial/power/Ethernet. It's probably a pretty budget unit compared to what a lot of you guys are running.

Can't wait until we turf this thing next year.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
So, in the last two weeks my old IBM DS4800/DS5100 storage arrays have:

  • Randomly deleted a host port entry and reassociated a WWN with a host port on a different server, causing LUN conflicts that resulted in a 30-minute panic/reboot loop on our principal ERP server
  • Corrupted an entire LUN during a rebuild that happened for no apparent goddamn reason
  • Stopped servicing I/O to the same ERP hostgroup for 60 seconds while retrieving logs from the unit, causing another panic/reboot
  • For no loving reason whatsoever, forgotten default gateway settings on both controllers, causing management ports to become unavailable

So loving glad to be done with this poo poo.

Vulture Culture fucked around with this message at 03:20 on Oct 8, 2013

madsushi
Apr 19, 2009

Baller.
#essereFerrari
Just installed 500TB of Isilon today, looking forward to seeing what it can do in the coming weeks.
