Dilbert As FUCK
Sep 8, 2007

by Cowcaster
Pillbug

FISHMANPET posted:

I have no idea, and I'm guessing the researchers don't either. Everybody here's been trained to think of storage purely in terms of space, no other concern is given. I'm also only tangentially involved so I don't have any power to say "nope this is dumb."

I'm sure I'll hear about the quote that comes back from our reseller and just cry a bunch and move on with my life.

Just tell them you want to order this
http://www.synology.com/products/spec.php?product_name=RS10613xs%2B&lang=us#p_submenu
and some of these
http://www.synology.com/products/product.php?product_name=RS3412xs&lang=us

Be sure to use 2TB Green drives to save a tree!

Dilbert As FUCK fucked around with this message at 16:20 on Jul 17, 2013


FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams

shutupshutupshutup

Another group wanted a handful of Terabytes for their compute cluster, so we specced them one of the Dell servers that can hold 10 or 12 3.5" hard drives, the total cost was about $4k if I recall.

The grad student came back and literally said "I can buy a 2TB drive at Best Buy for $100 why is this so expensive"

Even Synology was more expensive than the Dell solution we cooked up for them.

E:
Found the ticket and the exact quote

quote:

> Actually, this is a bit more expensive than I thought. As a 3TB hard
> disk is around $350, I thought the total price will be only slightly
> higher than the storage. Especially that we do not care much about
> memory or processing.

:fuckoff:

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

FISHMANPET posted:

shutupshutupshutup

Another group wanted a handful of Terabytes for their compute cluster, so we specced them one of the Dell servers that can hold 10 or 12 3.5" hard drives, the total cost was about $4k if I recall.

The grad student came back and literally said "I can buy a 2TB drive at Best Buy for $100 why is this so expensive"

Even Synology was more expensive than the Dell solution we cooked up for them.

E:
Found the ticket and the exact quote


:fuckoff:
I'll provide the PI with the information they need to make the right decision, but at some point, you can't keep making it your problem. If they want to buy a dozen external hard drives and juggle them between computers and lose their life's work because of it, let them. The organization can support you (and them) by providing an adequate technology budget, or not.

That doesn't mean you stop trying to get the organization to understand why what they're doing is a problem, but sometimes there's no changing the mind of the guy who only budgeted $3000 for storage on his grant.

Vulture Culture fucked around with this message at 20:42 on Jul 17, 2013

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

FISHMANPET posted:

The grad student came back and literally said "I can buy a 2TB drive at Best Buy for $100 why is this so expensive"

The correct response to this is always "Cool, sounds like you've got a solution to your problem and I can close this ticket out."

Docjowles
Apr 9, 2009

NippleFloss posted:

The correct response to this is always "Cool, sounds like you've got a solution to your problem and I can close this ticket out."

Usually I don't condone being a dick, but this would be REALLY fucking tempting in this case

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams

Misogynist posted:

I'll provide the PI with the information they need to make the right decision, but at some point, you can't keep making it your problem. If they want to buy a dozen external hard drives and juggle them between computers and lose their life's work because of it, let them. The organization can support you (and them) by providing an adequate technology budget, or not.

That doesn't mean you stop trying to get the organization to understand why what they're doing is a problem, but sometimes there's no changing the mind of the guy who only budgeted $3000 for storage on his grant.

Unfortunately somebody doesn't have the spine to hit a point and say "nope, this is dumb, we've done all we can."

If somebody bought a bunch of USB hard drives and plugged them in and they failed it would still somehow be our fault.

Stugazi
Mar 1, 2004

Who me, Bitter?

FISHMANPET posted:


The grad student came back and literally said "I can buy a 2TB drive at Best Buy for $100 why is this so expensive"

:fuckoff:

Exact same discussion I had with a client last week. The client has a new member of management who tries to contribute to meetings but knows nothing about technology. He mostly just slows us down and causes more work/FUD.

He thinks he's clever when he points out things like this and all I want to say is "STFU, I've been doing this for 20 years and you sound like an idiot".

Sadly, the other management team doesn't always understand either so I'm stuck explaining why 20TB is 200x more expensive than a 2TB drive from Best Buy in a way people who can barely open their email will understand. :negative:

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

When someone suggests that you buy a 2TB drive at Best Buy for 150 bucks to store your organizational data, you should also suggest that you buy a 400 dollar e-machine desktop to run your critical services too. I mean, why spend $10,000 on a server when it's just a computer and computers are cheap?

Thanks Ants
May 21, 2004

#essereFerrari


NippleFloss posted:

When someone suggests that you buy a 2TB drive at Best Buy for 150 bucks to store your organizational data, you should also suggest that you buy a 400 dollar e-machine desktop to run your critical services too. I mean, why spend $10,000 on a server when it's just a computer and computers are cheap?

This is when you get given shit for wasting $9600 every time you've ordered a server.

Amandyke
Nov 27, 2004

A wha?

Caged posted:

This is when you get given shit for wasting $9600 every time you've ordered a server.

Especially when that e-machine is never obsolete. $400 you never have to spend again!

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Amandyke posted:

Especially when that e-machine is never obsolete. $400 you never have to spend again!


"AOL included, but join the eMachines Network instead if you want this thing the sticker says"

Thanks Ants
May 21, 2004

#essereFerrari


Holy fuck, the stickers

Beelzebubba9
Feb 24, 2004

keygen and kel posted:

I'm trying to work out a SAN storage agreement with another organization, and I'm not quite sure what the typical usable amount of a SAN is.

The SAN is a Nimble 260, so there's 36 TB raw, which they show as 25-50 TB usable. The >25 TB part is based on compression, so I'll call it 25 TB usable storage; what I don't know is how much typically gets used up by snapshots and whatever else is needed.

What's a reasonable amount of that 25 TB that can be used by VMs?

I've got a Nimble CS460 in production; we see about a 45% compression rate on our VMware datastores, and our nightly snapshots with a 15-day retention take up about 10% of the total data on disk. Obviously, YMMV, but those are just rough numbers from our environment. Also, with a hot spare that 25 TB is more like 22 TB.
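
For the VM question, here's the rough back-of-the-envelope I'd do with those numbers. Just a sketch: the compression, snapshot, and headroom figures are assumptions from my environment, so plug in your own.

code:

# Rough usable-capacity estimate for VM data, using the numbers above.
# Assumptions (not gospel): "45% compression" means 45% space saved, snapshots
# cost ~10% on top of data on disk, and you don't want to run past ~90% full.

usable_tb = 22.0            # 25 TB usable minus the hot spare
compression_savings = 0.45  # logical data lands on disk at (1 - 0.45) of its size
snapshot_overhead = 0.10    # snapshots add ~10% on top of data on disk
fill_ceiling = 0.90         # leave some headroom

on_disk_budget = usable_tb * fill_ceiling
# logical VM data L ends up as L * (1 - savings) * (1 + snapshot overhead) on disk
logical_vm_data_tb = on_disk_budget / ((1 - compression_savings) * (1 + snapshot_overhead))

print("On-disk budget:  %.1f TB" % on_disk_budget)
print("Logical VM data: %.1f TB" % logical_vm_data_tb)   # roughly 32-33 TB with these inputs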

Zorak of Michigan
Jun 10, 2006

We're looking at doing a big storage refresh, since management noticed that the money we're spending on all our VMAXes exceeds the limits of reason. EMC is telling us how wonderful VNX2 will be, NetApp is singing the praises of cluster-mode ONTAP, and I missed the Dell presentation but apparently it boiled down to "nearly as good and much cheaper." Anyone have any words of love or hate about those products? I should mention that my background is all in UNIX, not storage, but at the end of 2012 they moved me into what should be a cross-functional architect role and it would make me seem very clever if I turned up out of the blue with valuable storage information. My limited experience prejudices me away from EMC (I used them back when we connected to CLARiiONs via SCSI and they never got anything right the first time) and a little toward NetApp (because WAFL gives their stuff a very solid technical foundation).

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Zorak of Michigan posted:

We're looking at doing a big storage refresh, since management noticed that the money we're spending on all our VMAXes exceeds the limits of reason. EMC is telling us how wonderful VNX2 will be, NetApp is singing the praises of cluster-mode ONTAP, and I missed the Dell presentation but apparently it boiled down to "nearly as good and much cheaper." Anyone have any words of love or hate about those products? I should mention that my background is all in UNIX, not storage, but at the end of 2012 they moved me into what should be a cross-functional architect role and it would make me seem very clever if I turned up out of the blue with valuable storage information. My limited experience prejudices me away from EMC (I used them back when we connected to CLARiiONs via SCSI and they never got anything right the first time) and a little toward NetApp (because WAFL gives their stuff a very solid technical foundation).

Can you share some information on total size, growth rate/%, IOPS needs, and protocols (NFS/iSCSI/FC/etc)?

evil_bunnY
Apr 2, 2003

Metrics. All the time, every time. The thing with the storage providers you've listed is that they offer kits that are pretty different. Also, don't forget IBM, lest Misogynist come in here and throw a fucking fit.

mooky
Jan 14, 2012
I'm currently leasing servers at a popular data center and have been rsyncing my data backups over the WAN to a Netgear ReadyNAS at an offsite location. I'm looking to colo a backup solution at the data center and could use some advice. Is there an affordable 1U NAS device that holds 4 disks and will support rsyncing of data from Linux servers? It won't serve any other purpose than long-term data storage, essentially a backup of a backup. It will have a 1Gbps connection to any device that it is backing up. iSCSI and NFS are nice but not required. If the unit is ~$1k, that would be ideal. I plan to use the Western Digital Red drives if that matters.

Is there an opinion on Netgear ReadyNAS vs Synology RackStation? I've used both professionally and installed them in clients' offices. I've never really done any performance testing on either, but they are both fairly straightforward to set up and install and seem to do what they promise. QNAP is another option; I think the TS-469U-SP would be my choice if I went with QNAP.

So my options as I see it are:

Netgear ReadyNAS 2120
Synology RackStation RS812 or RS812+
QNAP TS-469U-SP

mooky fucked around with this message at 18:53 on Jul 29, 2013

Zorak of Michigan
Jun 10, 2006

madsushi posted:

Can you share some information on total size, growth rate/%, IOPS needs, and protocols (NFS/iSCSI/FC/etc)?

We're at about 3 petabytes total storage and 120k IOPS if I remember the briefing I got from our storage guys correctly. We're looking at annual growth of 5-10% in capacity and probably closer to 5% in terms of IOPS. We're currently almost all FC but we want to look at FCoE to keep the plant expenses down. We're a telco so availability trumps all and we're looking at Dell, NetApp, and EMC not because we've prescreened their product offerings for suitability but because we have history with all three and know they can handle our needs. If there are other vendors we should talk to, I'm all ears (though it may be hard to get them in the door) but we need someone who's going to have solid 24x7 support, 4 hour dispatch, parts depots, etc. We want dedup and snapshots a lot; ideally we want to move to keeping ~14 days of snapshots around, doing backup and restore entirely via snaps, and reserving tape for data that has firm requirements for > 14 day retention. We're spending a fortune on Networker right now and the storage experts think that if we moved to using tape only for longer term retention, we could ditch Networker for something much simpler and cheaper.
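
For what it's worth, the growth numbers compound less scarily than they sound. Quick sanity check, assuming a five-year life on whatever we buy:

code:

# Project capacity and IOPS needs over a refresh cycle from the figures above.
# Assumption: 5-year life; growth of 5-10%/yr in capacity and ~5%/yr in IOPS.

start_capacity_pb = 3.0
start_iops = 120000
years = 5

for growth in (0.05, 0.10):
    capacity = start_capacity_pb * (1 + growth) ** years
    print("Capacity after %d years at %d%%/yr: %.2f PB" % (years, growth * 100, capacity))

iops = start_iops * (1 + 0.05) ** years
print("IOPS after %d years at 5%%/yr: %.0f" % (years, iops))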

Anyway, my major concern, and the area I was thinking Goon expertise would help, is that salesmen always say they can do it, but they're usually lying about something. If anyone has been bitten by problems with Dell, EMC, or NetApp buys in the last couple years, I'd love to hear your horror stories. I'll also read back in the thread because I'm not completely antisocial.

Edit: We used to do a lot of business with IBM. A couple years ago they tried to play hardball in negotiations over licensing and support, so our CIO issued an edict that nothing from IBM should be considered if there were competing products available that would fill the need. We pay them for our AS/400 environment and may buy some dedicated AS400 storage from them, but for our SPARC and x86 storage needs, they won't be allowed in the door.

Zorak of Michigan fucked around with this message at 19:32 on Jul 29, 2013

evil_bunnY
Apr 2, 2003

Zorak of Michigan posted:

We're at about 3 petabytes total storage and 120k IOPS if I remember the briefing I got from our storage guys correctly.
Find out, and also find out the IO size and temporal pattern breakdown.

Zorak of Michigan posted:

Edit: We used to do a lot of business with IBM. A couple years ago they tried to play hardball in negotiations over licensing and support, so our CIO issued an edict that nothing from IBM should be considered if there were competing products available that would fill the need. We pay them for our AS/400 environment and may buy some dedicated AS400 storage from them, but for our SPARC and x86 storage needs, they won't be allowed in the door.
That doesn't mean you shouldn't get a quote to use against the others.

evil_bunnY fucked around with this message at 22:37 on Jul 29, 2013

Zorak of Michigan
Jun 10, 2006

evil_bunnY posted:

Find out, and also find out the IO size and temporal pattern breakdown.

That doesn't mean you shouldn't get a quote to use against the others.

What do you mean by temporal pattern breakdown? If you just mean by time of day, it came up in conversation and we're busy during business hours, dropping off quite a bit in the early evening, then very read heavy after midnight as backups start.

evil_bunnY
Apr 2, 2003

Zorak of Michigan posted:

What do you mean by temporal pattern breakdown? If you just mean by time of day, it came up in conversation and we're busy during business hours, dropping off quite a bit in the early evening, then very read heavy after midnight as backups start.
Right so is 120k your max? How much of the day is spent around max? How much performance/capacity do you grow by every period, and do you foresee a change in the rate?

Zorak of Michigan
Jun 10, 2006

I don't currently even have access to that sort of detail. We're bringing in EMC to do a deep dive on our current VMAX setup to help us pin down every little usage detail and come up with exact metrics for sizing, to which the vendors under discussion so far will happily size their offerings. I trust our storage guys on that front. If there are no big red flags for the vendor offerings, I'll just wait for the figures and proposals to come in and see what people think (plus or minus any inconvenient disclosure problems).

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

The VNX2 is brand new. Like, I don't know that they've even delivered any to customers. Ask for a reference from a customer running VNX2, preferably a customer similar in size to your own company. New hardware/software lines often have some teething pains, even the good stuff, and doing a forklift replacement of your IT infrastructure is probably not the best time to be a beta tester. I also don't believe that dedupe will be available at launch, so if that's a consideration it is something that you should ask about. The VNX has also historically had performance problems with snapshots enabled. No telling if the new line will be better because no one has gotten their hands on one to test it as far as I know.

In my personal and biased opinion Clustered ONTAP is pretty neat. Management is much improved over 7-mode, especially if you like the CLI. The flash-based caching architecture that works at a 4k block level (no waste due to page sizes much larger than the IO size) coupled with WAFL's built-in write acceleration means that you can get pretty good performance out of relatively cheap and relatively few disks. Thin provisioning, snapshots, deduplication, and RAID-DP are all meant to be used in production. We benchmark with that stuff enabled, which is important for showing that the feature isn't just a way to check a box on a list. With proper configuration it is very possible to go diskless for backup using only snapshots and snapmirror/vault replication. I have a customer that maintains 14 daily snapshots and 12 weekly snapshots of all of their data on NetApp, for a total of 3 months' worth of on-disk retention. The total backup footprint is about 180TB on 1.22PB of user data. That includes SQL and Oracle databases, VMware datastores, CIFS shares, GeoSpatial imaging data, SharePoint farms, Solaris zone roots, and all sorts of other types of data, so it's not just a subset of data that is especially well suited for block-level snapshots. I've been dealing a lot with CDOT lately, so if you have any questions about it feel free to shoot me a PM.
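
If you want a rough feel for what that kind of retention costs, this is the crude model I use. It assumes a uniform daily change rate and ignores overlap/dedupe between deltas, so treat it as a sanity check rather than a sizing tool; the 0.15%/day change rate is just picked so the output lines up with the ~180TB figure above.

code:

# Crude snapshot-footprint model: each retained snapshot holds roughly the blocks
# that changed since the next newer retained snapshot.
# Assumptions: uniform change rate, no dedupe or overlap between deltas.

user_data_tb = 1220.0      # ~1.22 PB of user data
daily_change = 0.0015      # assumed 0.15% of the data changes per day
daily_snaps = 14           # 14 daily snapshots...
weekly_snaps = 12          # ...plus 12 weeklies (~3 months of retention)

footprint_tb = user_data_tb * daily_change * (daily_snaps * 1 + weekly_snaps * 7)
print("Estimated snapshot footprint: %.0f TB" % footprint_tb)   # ~179 TB with these inputs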

At 3PB you're probably outside of the range of Nimble, who get a lot of love on here and seem to have a good product. They won't do FC or FCoE though, so I guess they would be out anyway. Hitachi's high-end arrays are very, very, very stable, but they're also expensive as hell, so that will probably put you back into VMAX territory. Compellent or 3PAR fit your requirements as well, but I don't know very much about them. I think we have some goons here who have used them who might be able to chime in.

Just make sure whoever you talk to is selling to you based on your actual requirements. If a vendor tells you that they can do X number of IOPS with no problem, put them on the spot by asking detailed questions. Ask them what block size, what ratio of reads to writes, what ratio of random to sequential IO, and what the latency of those operations would be. "Our <blank> array will do 1 million IOPS!" is utterly useless marketing speak. Put them on the spot if they say something like that and make them back it up with real numbers, preferably tailored to your environment. You probably aren't running 3 TB of the same 4k block, so proving that you can read the same 4k block off of flash-based cache 1 million times a second isn't a very useful benchmark, but there are definitely vendors that will set up a "benchmark" and do just that. Make sure they understand your workload and requirements and explain how their solution meets them, and provides the ability to grow to match your environment over time. Also make sure they show you the management tools that you would actually be using, the way that you would actually be using them. Tell them "we do X task regularly here and I would like to see what it would take to accomplish that with your product," or "we would like to do X task here, show me how I would do that with your product." Management is often overlooked because it doesn't fit easily into marketing bullet points like "Hybrid Flash Array!" or "Automated Storage Tiering!" but the best hardware in the world can be undermined by terrible management software that makes it difficult to use the hardware to its potential.
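
To put some numbers on why those questions matter, here's the classic back-of-the-napkin spindle math. It deliberately gives no credit to controller cache or flash tiers (which is exactly where every vendor claims their magic lives), so it's a pessimistic floor rather than a sizing method, and the 70/30 mix and per-spindle figure are assumptions.

code:

# Why the read/write mix matters: back-end IOPS with classic RAID write penalties.
# Assumptions: 120k front-end IOPS (from the thread), 70/30 read/write mix,
# write penalties of RAID 10 = 2, RAID 5 = 4, RAID 6 = 6, ~180 IOPS per 15k
# spindle, and NO credit for cache or flash -- so a worst-case floor only.

front_end_iops = 120000
read_ratio = 0.70

reads = front_end_iops * read_ratio
writes = front_end_iops * (1 - read_ratio)

for raid, penalty in (("RAID 10", 2), ("RAID 5", 4), ("RAID 6", 6)):
    back_end = reads + writes * penalty
    spindles = back_end / 180.0
    print("%s: %.0f back-end IOPS, ~%.0f x 15k spindles (ignoring cache/flash)"
          % (raid, back_end, spindles))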

Blame Pyrrhus
May 6, 2003

Me reaping: Well this fucking sucks. What the fuck.
Pillbug


New toys arrived. Fully outfitted 5700s and frame-licensed vplex. x2 of course.

Still waiting on the bump up from 1gb to 10gb interconnects between our datacenters (which is essentially just updating the bandwidth statement), but we start going over the vplex design tomorrow with EMC to whiteboard and see how we need to cable everything up.

Should be interesting. The idea is a VMWare stretched / metro cluster. We are 99% virtualized, and we already have layer 2 spanning courtesy of OTV. With vplex taking care of the storage side, we can essentially put one datacenter's ESXi hosts into maintenance mode and go to lunch while we wait for things to gracefully vmotion to the other side of town.

Right now we are all RecoverPoint and SRM, it works pretty well, but failovers are a huge event.

Wicaeed
Feb 8, 2005

Linux Nazi posted:



New toys arrived. Fully outfitted 5700s and frame-licensed vplex. x2 of course.

Still waiting on the bump up from 1gb to 10gb interconnects between our datacenters (which is essentially just updating the bandwidth statement), but we start going over the vplex design tomorrow with EMC to whiteboard and see how we need to cable everything up.

Should be interesting. The idea is a VMWare stretched / metro cluster. We are 99% virtualized, and we already have layer 2 spanning courtesy of OTV. With vplex taking care of the storage side, we can essentially put one datacenter's ESXi hosts into maintenance mode and go to lunch while we wait for things to gracefully vmotion to the other side of town.

Right now we are all RecoverPoint and SRM, it works pretty well, but failovers are a huge event.

:negative: How do you get a job doing something like that? I'm DYING to get into virtualization/data storage; however, the company I work for doesn't really have any plans to do anything like that.

I'm fairly familiar with VMware, but not so much on the storage side.

parid
Mar 18, 2004

Linux Nazi posted:

Should be interesting. The idea is a VMWare stretched / metro cluster. We are 99% virtualized, and we already have layer 2 spanning courtesy of OTV. With vplex taking care of the storage side, we can essentially put one datacenter's ESXi hosts into maintenance mode and go to lunch while we wait for things to gracefully vmotion to the other side of town.

Right now we are all RecoverPoint and SRM, it works pretty well, but failovers are a huge event.

Have you run a metrocluster before? VMware's development of metrocluster related functions is a bit lacking.

The biggest issue we have had with ours is capacity planning. Traditional VMware cluster capacity planning tools don't have a way to simulate a site failure. All of our sizing is done as a "what if" worst-case scenario. What if we lost Site A? Which VMs would be left? How much would they need? How big does Site B need to be? Since it isn't perfectly balanced, the answer for each site will be different.

Right now, I essentially manage this with physical:virtual cpu ratios in a spreadsheet. It's messy. I'm the only one who can understand it. There is a lot of manual information manipulation. I only update it a couple times a year due to the hassle it is to work with it. We don't even have that large of an environment, only ~700 vms. Scaling this any larger would quickly make my manual processes impossible. I have been working with VMware for a whole year on this and they essentially have no answer. Our VAR is lost as well. I have spoken to every capacity management vendor that has come along, and they all don't have a way to deal with this.

Do you know how you're going to deal with capacity planning in your MetroCluster?
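
For what it's worth, the "what if we lose a site" half of my spreadsheet boils down to something like this. It's a toy sketch with a made-up inventory, and it counts configured vCPU/RAM rather than actual use, which is exactly the weakness I mentioned above.

code:

# Toy "what if we lose a site" check: which VMs land on the surviving site, and
# does it have enough configured capacity? Inventory and host numbers are made up.

vms = [
    # (name, home_site, vcpus, ram_gb, fails_over) -- fails_over=False for
    # "must run in site" affinity rules that won't restart on the other side
    ("web01",  "A", 4,  16,  True),
    ("sql01",  "A", 16, 128, True),
    ("build1", "A", 8,  32,  False),
    ("web02",  "B", 4,  16,  True),
    ("exch01", "B", 12, 96,  True),
]

# per-site configured capacity of the surviving hosts (cores, GB RAM)
site_capacity = {"A": (256, 2048), "B": (256, 2048)}

def what_if_lost(failed_site):
    surviving = "B" if failed_site == "A" else "A"
    load = [v for v in vms if v[1] == surviving or (v[1] == failed_site and v[4])]
    need_cpu = sum(v[2] for v in load)
    need_ram = sum(v[3] for v in load)
    cap_cpu, cap_ram = site_capacity[surviving]
    print("Lose site %s -> site %s needs %d vCPU / %d GB "
          "(capacity %d cores / %d GB, ratio %.2f:1 vCPU:core)"
          % (failed_site, surviving, need_cpu, need_ram, cap_cpu, cap_ram,
             need_cpu / float(cap_cpu)))

what_if_lost("A")
what_if_lost("B")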

Zorak of Michigan
Jun 10, 2006

Linux Nazi posted:



New toys arrived. Fully outfitted 5700s and frame-licensed vplex. x2 of course.

Still waiting on the bump up from 1gb to 10gb interconnects between our datacenters (which is essentially just updating the bandwidth statement), but we start going over the vplex design tomorrow with EMC to whiteboard and see how we need to cable everything up.

Should be interesting. The idea is a VMWare stretched / metro cluster. We are 99% virtualized, and we already have layer 2 spanning courtesy of OTV. With vplex taking care of the storage side, we can essentially put one datacenter's ESXi hosts into maintenance mode and go to lunch while we wait for things to gracefully vmotion to the other side of town.

Right now we are all RecoverPoint and SRM, it works pretty well, but failovers are a huge event.

I'd love to hear a progress report as you go through your implementation and test phases. EMC talked up VPlex to us and briefly had us all fantasizing about how cool it would be. We even had data center management speculating about whether our search for a new DR site should be limited to being within synchronous replication range of the primary DC just so we could do active:active VPlex clustering. It took them about five minutes to get back to reality, which is unusually long for an audience that normally stays very grounded in cost:benefit analysis.

tehfeer
Jan 15, 2004
Do they speak english in WHAT?

quote:

Option 1 is:
HP MSA 2040 SAN DC SFF STORAGE
HP 8/20q Fibre Channel Switch
And assorted cables, HBA's, SFP's etc.




At this point I would highly recommend staying away from the HP MSA. We have had so many issues with our HP MSA. I have gone through a total of 15 controllers and a shelf. The web interface is horribly unstable; it only comes up half the time, and the other half I have to restart one of the controllers through the command line. Sometimes it shows some completely different UI for some non-HP company. 98% of the company I work for runs on the MSA.

At this point I am making sure I have really good backups and praying to the storage gods that this POS lasts until next year.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

parid posted:

Have you run a metrocluster before? VMware's development of metrocluster related functions is a bit lacking.

The biggest issue we have had with ours is capacity planning. Traditional VMware cluster capacity planning tools don't have a way to simulate a site failure. All of our sizing is done as a "what if" worst-case scenario. What if we lost Site A? Which VMs would be left? How much would they need? How big does Site B need to be? Since it isn't perfectly balanced, the answer for each site will be different.

Right now, I essentially manage this with physical:virtual cpu ratios in a spreadsheet. It's messy. I'm the only one who can understand it. There is a lot of manual information manipulation. I only update it a couple times a year due to the hassle it is to work with it. We don't even have that large of an environment, only ~700 vms. Scaling this any larger would quickly make my manual processes impossible. I have been working with VMware for a whole year on this and they essentially have no answer. Our VAR is lost as well. I have spoken to every capacity management vendor that has come along, and they all don't have a way to deal with this.

Do you know how you're going to deal with capacity planning in your MetroCluster?
You're doing this manually? We get all this information through PowerCLI in a few dozen lines.

parid
Mar 18, 2004

Misogynist posted:

You're doing this manually? We get all this information through PowerCLI in a few dozen lines.

The bulk of configured data I get from powercli as well. The two biggest problems for that are:

1. How do you get the host affinity rules and attribute them to which hosts are in which site? That's the part that's stopped me from automating more of the process.

2. This means you are doing all your capacity planning based on configured resources, not actual resource use. It's better than nothing, but supporting 20 pegged 1 vCPU machines takes a different amount of host resources than 20 idle 1 vCPU machines.

Dilbert As FUCK
Sep 8, 2007

by Cowcaster
Pillbug
Err read what you were responding to my bad

Dilbert As FUCK fucked around with this message at 22:07 on Jul 30, 2013

Blame Pyrrhus
May 6, 2003

Me reaping: Well this fucking sucks. What the fuck.
Pillbug

parid posted:

Have you run a metrocluster before? VMware's development of metrocluster related functions is a bit lacking.

The biggest issue we have had with ours is capacity planning. Traditional VMware cluster capacity planning tools don't have a way to simulate a site failure. All of our sizing is done as a "what if" worst-case scenario. What if we lost Site A? Which VMs would be left? How much would they need? How big does Site B need to be? Since it isn't perfectly balanced, the answer for each site will be different.

Right now, I essentially manage this with physical:virtual cpu ratios in a spreadsheet. It's messy. I'm the only one who can understand it. There is a lot of manual information manipulation. I only update it a couple times a year due to the hassle it is to work with it. We don't even have that large of an environment, only ~700 vms. Scaling this any larger would quickly make my manual processes impossible. I have been working with VMware for a whole year on this and they essentially have no answer. Our VAR is lost as well. I have spoken to every capacity management vendor that has come along, and they all don't have a way to deal with this.

Do you know how you're going to deal with capacity planning in your MetroCluster?

I'm not even sure what we are doing is technically a "metro-cluster". It's a single vSphere cluster spanned across our 2 DCs; since OTV allows us to align the VLANs and VPLEX allows us to give visibility and redundancy to storage, we can just vMotion between hosts, and VMware doesn't even know or care that the host is 40 miles away. It's not as if we need to configure it in any specific way.

Fortunately we are almost entirely like for like across all of our hosts (4 chassis of Gen 8 BL460s w/ 256GB). We do not over-commit anything, and we only thick provision our VMDKs. We only bother with resource pools in our SDLC environment. So we are pretty sloppy with our compute. We are only maybe 40% utilized on that front. Blades are cheap so we would much rather just purchase more than risk a 3AM on-call page.

With storage, again we don't allow over-commitment in our pools, and with vplex we will be presenting the exact same LUN in both places, so we will see the ceiling as it approaches well before it becomes an issue.

Zorak of Michigan posted:

I'd love to hear a progress report as you go through your implementation and test phases. EMC talked up VPlex to us and briefly had us all fantasizing about how cool it would be. We even had data center management speculating about whether our search for a new DR site should be limited to being within synchronous replication range of the primary DC just so we could do active:active VPlex clustering. It took them about five minutes to get back to reality, which is unusually long for an audience that normally stays very grounded in cost:benefit analysis.

I can definitely do that. We literally just got out of our first meeting with EMC.

Blame Pyrrhus fucked around with this message at 02:06 on Jul 31, 2013

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

parid posted:

The bulk of configured data I get from powercli as well. The two biggest problems for that are:

1. How do you get the host affinity rules and attribute them to which hosts are in which site? That's the part that's stopped me from automating more of the process.
We actually skirt around this problem by looking at the datastore the host sits on instead. They're all named for the device hosting the primary replica.

parid posted:

2. This means you are doing all your capacity planning based on configured resources, not actual resource use. It's better than nothing, but supporting 20 pegged 1 vCPU machines takes a different amount of host resources than 20 idle 1 vCPU machines.
There's nothing special about capacity planning a stretched cluster versus multiple individual vSphere clusters in this regard. Honestly, I don't think this is different from capacity planning a single local cluster, or a single host. You need to keep an eye on your environment and have an idea of what your workloads look like. There's lots of ways to do this; we happen to pump our data into Graphite and look at a handful of different dashboards that give us a high-level overview of what the systems in the cluster are doing over time. You can actually pull the performance data rather easily using PowerCLI -- the only tricky part is getting it in a performant way if you want very granular per-VM stats for a pile of VMs. We gave up on that because getting all the metrics we wanted took about 5 minutes per script run, and now we mostly just look at hosts.
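
If anyone wants to do the same thing without PowerCLI, here's a rough Python sketch using pyVmomi that pulls the host-level numbers. The vCenter hostname and credentials are placeholders, and you'd want real certificate handling outside a lab.

code:

# Rough Python equivalent of the PowerCLI pull described above -- a sketch only.
# Uses pyVmomi; the vCenter address and credentials below are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()          # lab-only: skips cert validation
si = SmartConnect(host="vcenter.example.com",   # hypothetical vCenter
                  user="readonly@vsphere.local",
                  pwd="********",
                  sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.HostSystem], True)
    for host in view.view:
        hw, qs = host.summary.hardware, host.summary.quickStats
        cpu_total_mhz = hw.cpuMhz * hw.numCpuCores      # total host CPU in MHz
        mem_total_gb = hw.memorySize / 1024**3          # total host RAM in GB
        print("%-30s cpu %5.1f%%  mem %5.1f%%" % (
            host.name,
            100.0 * qs.overallCpuUsage / cpu_total_mhz,
            100.0 * (qs.overallMemoryUsage / 1024.0) / mem_total_gb))
    view.Destroy()
finally:
    Disconnect(si)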

We generally don't care about the CPU utilization of individual VMs unless we notice application performance suffering somewhere along the line. We prefer to keep an eye on the overall cluster utilization and let DRS do what it does to keep that number meaningful.

parid
Mar 18, 2004
That's an interesting idea. Don't think about the VMs, just look a level up at the hosts and monitor their data more closely.

I think the challenge my environment has is that VMs are configured individually for site affinity rules. They may or may not move between sites. It's not perfectly balanced either, so the scenario changes depending on which side fails. It's challenging to know what resources would be necessary to support what would move.

This is a problem most people probably solve by throwing money at it. It wouldn't even be that much money. Right now, no one is getting capital improvement funds unless they can justify an emergency. I'm spending most of my time trying to find ways to do more with less and trying to predict when we will be "out" in the cluster.

Vanilla
Feb 24, 2002

Hay guys what's going on in th

Zorak of Michigan posted:

We're at about 3 petabytes total storage and 120k IOPS if I remember the briefing I got from our storage guys correctly. We're looking at annual growth of 5-10% in capacity and probably closer to 5% in terms of IOPS. We're currently almost all FC but we want to look at FCoE to keep the plant expenses down. We're a telco so availability trumps all and we're looking at Dell, NetApp, and EMC not because we've prescreened their product offerings for suitability but because we have history with all three and know they can handle our needs. If there are other vendors we should talk to, I'm all ears (though it may be hard to get them in the door) but we need someone who's going to have solid 24x7 support, 4 hour dispatch, parts depots, etc. We want dedup and snapshots a lot; ideally we want to move to keeping ~14 days of snapshots around, doing backup and restore entirely via snaps, and reserving tape for data that has firm requirements for > 14 day retention. We're spending a fortune on Networker right now and the storage experts think that if we moved to using tape only for longer term retention, we could ditch Networker for something much simpler and cheaper.


I used to work for EMC and, to be honest, if availability is key then VMAX is it. EMC pretty much set the bar with regards to availability and support on the Symmetrix/DMX/VMAX range. Next in line for that level of availability is the HDS high-end offering, in my opinion.

Sure the VMAX is hella expensive but I bet you've had a pretty good ride with them?

That being said, 3PB is a lot. Maybe just do a bit of data classification and work out the top percentage that really needs the availability, and put the rest on other storage?

Zorak of Michigan
Jun 10, 2006

Due to a series of mistakes we found ourselves in a situation where we had to get ready for a major acquisition without the opportunity to properly analyze and measure the incoming environment. It was a sufficiently big deal that we opted to just buy way too much of everything. Better to waste money than to tell the board that we were unable to integrate the new company's systems. I was only working the UNIX side at that time but when I look at the current utilization figures for the servers I specified, I feel shame. I wasted a lot of money. On the other hand, we absorbed the new company and never ran short of UNIX compute, so I gave them exactly what they asked for.

Now we're out of that rather ridiculous situation and able to properly instrument things. Unfortunately we're also able to take a hard look at revenue and costs. We can't keep paying for storage at current rates. The idea of using just a small Vmax tier is interesting. I'm not sure management would bite unless we go for a storage virtualization solution. We won't want to take an outage to move someone's LUN on or off of the VMAX tier.

Amandyke
Nov 27, 2004

A wha?

Zorak of Michigan posted:

Due to a series of mistakes we found ourselves in a situation where we had to get ready for a major acquisition without the opportunity to properly analyze and measure the incoming environment. It was a sufficiently big deal that we opted to just buy way too much of everything. Better to waste money than to tell the board that we were unable to integrate the new company's systems. I was only working the UNIX side at that time but when I look at the current utilization figures for the servers I specified, I feel shame. I wasted a lot of money. On the other hand, we absorbed the new company and never ran short of UNIX compute, so I gave them exactly what they asked for.

Now we're out of that rather ridiculous situation and able to properly instrument things. Unfortunately we're also able to take a hard look at revenue and costs. We can't keep paying for storage at current rates. The idea of using just a small Vmax tier is interesting. I'm not sure management would bite unless we go for a storage virtualization solution. We won't want to take an outage to move someone's LUN on or off of the VMAX tier.

Just to confirm, you guys are looking at VMAXe/VMAX 10Ks as well, right? I'll jump on the tiering bandwagon as well. How much of that 3PB needs to be available within milliseconds, and what might you be able to throw into an EDL (or something similar)?

Zorak of Michigan
Jun 10, 2006

Amandyke posted:

Just to confirm, you guys are looking at VMAXe/VMAX 10Ks as well, right? I'll jump on the tiering bandwagon as well. How much of that 3PB needs to be available within milliseconds, and what might you be able to throw into an EDL (or something similar)?

Since we're already a VMAX shop, we're going to have EMC deep-dive the current VMAXes and come up with a proposal to deliver similar performance & capacity plus several years' growth. In the presentation they gave us, they were mostly talking up the VNX2 rather than the VMAXes, which I took to be their tacit admission that they couldn't sell us a VMAX at a price we wanted to afford, but we'll certainly see.

We have SATA in our existing FAST VP pools, so the EMC experts ought to be able to pull some reporting data about the distribution of IO. We definitely have to put sub-FC performance tiers behind automated tiering, though, because every director has some reason why their application must have the best possible IO performance. Making the application teams take some responsibility for these costs will be an important part of moving from our current "holy shit we need a big box of storage" model to something designed for cost efficiency.

Vanilla
Feb 24, 2002

Hay guys what's going on in th

Zorak of Michigan posted:



Now we're out of that rather ridiculous situation and able to properly instrument things. Unfortunately we're also able to take a hard look at revenue and costs. We can't keep paying for storage at current rates. The idea of using just a small Vmax tier is interesting. I'm not sure management would bite unless we go for a storage virtualization solution. We won't want to take an outage to move someone's LUN on or off of the VMAX tier.

Still look at VPLEX then. On its own it does things such as storage virtualisation, which means that in the future, if you do want to turn on the fancy geosynchronous replication, active/active stuff, it's just a license turn-on.

However you may find that any savings you make on going for lower tiers of storage get eaten up by VPLEX being introduced :)


Motronic
Nov 6, 2009

tehfeer posted:

At this point I would highly recommend staying away from the HP MSA. We have had so many issues with our HP MSA. I have gone through a total of 15 controllers and a shelf. The web interface is horribly unstable; it only comes up half the time, and the other half I have to restart one of the controllers through the command line. Sometimes it shows some completely different UI for some non-HP company.

Yep. The red interface. I forget the name. You get that when it boots off of the other partition on the flash card.

MSAs are really, really shitty. I'll never buy one again. The last one started out with a simultaneous dual power supply failure not but 2 months after it went into production, followed by shitting a controller 6 months later, and then the other controller 2 weeks after the warranty was up.
