Zorak of Michigan
Jun 10, 2006

We're looking at doing a big storage refresh, since management noticed that the money we're spending on all our VMAXes exceeds the limits of reason. EMC is telling us how wonderful VNX2 will be, NetApp is singing the praises of cluster-mode OnTap, and I missed the Dell presentation, but apparently it boiled down to "nearly as good and much cheaper." Anyone have any words of love or hate about those products? I should mention that my background is all in UNIX, not storage, but at the end of 2012 they moved me into what should be a cross-functional architect role, and it would make me seem very clever if I turned up out of the blue with valuable storage information. My limited experience prejudices me away from EMC (I used them back when we connected to CLARiiONs via SCSI and they never got anything right the first time) and a little toward NetApp (because WAFL gives their stuff a very solid technical foundation).

Zorak of Michigan
Jun 10, 2006

madsushi posted:

Can you share some information on total size, growth rate/%, IOPS needs, and protocols (NFS/iSCSI/FC/etc)?

We're at about 3 petabytes total storage and 120k IOPS, if I remember the briefing I got from our storage guys correctly. We're looking at annual growth of 5-10% in capacity and probably closer to 5% in terms of IOPS. We're currently almost all FC but we want to look at FCoE to keep the plant expenses down. We're a telco, so availability trumps all, and we're looking at Dell, NetApp, and EMC not because we've prescreened their product offerings for suitability but because we have history with all three and know they can handle our needs. If there are other vendors we should talk to, I'm all ears (though it may be hard to get them in the door), but we need someone who's going to have solid 24x7 support, 4-hour dispatch, parts depots, etc.

Dedup and snapshots matter a lot to us; ideally we want to move to keeping ~14 days of snapshots around, doing backup and restore entirely via snaps, and reserving tape for data that has firm requirements for > 14 day retention. We're spending a fortune on Networker right now, and the storage experts think that if we moved to using tape only for longer-term retention, we could ditch Networker for something much simpler and cheaper.
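For rough planning purposes, the capacity number compounds like this. A quick back-of-the-envelope sketch in Python, assuming a hypothetical 5-year refresh horizon on the 3 PB / 5-10% figures above:

```python
# Back-of-the-envelope capacity projection from the figures above:
# ~3 PB today, 5-10% annual growth, over a hypothetical 5-year refresh cycle.
def project_capacity(start_pb, annual_growth, years):
    """Compound the starting capacity by the annual growth rate."""
    return start_pb * (1 + annual_growth) ** years

for growth in (0.05, 0.10):
    print(f"{growth:.0%} growth: {project_capacity(3.0, growth, 5):.2f} PB after 5 years")
# 5% growth: 3.83 PB after 5 years
# 10% growth: 4.83 PB after 5 years
```

So whatever we buy needs to land somewhere in the 4-5 PB range by end of life, before dedup.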

Anyway, my major concern, and the area I was thinking Goon expertise would help, is that salesmen always say they can do it, but they're usually lying about something. If anyone has been bitten by problems with Dell, EMC, or NetApp buys in the last couple years, I'd love to hear your horror stories. I'll also read back in the thread because I'm not completely antisocial.

Edit: We used to do a lot of business with IBM. A couple years ago they tried to play hardball in negotiations over licensing and support, so our CIO issued an edict that nothing from IBM should be considered if there were competing products available that would fill the need. We pay them for our AS/400 environment and may buy some dedicated AS400 storage from them, but for our SPARC and x86 storage needs, they won't be allowed in the door.

Zorak of Michigan fucked around with this message at 19:32 on Jul 29, 2013

Zorak of Michigan
Jun 10, 2006

evil_bunnY posted:

Find out, and also find out the IO size and temporal pattern breakdown.

This doesn't mean you shouldn't get a quote to use against the others.

What do you mean by temporal pattern breakdown? If you just mean by time of day, it came up in conversation and we're busy during business hours, dropping off quite a bit in the early evening, then very read heavy after midnight as backups start.

Zorak of Michigan
Jun 10, 2006

I don't currently even have access to that sort of detail. We're bringing in EMC to do a deep dive on our current VMAX setup to help us pin down every little usage detail and come up with exact sizing metrics, which the vendors under discussion so far will happily size their offerings against. I trust our storage guys on that front. If there are no big red flags for the vendor offerings, I'll just wait for the figures and proposals to come in and see what people think (plus or minus any inconvenient disclosure problems).

Zorak of Michigan
Jun 10, 2006

Linux Nazi posted:



New toys arrived. Fully outfitted 5700s and frame-licensed vplex. x2 of course.

Still waiting on the bump up from 1gb to 10gb interconnects between our datacenters (which is essentially just updating the bandwidth statement), but we start going over the vplex design tomorrow with EMC to whiteboard and see how we need to cable everything up.

Should be interesting. The idea is a VMWare stretched / metro cluster. We are 99% virtualized, and we already have layer 2 spanning courtesy of OTV. With vplex taking care of the storage side, we can essentially put one datacenter's ESXi hosts into maintenance mode and go to lunch while we wait for things to gracefully vmotion to the other side of town.

Right now we are all RecoverPoint and SRM, it works pretty well, but failovers are a huge event.

I'd love to hear a progress report as you go through your implementation and test phases. EMC talked up VPlex to us and briefly had us all fantasizing about how cool it would be. We even had data center management speculating about whether our search for a new DR site should be limited to being within synchronous replication range of the primary DC just so we could do active:active VPlex clustering. It took them about five minutes to get back to reality, which is unusually long for an audience that normally stays very grounded in cost:benefit analysis.

Zorak of Michigan
Jun 10, 2006

Due to a series of mistakes we found ourselves in a situation where we had to get ready for a major acquisition without the opportunity to properly analyze and measure the incoming environment. It was a sufficiently big deal that we opted to just buy way too much of everything. Better to waste money than to tell the board that we were unable to integrate the new company's systems. I was only working the UNIX side at that time but when I look at the current utilization figures for the servers I specified, I feel shame. I wasted a lot of money. On the other hand, we absorbed the new company and never ran short of UNIX compute, so I gave them exactly what they asked for.

Now we're out of that rather ridiculous situation and able to properly instrument things. Unfortunately we're also able to take a hard look at revenue and costs. We can't keep paying for storage at current rates. The idea of using just a small VMAX tier is interesting. I'm not sure management would bite unless we go for a storage virtualization solution. We won't want to take an outage to move someone's LUN on or off of the VMAX tier.

Zorak of Michigan
Jun 10, 2006

Amandyke posted:

Just to confirm, you guys are looking at VMAXe/VMAX10k's as well right? I'll jump on the tiering bandwagon as well. How much of that 3PB needs to be available within milliseconds, and what might you be able to throw into an EDL (or something similar)?

Since we're already a VMAX shop, we're going to have EMC deep-dive the current VMAXes and come up with a proposal to deliver similar performance & capacity plus several years' growth. In the presentation they gave us, they were mostly talking up the VNX2 rather than the VMAXes, which I took to be their tacit admission that they couldn't sell us a VMAX at a price we'd be willing to pay, but we'll certainly see.

We have SATA in our existing FastVP pools, so the EMC experts ought to be able to pull some reporting data about the distribution of IO. We definitely have to put sub-FC performance tiers behind automated tiering, though, because every director has some reason why their application must have the best possible IO performance. Making the application teams take some responsibility for these costs will be an important part of moving from our current "holy poo poo we need a big box of storage" model to something designed for cost efficiency.
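To make the chargeback argument concrete, here's a toy blended-cost model. All of the tier percentages and $/GB figures are hypothetical placeholders, not anything from a quote:

```python
# Toy blended-cost model for a FAST VP-style pool. The tier split and $/GB
# figures are hypothetical, purely to show why pushing capacity below FC matters.
tiers = {
    "EFD":  {"fraction": 0.05, "cost_per_gb": 10.00},
    "FC":   {"fraction": 0.35, "cost_per_gb": 2.50},
    "SATA": {"fraction": 0.60, "cost_per_gb": 0.75},
}

blended = sum(t["fraction"] * t["cost_per_gb"] for t in tiers.values())
print(f"Blended: ${blended:.2f}/GB vs ${tiers['FC']['cost_per_gb']:.2f}/GB for all-FC")
# Blended: $1.82/GB vs $2.50/GB for all-FC
```

Once directors see a cost per GB attached to each tier, "best possible IO performance for everything" gets a lot less popular.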

Zorak of Michigan
Jun 10, 2006

Vanilla posted:

Still look at VPLEX then. On its own it does things such as storage virtualisation... meaning in the future, if you do want to turn on the fancy geosynchronous replication, active/active stuff, it's just a license turn-on.

However you may find that any savings you make on going for lower tiers of storage get eaten up by VPLEX being introduced :)

My argument, when we discussed the VPlex internally, was that we needed to treat storage virtualization separately from the discussion of capacity, IOPS, etc. I suggested a separate conversation to nail down our use cases for storage virtualization, assess products that meet those use cases, and see if their value is proportionate to the expense. There may be a better way to have that conversation, but given that cluster-mode OnTap also offers some of the same features, I think we need the use cases to avoid interminable discussions of how cool various features are, or will be in the next release, and so on, forever. I also suggested that if we know we want storage virtualization, we should open the conversation up to vendors like DataCore.

Zorak of Michigan
Jun 10, 2006

Is there a cheap option for reliable and replicate-able nearline storage? One of my coworkers is pushing us toward Amazon Glacier for archival storage. I'm wondering if there are on-prem solutions that might come close to that same price target. We're mostly an EMC shop now and while I know VNX2 pricing has gotten pretty good, I haven't seen them anywhere near a penny per gig per month over the life of the array.
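For comparison, this is roughly how I'd work out an array's effective $/GB/month over its life. All of the inputs here are hypothetical placeholders, not real pricing:

```python
# Rough effective $/GB/month for an on-prem array over its life, to stack up
# against Glacier's ~$0.01/GB/month. All inputs are hypothetical placeholders.
purchase_price = 500_000      # USD, array plus install
annual_support = 50_000       # USD per year
usable_tb = 500               # usable capacity after RAID and other overhead
lifetime_months = 60          # 5-year service life

total_cost = purchase_price + annual_support * (lifetime_months / 12)
per_gb_month = total_cost / (usable_tb * 1024) / lifetime_months
print(f"${per_gb_month:.4f}/GB/month")   # ~$0.0244/GB/month with these inputs
```

Even with generous assumptions it's hard to get an on-prem array anywhere near a penny per GB per month, which is why Glacier keeps coming up.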

Zorak of Michigan
Jun 10, 2006

evol262 posted:

It'd be nice if it were Ceph/Swift/Gluster/whatever, but those filesystems are almost always on top of mdraid or something anyway, since configuring so you don't actually treat node1.disk1 as a possible redundancy for node1.disk2 is a PITA. Maybe they'll improve this on the software level first.

I'm not even a Ceph noob but I thought the entire point of it was that as long as your config accurately describes disks/servers/rows/data centers, it will automatically spread your copies as wide as it can. Are the docs misleading?
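If I'm reading it right, the spreading comes from the CRUSH rule's failure-domain step. Here's a sketch of the usual default replicated rule, in decompiled crushmap syntax (field names vary a bit between releases):

```
# Default-style replicated rule: "type host" is the failure domain, so no two
# copies of an object land on the same host. Swap in rack/row/datacenter as
# the hierarchy grows.
rule replicated_rule {
    ruleset 0
    type replicated
    min_size 1
    max_size 10
    step take default
    step chooseleaf firstn 0 type host
    step emit
}
```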

Zorak of Michigan
Jun 10, 2006

Does anyone have any experience with Scality's RING software defined storage solution? They're giving one of my coworkers a pretty aggressive sales push.

Zorak of Michigan
Jun 10, 2006

Do any of you have interesting experiences to report with Oracle storage? I'm looking at a hardware refresh on our data warehouse platform, which uses Oracle. If the claims they make for the efficiency of their hybrid columnar compression are remotely accurate, we're going to want to at least look at their storage options, but I can't help wondering if that's something we'd regret in a year.

Zorak of Michigan fucked around with this message at 02:29 on Oct 20, 2015

Zorak of Michigan
Jun 10, 2006

Does anyone have Ceph experience? I've been interested in it for a while now, but it's not something anyone at work would get interested in unless there was native VMware or Windows support. I'm pondering assuaging my curiosity and my need for more NAS space by setting up a very small Ceph cluster and a Ceph->NFS gateway in my basement. I know it wouldn't be close to enterprise standards of redundancy, since it would have just a single monitor running in a KVM guest, but is there any reason I couldn't do it?
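For scale, the ceph.conf I have in mind is something like this. It's just a sketch with made-up hostnames and addresses: one monitor, two copies of everything, spread across hosts:

```
# Minimal ceph.conf sketch for a tiny home cluster: one monitor, two OSD hosts,
# two copies of each object, host-level failure domain. Names, addresses, and
# the fsid are placeholders.
[global]
fsid = <cluster uuid>
mon initial members = mon1
mon host = 192.168.1.10
# keep two copies of every object; stay writable with one host down
osd pool default size = 2
osd pool default min size = 1
# 1 = spread copies across hosts, not just across disks in one host
osd crush chooseleaf type = 1
```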

Zorak of Michigan fucked around with this message at 23:39 on Feb 1, 2016

Zorak of Michigan
Jun 10, 2006

Scuttlemonkey posted:

As far as whether or not to do it, without knowing your skill level, I'd say jump in with both feet on setting up a Ceph cluster and throwing some data at it. However, unless you are prepared to really dig in and do your homework beforehand, I'd suggest caution on how much you rely on it until you are comfortable. There are a HUGE number of ways to tune (read: screw up performance), balance (put your cluster in a damaged state), or use a Ceph cluster. A familiarity with the distributed storage paradigm, and sometimes Ceph in particular, is often required to really get out of the gate without a few false starts.

Thanks for the feedback! My skill level is weird because I've been a UNIX guy for 20 years now but my role gives me very limited hands-on experience. I'm effectively a tier 3 guy for weird performance problems but I've never actually loaded a Linux system from bare metal. Back in the 1990s I was an AFS admin but I haven't done distributed storage since then. The good news is that my performance needs are trivial by modern standards (support a max of 3 concurrent HD video streams through the Ceph->NFS gateway box) and I can afford some false starts since I'll keep the first ~5TB of data live on other systems for a while. I'm thinking that I'll scale out to two data servers with just 2 data disks each and make sure they're stable and then begin stacking them up.

Question I'm pondering as I design this scheme: would I be better off trying to use the Ceph file system or a Ceph block device? If I read the docs right, the file system means that I need metadata servers, and I'm not sure if it would be kosher to put them in the same KVM guest as my monitor daemons. On the other hand, the file system implies that if I experience data loss, it will be localized to specific files, whereas data loss in the objects making up a block device could mean the entire block device is trashed.
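If I go the block-device route, the gateway side would look roughly like this. It's a sketch with placeholder names and sizes, and it assumes the kernel RBD client and an NFS server are already set up on the gateway box:

```sh
# Sketch: carve out an RBD image and export it over NFS from the gateway host.
# Image name, size, export path, and subnet are placeholders.
rbd create media --size 5120000          # size in MB, roughly 5 TB
rbd map media                            # appears as /dev/rbd0
mkfs.xfs /dev/rbd0
mkdir -p /export/media
mount /dev/rbd0 /export/media
echo "/export/media 192.168.1.0/24(rw,sync,no_root_squash)" >> /etc/exports
exportfs -ra
```

The CephFS route would skip the filesystem-on-RBD layer but, as noted, pulls in metadata servers.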

Zorak of Michigan
Jun 10, 2006

I've been super happy with our Pure arrays. Our data reduction ratios are pretty good (though they're thrown off by weird poo poo like huge swap partitions created to provide DISM backing store for Solaris nodes running Oracle 11, which are guaranteed never to see a byte of use) and, more importantly, support has been great.

Zorak of Michigan
Jun 10, 2006

YOLOsubmarine posted:

The interesting thing about Flashblade is that it’s not terrible on small block/object random IO. You wouldn’t necessarily run a bunch of VMs off of it but you can use it to store an OLTP database and be happy with the results. That’s definitely not true with Isilon.

That's actually depressing. I would have expected FlashBlade to be good for VMs, just based on my experience with Pure as a vendor. Do you know what slows it down too much for VM storage?

Zorak of Michigan
Jun 10, 2006

I didn't speak up because we don't have Nimble to compare it to, but we're a Pure customer and I love the stuff. Not only do the arrays work, but their support staff have been really easy to work with when we hit them with vague "how do I do x" questions.

Zorak of Michigan
Jun 10, 2006

The only weakness we've found with our Pures is in workloads that need massive throughput more than low latency, like data warehouse stuff. The Pures don't suck at throughput, but they don't do much better than wide spinning arrays that cost a fraction as much.

Zorak of Michigan
Jun 10, 2006

I am a big Pure fan, and every time I talk to our Pure rep, I ask why they aren't going after the NetApp market by selling me one array that can do block via FC, block via iSCSI, NFS, and SMB. He talked up FlashBlade a lot before it came out, but when it released without SMB support, I just sagged in my chair and began muttering negative thoughts.

Zorak of Michigan
Jun 10, 2006

Get that compression ratio guaranteed in writing if you buy from them. They assured us our Dell-EMC array would deliver 4:1 and it got the same 2:1-ish that every other array gets.
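Rough arithmetic on why the guarantee matters, with hypothetical numbers rather than anything from an actual quote:

```python
# Hypothetical illustration of how the promised vs. delivered data reduction
# ratio changes the effective cost of an array. Numbers are illustrative only.
raw_tb = 100            # purchased usable TB before reduction
price = 300_000         # USD

for label, ratio in (("promised 4:1", 4.0), ("delivered 2:1", 2.0)):
    effective_tb = raw_tb * ratio
    print(f"{label}: {effective_tb:.0f} TB effective, ${price / effective_tb:,.0f}/TB")
# promised 4:1: 400 TB effective, $750/TB
# delivered 2:1: 200 TB effective, $1,500/TB
```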

I am a happy Pure customer.

Zorak of Michigan
Jun 10, 2006

They have a strategy? I thought they just had a lot of well-intentioned reps calling people at random. Seriously, I have a Dell-EMC sales team that I have a functioning relationship with, and yet I keep getting called by Dell-EMC randos wanting to sell me storage, and I cannot understand it.

Zorak of Michigan
Jun 10, 2006

There is one thing I'd like that Pure and Nimble don't sell, and that's a good vendor for low-IOPS high-capacity storage, preferably one that offers block and file in the same chassis. Other than Isilon, which doesn't do block, I haven't been really happy with any spinning storage I've bought in years. It's not like I hate it, it just feels antiquated compared to the ease of managing Pure.

Zorak of Michigan
Jun 10, 2006

Isn't TrueNAS still server-class hardware, such that OS updates require a brief downtime?

Zorak of Michigan
Jun 10, 2006

Low IOPS but high throughput is the use case for FlashBlade. I'd at least ask Pure for a quote. We have a lot of FlashArrays and Pure has been great to us.
