|
evil_bunnY posted:We want disk/flash not tape 🥶 Well, if flash is what you want... We don't really use the object side in favor of the fancy NFS, but the many-more petabytes we have on flash in Vast have been relatively smooth. A big improvement over whatever IBM is calling GPFS these days.
|
# ? Nov 1, 2023 00:28 |
|
evil_bunnY posted:Do any of you run on-prem object stores (±1PB) you're happy with? We had a very boring/reliable ActiveScale cluster (now owned by Quantum) that we have been quite happy with. Can't go into details (lol thanks legal agreements) but we did not have a good time with Cloudian.
|
# ? Nov 2, 2023 19:11 |
|
Kaddish posted:Welp, just bought a NetApp c250. I haven't used Ontap in like....10 years. Looks like there's been a few changes! Learning about LUN configuration and what an SVM even is, like a little baby. Been doing Ontap off and on for the last 12 years or so, currently support 4.5 PB of it
|
# ? Nov 28, 2023 06:01 |
|
Maneki Neko posted:We had a very boring/reliable ActiveScale cluster (now owned by Quantum) that we have been quite happy with. Can't go into details (lol thanks legal agreements) but we did not have a good time with Cloudian. NetApp StorageGRID. It's pricey to start, but it is very reliable
|
# ? Nov 28, 2023 06:01 |
|
kzersatz posted:NetApp StorageGRID. Vulture Culture posted:Depends how manual the rest of your process is. If you're zoning out manually provisioned LUNs to initiators/HBAs, the experience isn't going to differ a ton between vendors. If you're doing dynamic provisioning from a Kubernetes or OpenStack cluster, there's a lot of variation in CSI/Cinder drivers, their quality, and how tightly coupled they are to the underlying storage topology.
|
# ? Nov 28, 2023 12:22 |
|
kzersatz posted:Been doing Ontap off and on for the last 12 years or so, currently support 4.5 PB of it My little 100TiB C250 is chugging along nicely. Things have (obviously) changed a lot since the N-Series days. Unrelated, I'll be going to Spectrum Protect (TSM) training and taking over all backup administration. Not looking forward to it. We're probably going to transition to something like Veeam over the next 3-5 years. In my entire time with my company, I don't remember a time when there wasn't an issue with TSM.
|
# ? Nov 28, 2023 14:15 |
|
Kaddish posted:In my entire time with my company, I don't remember a time when there wasn't an issue with TSM.
|
# ? Nov 28, 2023 16:15 |
|
Surely you mean
|
# ? Nov 28, 2023 16:56 |
|
evil_bunnY posted:Because this is suddenly relevant here, which ones are best? evil_bunnY posted:LMAO it takes a 50% consulting FTE to keep ours from eating itself. And there's like 12 people in the (admittedly small) country who can competently manage it. When they hired another tech they had to get a fully remote Icelandic guy Vulture Culture fucked around with this message at 17:35 on Nov 28, 2023 |
# ? Nov 28, 2023 17:31 |
|
Thanks Ants posted:Surely you mean
|
# ? Nov 28, 2023 17:41 |
|
evil_bunnY posted:LMAO it takes a 50% consulting FTE to keep ours from eating itself. And there's like 12 people in the (admittedly small) country who can competently manage it. When they hired another tech they had to get a fully remote Icelandic guy Replication hasn't been working for at least 6 months, despite bringing in outside help to get it running again. Supposedly there is an APAR to fix this, but I don't know what it's for. Our current backup admin isn't bothering to do decoms etc, they just need someone to at least do the day-to-day maintenance. It also runs on AIX, which I'm not super familiar with so. Yeah.
|
# ? Nov 28, 2023 22:09 |
|
Kaddish posted:Replication hasn't been working for at least 6 months, despite bringing in outside help to get it running again. Supposedly there is an APAR to fix this, but I don't know what it's for. Our current backup admin isn't bothering to do decoms etc, they just need someone to at least do the day-to-day maintenance. It also runs on AIX, which I'm not super familiar with so. Yeah. Vulture Culture posted:That's impressive. I've seen TSM take a .5 FTE just to keep up with licensing. Vulture Culture posted:My personal experience evil_bunnY fucked around with this message at 11:25 on Nov 29, 2023 |
# ? Nov 29, 2023 11:22 |
|
Anyone have experience with on-prem object storage? I'm thinking Dell ECS, Scality Artesca/Ring, Cohesity. Ideally a hardware appliance I can just deploy and scale by throwing more of them in a cluster. Interested in hearing quotes people have had, just out of curiosity. It'll be for backup.
|
# ? Dec 6, 2023 11:29 |
|
Dell didn't want to talk to me unless I was spending over £200k, but this was a few years ago. If you're in the US it might be worth getting iXsystems to quote you a TrueNAS system with their full support offering, even if it's just to sense check some pricing.
|
# ? Dec 6, 2023 13:09 |
|
I can't speak to clustering, but I have had some TrueNAS (Core, intend to move to Scale soon) boxes doing object store backup for ~2 years and they sit there and just work, so the iX technical implementation seems fine.
|
# ? Dec 6, 2023 14:08 |
|
TrueNAS is doing some good stuff and has decent support.
|
# ? Dec 6, 2023 15:15 |
|
Isn't TrueNAS still server-class hardware, such that OS updates require a brief downtime?
|
# ? Dec 6, 2023 16:43 |
|
It's all grown up now, you can get cluster nodes with dual controllers and everything. Object storage for backups shouldn't need hitless upgrades though and you'll save a lot by being able to schedule maintenance windows when no backup jobs are running.
|
# ? Dec 6, 2023 17:08 |
Zorak of Michigan posted:Isn't TrueNAS still server-class hardware, such that OS updates require a brief downtime? The amount of effort required to make kernel upgrade without reboots possible utterly dwarfs the amount of effort required to use gmultipath(8), ZFS, CARP, hastd(8), and ctld(8) to get high-available storage. BlankSystemDaemon fucked around with this message at 10:41 on Dec 7, 2023 |
|
# ? Dec 7, 2023 10:37 |
|
BlankSystemDaemon posted:The amount of effort required to make kernel upgrade without reboots possible utterly dwarfs the amount of effort required to use gmultipath(8), ZFS, CARP, hastd(8), and ctld(8) to get high-available storage.
|
# ? Dec 7, 2023 14:50 |
|
I could really use a hand with vendor / product line ideas for a brand new, complete greenfield research computing deployment. IOPS requirements are low, but we need decent throughput, and because the data is going to be primarily genetic data or imaging data that is already highly compressed, we're unlikely to get much benefit at all from compression or deduplication, which certainly throws a wrench in the big storage vendors' value proposition. The goal here is similar performance to a homegrown Ceph cluster that I've built for pennies, but can't in good conscience recommend someone else emulate or operate. That system is approx 3PB usable across 240 OSDs over 20 hosts on 10GBaseT; it delivers about 300-400 MB/s read/write to each process, up to a total system throughput of 3-5 GB/s across all clients. Recovery throughput is higher than that, but we are mostly sequential read and this has met our needs well for a couple years now. It's also nice as hell that we can just throw more nodes / disks in and it just grows. In a perfect world we'd be on 25GbE, but this seems fine for now. It helps that all our data is compressed, so a compute job reading this is likely decompressing domain-specific compression or zstd, so that 300MB/s from the disks is plenty. A filesystem that can be mounted from Linux hosts is a must, but we can use relaxed POSIX semantics because there's not actually multiple processes writing to a given file basically ever. The new effort has budget for something like 1PB of Pure FlashArray//C, but even that is way more performance and IOPS than we need. What should we be looking at? Does anyone exist in this space? Something like a Pure FlashBlade//E is the next step down their stack, but it looks like minimum size for that is 4PB, which might be over budget. What lines should I be looking at? Who still does disk-based capacity solutions that aren't purely for backup and can meet multi-client throughput needs? I keep hearing good things about VAST but I think they won't even pick up the phone to talk about a 1PB solution, and they're also leaning hard on compression / dedupe. Qwijib0 posted:I can't speak to clustering, but I have had some TrueNAS (Core, intend to move to Scale soon) boxes doing object store backup for ~2 years and they sit there and just work, so the iX technical implementation seems fine. I really want to hear from people a couple years down the line who try doing TrueNAS Scale with horizontal scale-out, because GlusterFS looks like an absolute dead end and I wouldn't want to build on it. Red Hat has scaled down Gluster and is ending their support offering for it in 2024; it looks like Ceph is eating that whole segment, but Ceph would obviate the need for ZFS and the rest of what TrueNAS offers, which may have contributed to why they chose Gluster.
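(As a quick sanity check on that homegrown cluster's numbers, here's a rough back-of-the-envelope sketch; the 18TB drive size and 8+3 erasure-coding profile are my own assumptions for illustration, not details from the post.)

```python
# Back-of-the-envelope check on the homegrown Ceph cluster described above.
# The drive size and erasure-coding profile below are assumptions for
# illustration only; the post doesn't say what was actually used.

OSDS = 240
HOSTS = 20
DRIVE_TB = 18          # assumed capacity per OSD
EC_K, EC_M = 8, 3      # assumed erasure-coding profile (k data + m coding chunks)

raw_tb = OSDS * DRIVE_TB
usable_tb = raw_tb * EC_K / (EC_K + EC_M)

# 10GBaseT per host is roughly 1.25 GB/s of theoretical bandwidth each.
net_aggregate_gb_s = HOSTS * 10 / 8

print(f"raw capacity:       {raw_tb / 1000:.1f} PB")
print(f"usable (EC {EC_K}+{EC_M}):   {usable_tb / 1000:.1f} PB")  # ~3.1 PB, in line with 'approx 3PB usable'
print(f"host network ceiling: {net_aggregate_gb_s:.0f} GB/s aggregate "
      f"(observed 3-5 GB/s, so spindles/clients, not host links, are the limit)")
```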
|
# ? Dec 14, 2023 20:34 |
|
Vast is in scope for a PB and that performance and would be in budget but their big trick is dedupe so it might not be that great; although they have a probe to run against your data set that’s worth checking out. DDN has a couple of interesting QLC platforms that might be worth looking at. There’s a few Ceph plus glue systems out there; some people like them. I’d check out Qumulo; it might be a good fit for a streaming workload.
|
# ? Dec 14, 2023 21:03 |
|
in a well actually posted:Vast is in scope for a PB and that performance and would be in budget but their big trick is dedupe so it might not be that great; although they have a probe to run against your data set that’s worth checking out. Thanks! Ceph should be the frontrunner here because we know it meets requirements, but my early impression from feeling it out is that it seems like RH Ceph support alone costs as much or more than just buying a complete storage appliance with support, which is depressing. I feel like this kind of workload can't be that unusual; I'd bet that streaming video workloads look somewhat like this, and there must be a segment of storage targeted at servicing video production / archive.
|
# ? Dec 14, 2023 21:08 |
|
in a well actually posted:
Someone else mentioned them first, so now I can't be accused of being a shill. Their archive tier might meet your requirements; I use them for my primary video editing storage, and I have no complaints at all.
|
# ? Dec 14, 2023 21:40 |
|
Yeah, the Red Hat Ceph support is ridiculous. Big streaming IO is a pretty well understood pattern in HPC but those usually come with more consistency guarantees (and complexity) than you want.
|
# ? Dec 14, 2023 21:41 |
|
in a well actually posted:Yeah, the Red Hat Ceph support is ridiculous. Big streaming IO is a pretty well understood pattern in HPC but those usually come with more consistency guarantees (and complexity) than you want. I'm not opposed to more features / better filesystems, just seeking maximum value for what is a very simple workload from the storage system's point of view. Who are the big players there? A similar system that was on GPFS is moving to https://www.panasas.com/, who I know nothing about but I guess should be in the conversation too. Twerk from Home fucked around with this message at 21:49 on Dec 14, 2023 |
# ? Dec 14, 2023 21:47 |
|
The workloads you're describing could be easily served by Isilon capacity/hybrid nodes, but no idea where that falls price wise.
|
# ? Dec 14, 2023 22:12 |
|
Panasas is on the legacy side but I know some folks like them. A well designed GPFS isn’t bad but it is very network sensitive; IBM and Lenovo have products that can be very competitive. There are a few others that build GPFS platforms as well but they need to pay for their platform and the GPFS licenses. Lustre is open source and a lot of the problems people tended to have with it have been mitigated in the newer versions (2.15+), but it is pretty admin-intensive. DDN and HPE have solutions. The development is primarily done by DDN; AWS building FSx on top of community Lustre has kinda taken a lot of steam out of feature development, imho. Weka comes up in conversations with sales people; they like to tier out to an object store.
|
# ? Dec 14, 2023 22:16 |
|
Low IOPS but high throughput is the use case for FlashBlade. I'd at least ask Pure for a quote. We have a lot of FlashArrays and Pure has been great to us.
|
# ? Dec 15, 2023 00:49 |
|
I’ll second Pure, heard nothing but good things.
|
# ? Dec 15, 2023 04:16 |
|
evil_bunnY posted:The workloads you're describing could be easily served by Isilon capacity/hybrid nodes, but no idea where that falls price wise. Yup, all day long.
|
# ? Jan 8, 2024 06:31 |
|
Anyone have experience with Dell PowerFlex? I haven't seen pricing yet but I'm going to guess it's extremely expensive? I'm considering it for a couple of remote hospital sites with aging infrastructure as well as a replication target to a disaster recovery site. Kaddish fucked around with this message at 20:19 on Apr 16, 2024 |
# ? Apr 16, 2024 20:12 |
|
Kaddish posted:Anyone have experience with Dell PowerFlex? PowerFlex is outside my needs, so zero help there. Out of curiosity, what are your workloads and requirements like? Block? File? Object? All?!?!? IOPS/Latency requirements? Space before any backend dedup?
|
# ? Apr 19, 2024 12:33 |
|
Moey posted:PowerFlex is outside my needs, so zero help there. That's just it - I think it may be outside of my needs as well, but Dell is more than happy to sell it to me. I don't have numbers yet. I am looking at it to replace old infrastructure at two smaller hospital sites - running maybe 20 total VM guests. Both of those sites would replicate to a third disaster recovery site. This would be block, 10kish IOPS, low latency, mixed use. Probably 20TB per site. I have Pure at my main datacenter that runs the majority of my VM environment and replicates to another Pure array at my DR site. I'm really happy with Pure and have no plans to change. The ESX hosts need a refresh, however. I might end up putting smaller Pure arrays at my smaller sites.
|
# ? Apr 19, 2024 13:53 |
|
I currently have a few Unity XT 480F units filled with 8TB SSDs; they have been rock solid over the years. Dual controllers, 4x 25GbE per controller. I'm only using them for block via iSCSI to a bunch of vSphere clusters, but I've heard from another nerd in these forums that their SMB/CIFS implementation is solid as well. Their inline dedup and compression is seamless and seems to perform just as well as other arrays I have dealt with. Snapshots/replication just work as expected. Veeam integrates with the management interface as well, so it can run backups directly from storage snapshots, removing that VM stun time on larger/more active machines. Without that I had to work around SQL jobs/log backups or else our DB janitor would yell at me. Performance wise, my busiest array is running about 400 persistent VDI VMs and 150 virtual server workloads. Daily use normally doesn't exceed 15,000 IOPS or so, and I see my Ethernet links getting north of 5-6 Gbps. Latency is pretty solid, just above 1ms. Only have had it creep into the 3ms range once, and it ended up being a port flapping issue on my ESXi host side. I've been running these for about 4 years now, so knowing Dell/EMC, some other snazzily-named product line will replace this mid-range series, but it will probably be very similar. Planning on doing some refreshing in 2025, curious to see if Pure's pricing has come down to a little more sane levels. 4 years ago I was able to get like 3x the usable storage for less overall cost with the Unity XT boxes over Pure and their //X line. I could probably get away with one //X and a few //C across my sites/DR.
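(For what it's worth, those numbers hang together: a quick sketch, assuming the ~15,000 IOPS and the 5-6 Gbps of link traffic describe the same moment in time, gives an implied average I/O size in the 40-50 KiB range, which is believable for a mixed VDI plus server-VM workload.)

```python
# Rough sanity check on the Unity XT figures quoted above. This assumes the
# IOPS number and the per-array Gbps number refer to the same moment in time,
# which the post doesn't strictly say; purely illustrative.

iops = 15_000

for gbps in (5, 6):
    bytes_per_second = gbps * 1e9 / 8            # wire throughput in bytes/s
    avg_io_kib = bytes_per_second / iops / 1024  # implied average I/O size
    print(f"{gbps} Gbps at {iops:,} IOPS -> ~{avg_io_kib:.0f} KiB per I/O")
```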
|
# ? Apr 20, 2024 10:20 |
|
Moey posted:
You definitely could. My x50 sees ~45k IOPS peaks pretty regularly. Usually sub-1ms latency. This is running on old 16G fabric switches. 533 VM guests currently. I have my VM environment split between the Pure and an SVC with FlashSystem 5100 backend. Pure is async replication to an x20 and SVC is a mix of global and metro mirror to an SVC pair with 5030 backend. Kaddish fucked around with this message at 14:47 on Apr 20, 2024 |
# ? Apr 20, 2024 14:45 |
|
Kaddish posted:You definitely could. My x50 sees ~45k iops peaks pretty regularly. Usually sub 1ms latency. This is running on old 16G fabric switches. 533 VM guests currently. The //C arrays claim sub 3ms latency, which should be acceptable for my VDI stuff, probably even the server workloads. I'd lose some flexibility with expansion of existing pools going to Pure, but that can be managed.
|
# ? Apr 21, 2024 14:46 |
|
Moey posted:The //C arrays claim sub 3ms latency, which should be acceptable for my VDI stuff, probably even the server workloads. I'm actually very curious about the //E array. We'll be doing a backup software/hardware refresh in the next couple of years and the //E will probably be a contender to replace all of our disk. 1.5PB of disk, 3PB total with replication.
|
# ? Apr 21, 2024 15:16 |
|
Kaddish posted:I'm actually very curious about the //E array. We'll be doing a backup software/hardware refresh in the next couple of years and the //E will probably be a contender to replace all of our disk. 1.5PB of disk, 3PB total with replication. Incidentally, disk (and probably SSD) prices may be sharply elevated in the near future and remain that way for a while. You may want to consider that if you plan on a significant refresh in the next year.
|
# ? Apr 23, 2024 00:01 |
Moey posted:I currently have a few Unity XT 480F units filled with 8TB SSDs; they have been rock solid over the years. Dual controllers, 4x 25GbE per controller. I'm only using them for block via iSCSI to a bunch of vSphere clusters, but I've heard from another nerd in these forums that their SMB/CIFS implementation is solid as well. FWIW I'm in a similar situation except doing vVols for the majority of our VMs instead of iSCSI, along with some iSCSI block LUNs for critical stuff. I inherited the array and it's been solid as poo poo, very impressive. Further still, I've been seconded recently and thus have had junior engineers handling things, and they've managed both a controller replacement and a firmware upgrade with zero issues. Solid kit. At the moment we only run a dedicated Windows file server but based on the feedback here I think we might migrate.
|
# ? Apr 23, 2024 15:37 |