OnceIWasAnOstrich
Jul 22, 2006

evil_bunnY posted:

We want disk/flash not tape 🥶

Well, if flash is what you want... We don't really use the object side, favoring the fancy NFS instead, but the many petabytes we have on flash in Vast have been relatively smooth. A big improvement over whatever IBM is calling GPFS these days.

Maneki Neko
Oct 27, 2000

evil_bunnY posted:

Do any of you run on-prem object stores (±1PB) you're happy with?

We had a very boring/reliable ActiveScale cluster (now owned by Quantum) that we have been quite happy with. Can't go into details (lol thanks legal agreements) but we did not have a good time with Cloudian.

kzersatz
Oct 13, 2012

How's it the kiss of death, if I have no lips?
College Slice

Kaddish posted:

Welp, just bought a NetApp c250. I haven't used Ontap in like....10 years. Looks like there's been a few changes! Learning about LUN configuration and what an SVM even is, like a little baby.

Edit - Looks like I need to manually enable space_alloc if I'm going to thin provision, thanks admin guide!

Any Ontap gurus here have any tips for an Ontap newbie? This will be 90% Windows client general use.

Been doing Ontap off and on for the last 12 years or so, currently support 4.5 PB of it :)

kzersatz
Oct 13, 2012

How's it the kiss of death, if I have no lips?
College Slice

Maneki Neko posted:

We had a very boring/reliable ActiveScale cluster (now owned by Quantum) that we have been quite happy with. Can't go into details (lol thanks legal agreements) but we did not have a good time with Cloudian.

NetApp StorageGRID.
It's pricey to start, but it is very reliable.

evil_bunnY
Apr 2, 2003

kzersatz posted:

NetApp StorageGRID.
It's pricey to start, but it is very reliable.
Oh hey, we're talking to a partner org with a decent install of this. Anything of note?

Vulture Culture posted:

Depends how manual the rest of your process is. If you're zoning out manually provisioned LUNs to initiators/HBAs, the experience isn't going to differ a ton between vendors. If you're doing dynamic provisioning from a Kubernetes or OpenStack cluster, there's a lot of variation in CSI/Cinder drivers, their quality, and how tightly coupled they are to the underlying storage topology
Because this is suddenly relevant here, which ones are best?

Kaddish
Feb 7, 2002

kzersatz posted:

Been doing Ontap off and on for the last 12 years or so, currently support 4.5 PB of it :)

My little 100TiB C250 is chugging along nicely. Things have (obviously) changed a lot since the N-Series days.

Unrelated, I'll be going to Spectrum Protect (TSM) training and taking over all backup administration. Not looking forward to it.

We're probably going to transition to something like Veeam over the next 3-5 years. In my entire time with my company, I don't remember a time when there wasn't an issue with TSM.

evil_bunnY
Apr 2, 2003

Kaddish posted:

In my entire time with my company, I don't remember a time when there wasn't an issue with TSM.
LMAO it takes a 50% consulting FTE to keep ours from eating itself. And there's like 12 people in the (admittedly small) country who can competently manage it. When they hired another tech, they had to get a fully remote Icelandic guy :eng99:

Thanks Ants
May 21, 2004

#essereFerrari


Surely you mean :black101:

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

evil_bunnY posted:

Because this is suddenly relevant here, which ones are best?
My personal experience is limited on the K8s front. I found that the Trident operator for NetApp ONTAP had a great feature set, the performance was off the charts for our use cases in early testing, and there's a huge user base compared to basically every other storage vendor, but the CSI drivers presented some stability problems I didn't see using AWS's native storage options. Gluster is basically dead. Ceph's integrations work well, but you're stuck managing Ceph, and I'm a little bit anxious about the recent move out from Red Hat into IBM to be closer to the other IBM storage teams. Traditional vendors' (Dell, HPE, IBM) CSI drivers have very few GitHub stars, but the selection bias vs. open-source or cloud offerings might make that a bad proxy metric. Unsure about Pure, but Portworx feels like a well-thought-out product with engineers who are active in the broader sysadmin community.
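
Whichever driver you land on, the consumer-side workflow is the same: you point a PVC at a StorageClass backed by the vendor's CSI driver and it carves out the volume for you. A minimal sketch with the Kubernetes Python client, assuming a hypothetical Trident-backed StorageClass called "ontap-nas-gold" (the class name, namespace, and size here are all made up):

    # Minimal sketch: dynamic provisioning through a CSI-backed StorageClass.
    # "ontap-nas-gold" is a hypothetical StorageClass name; adjust to taste.
    from kubernetes import client, config

    config.load_kube_config()  # use load_incluster_config() from inside a pod
    core = client.CoreV1Api()

    pvc = {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": "demo-claim"},
        "spec": {
            "accessModes": ["ReadWriteOnce"],
            "storageClassName": "ontap-nas-gold",
            "resources": {"requests": {"storage": "100Gi"}},
        },
    }

    # The CSI driver watches the claim and creates the backing volume;
    # pods then reference the claim by name.
    core.create_namespaced_persistent_volume_claim(namespace="default", body=pvc)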

evil_bunnY posted:

LMAO it takes a 50% consulting FTE to keep ours from eating itself. And there's like 12 people in the (admittedly small) country who can competently manage it. When they hired another tech, they had to get a fully remote Icelandic guy :eng99:
That's impressive. I've seen TSM take a .5 FTE just to keep up with licensing.

Vulture Culture fucked around with this message at 17:35 on Nov 28, 2023

shame on an IGA
Apr 8, 2005

Thanks Ants posted:

Surely you mean :black101:

:ccp:

Kaddish
Feb 7, 2002

evil_bunnY posted:

LMAO it takes a 50% consulting FTE to keep ours from eating itself. And there's like 12 people in the (admittedly small) country who can competently manage it. When they hired another tech, they had to get a fully remote Icelandic guy :eng99:

Replication hasn't been working for at least 6 months, despite bringing in outside help to get it running again. Supposedly there is an APAR to fix this, but I don't know what it's for. Our current backup admin isn't bothering to do decoms etc.; they just need someone to at least do the day-to-day maintenance. It also runs on AIX, which I'm not super familiar with, so. Yeah.

evil_bunnY
Apr 2, 2003

Kaddish posted:

Replication hasn't been working for at least 6 months, despite bringing in outside help to get it running again. Supposedly there is an APAR to fix this, but I don't know what it's for. Our current backup admin isn't bothering to do decoms etc.; they just need someone to at least do the day-to-day maintenance. It also runs on AIX, which I'm not super familiar with, so. Yeah.
We moved off AIX like 4? years ago? You poor souls.

Vulture Culture posted:

That's impressive. I've seen TSM take a .5 FTE just to keep up with licensing.
That's in addition to one internal FTE, let's not go crazy. Our consulting partners are actually p good.

Vulture Culture posted:

My personal experience
Thanks, appreciate the input.

evil_bunnY fucked around with this message at 11:25 on Nov 29, 2023

HalloKitty
Sep 30, 2005

Adjust the bass and let the Alpine blast
Anyone have experience with on-prem object storage?
I'm thinking Dell ECS, Scality Artesca/Ring, Cohesity

Ideally a hardware appliance I can just deploy and scale by throwing more of them in a cluster. Interested in hearing quotes people have had, just out of curiosity. It'll be for backup.

Thanks Ants
May 21, 2004

#essereFerrari


Dell didn't want to talk to me unless I was spending over £200k, but this was a few years ago. If you're in the US, it might be worth getting iXsystems to quote you a TrueNAS system with their full support offering, even if it's just to sense-check some pricing.

Qwijib0
Apr 10, 2007

Who needs on-field skills when you can dance like this?

Fun Shoe
I can't speak to clustering, but I have had some TrueNAS (CORE, intend to move to SCALE soon) boxes doing object store backup for ~2 years and they sit there and just work, so the iX technical implementation seems fine.
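
The object side is just an S3-compatible endpoint (MinIO under the hood on the versions I've run), so any S3 client can talk to it. A minimal sketch with boto3; the endpoint URL, bucket name, and credentials below are all placeholders:

    # Minimal sketch: pushing backups to an S3-compatible TrueNAS endpoint.
    # Endpoint URL, bucket, and keys below are placeholders.
    import boto3

    s3 = boto3.client(
        "s3",
        endpoint_url="https://truenas.example.internal:9000",
        aws_access_key_id="BACKUP_ACCESS_KEY",
        aws_secret_access_key="BACKUP_SECRET_KEY",
    )

    # Upload a backup artifact, then list what's in the bucket.
    s3.upload_file("nightly-dump.tar.zst", "backups", "nightly-dump.tar.zst")
    for obj in s3.list_objects_v2(Bucket="backups").get("Contents", []):
        print(obj["Key"], obj["Size"])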

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug
TrueNAS is doing some good stuff and has decent support.

Zorak of Michigan
Jun 10, 2006

Isn't TrueNAS still server-class hardware, such that OS updates require a brief downtime?

Thanks Ants
May 21, 2004

#essereFerrari


It's all grown up now; you can get cluster nodes with dual controllers and everything. Object storage for backups shouldn't need hitless upgrades though, and you'll save a lot by being able to schedule maintenance windows when no backup jobs are running.

BlankSystemDaemon
Mar 13, 2009



Zorak of Michigan posted:

Isn't TrueNAS still server-class hardware, such that OS updates require a brief downtime?
The amount of effort required to make kernel upgrades without reboots possible utterly dwarfs the amount of effort required to use gmultipath(8), ZFS, CARP, hastd(8), and ctld(8) to get highly available storage.

BlankSystemDaemon fucked around with this message at 10:41 on Dec 7, 2023

evil_bunnY
Apr 2, 2003

BlankSystemDaemon posted:

The amount of effort required to make kernel upgrades without reboots possible utterly dwarfs the amount of effort required to use gmultipath(8), ZFS, CARP, hastd(8), and ctld(8) to get highly available storage.
OMG this

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
I could really use a hand with vendor / product line ideas for a brand new, completely greenfield research computing deployment. IOPS requirements are low, but we need decent throughput, and because the data is going to be primarily genetic data or imaging data that is already highly compressed, we're unlikely to get much benefit at all from compression or deduplication, which certainly throws a wrench in the big storage vendors' value proposition.

The goal here is similar performance to a homegrown Ceph cluster that I've built for pennies, but can't in good conscience recommend someone else emulate or operate. That system is approx 3PB usable across 240 OSDs over 20 hosts on 10GBaseT; it delivers about 300-400MB/s read/write to each process, up to a total system throughput of a few GB/s (3-5GB/s) across all clients. Recovery throughput is higher than that, but we are mostly sequential reads and this has met our needs well for a couple of years now. It's also nice as hell that we can just throw more nodes / disks in and it just grows. In a perfect world we'd be on 25GbE, but this seems fine for now. It helps that all our data is compressed, so a compute job reading this is likely decompressing domain-specific compression or zstd, meaning 300MB/s from the disks is plenty. A filesystem that can be mounted from Linux hosts is a must, but we can use relaxed POSIX semantics because there are basically never multiple processes writing to the same file.
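
For a rough sense of scale, the math on that cluster works out like this (back-of-envelope, not a benchmark):

    # Rough sanity check on the cluster described above (approximate figures).
    hosts = 20
    nic_gbit_per_host = 10          # 10GBaseT
    per_stream_mb_s = 350           # ~300-400 MB/s observed per process
    aggregate_gb_s = 4              # ~3-5 GB/s observed across all clients

    raw_network_gb_s = hosts * nic_gbit_per_host / 8   # line-rate ceiling
    full_rate_streams = aggregate_gb_s * 1000 / per_stream_mb_s

    print(f"raw network ceiling: ~{raw_network_gb_s:.0f} GB/s")
    print(f"observed aggregate: ~{aggregate_gb_s} GB/s, "
          f"roughly {full_rate_streams:.0f} concurrent full-rate streams")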

The new effort has budget for something like 1PB of Pure FlashArray//C, but even that is way more performance and IOPS than we need. What should we be looking at? Does any vendor even exist in this space? Something like a Pure FlashBlade//E is the next step down their stack, but it looks like the minimum size for that is 4PB, which might be over budget. What lines should I be looking at? Who still does disk-based capacity solutions that aren't purely for backup and can meet multi-client throughput needs?

I keep hearing good things about VAST, but I think they won't even pick up the phone to talk about a 1PB solution, and they're also leaning hard on compression / dedupe.

Qwijib0 posted:

I can't speak to clustering, but I have had some TrueNAS (CORE, intend to move to SCALE soon) boxes doing object store backup for ~2 years and they sit there and just work, so the iX technical implementation seems fine.

I really want to hear from people a couple of years down the line who try doing TrueNAS SCALE with horizontal scale-out, because GlusterFS looks like an absolute dead end and I wouldn't want to build on it. Red Hat has scaled down Gluster and is ending their support offering for it in 2024; it looks like Ceph is eating that whole segment, but Ceph would obviate the need for ZFS and the rest of what TrueNAS offers, which may have contributed to why they chose Gluster.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Vast is in scope for a PB at that performance and would be in budget, but their big trick is dedupe, so it might not be that great; they do have a probe to run against your data set, though, which is worth checking out.

DDN has a couple of interesting QLC platforms that might be worth looking at.

There’s a few Ceph plus glue systems out there; some people like them.

I’d check out Qumulo; it might be a good fit for a streaming workload.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

in a well actually posted:

Vast is in scope for a PB at that performance and would be in budget, but their big trick is dedupe, so it might not be that great; they do have a probe to run against your data set, though, which is worth checking out.

DDN has a couple of interesting QLC platforms that might be worth looking at.

There’s a few Ceph plus glue systems out there; some people like them.

I’d check out Qumulo; it might be a good fit for a streaming workload.

Thanks! Ceph should be the frontrunner here because we know it meets requirements, but my early impression from feeling it out is that RH Ceph support alone costs as much as or more than just buying a complete storage appliance with support, which is depressing.

I feel like this kind of workload can't be that unusual; I'd bet that streaming video workloads look somewhat like this, and there must be a segment of storage targeted at servicing video production / archive.

Qwijib0
Apr 10, 2007

Who needs on-field skills when you can dance like this?

Fun Shoe

in a well actually posted:


I’d check out Qumulo; it might be a good fit for a streaming workload.

Someone else mentioned them first, so now I can't be accused of being a shill :haw:

Their archive tier might meet your requirements. I use them for my primary video editing storage and I have no complaints at all.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Yeah, the Red Hat Ceph support pricing is ridiculous. Big streaming IO is a pretty well-understood pattern in HPC, but those systems usually come with more consistency guarantees (and complexity) than you want.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

in a well actually posted:

Yeah, the Red Hat Ceph support pricing is ridiculous. Big streaming IO is a pretty well-understood pattern in HPC, but those systems usually come with more consistency guarantees (and complexity) than you want.

I'm not opposed to more features / better filesystems, just seeking maximum value for what is a very simple workload from the storage system's point of view. Who are the big players there?

A similar system that was on GPFS is moving to https://www.panasas.com/, which I know nothing about but which I guess should be in the conversation too.

Twerk from Home fucked around with this message at 21:49 on Dec 14, 2023

evil_bunnY
Apr 2, 2003

The workloads you're describing could easily be served by Isilon capacity/hybrid nodes, but no idea where that falls price-wise.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Panasas is on the legacy side, but I know some folks like them. A well-designed GPFS setup isn't bad, but it is very network-sensitive; IBM and Lenovo have products that can be very competitive. There are a few others that build GPFS platforms as well, but they need to pay for both their platform and the GPFS licenses.

Lustre is open source and a lot of the problems people tended to have with it have been mitigated in the newer versions (2.15+), but it is pretty admin-intensive. DDN and HPE have solutions. The development is primarily done by DDN; AWS building an FSx offering on top of community Lustre has kinda taken a lot of steam out of feature development, imho.

Weka comes up in conversations with sales people; they like to tier out to an object store.

Zorak of Michigan
Jun 10, 2006

Low IOPS but high throughput is the use case for FlashBlade. I'd at least ask Pure for a quote. We have a lot of FlashArrays and Pure has been great to us.

Rhymenoserous
May 23, 2008
I'll second Pure; heard nothing but good things.

qutius
Apr 2, 2003
NO PARTIES

evil_bunnY posted:

The workloads you're describing could easily be served by Isilon capacity/hybrid nodes, but no idea where that falls price-wise.

Yup, all day long.
