|
evil_bunnY posted:We want disk/flash not tape 🥶

Well, if flash is what you want... We don't really use the object side in favor of the fancy NFS, but the many more petabytes we have on flash in Vast have been relatively smooth. A big improvement over whatever IBM is calling GPFS these days.
|
# ? Nov 1, 2023 00:28 |
|
evil_bunnY posted:Do any of you run on-prem object stores (±1PB) you're happy with?

We had a very boring/reliable (now owned by Quantum) ActiveScale cluster that we have been quite happy with. Can't go into details (lol thanks legal agreements) but we did not have a good time with Cloudian.
|
# ? Nov 2, 2023 19:11 |
|
Kaddish posted:Welp, just bought a NetApp c250. I haven't used Ontap in like....10 years. Looks like there's been a few changes! Learning about LUN configuration and what an SVM even is like a little baby.

Been doing Ontap off and on for the last 12 years or so, currently support 4.5 PB of it.
|
# ? Nov 28, 2023 06:01 |
|
Maneki Neko posted:We had very boring/reliable (now owned by Quantum) ActiveScale cluster that we have been quite happy with. Can't go into details (lol thanks legal agreements) but we did not have a good time with Cloudian.

Netapp storage grid. It's pricey to start, but it is very reliable.
|
# ? Nov 28, 2023 06:01 |
|
kzersatz posted:Netapp storage grid.

Vulture Culture posted:Depends how manual the rest of your process is. If you're zoning out manually provisioned LUNs to initiators/HBAs, the experience isn't going to differ a ton between vendors. If you're doing dynamic provisioning from a Kubernetes or OpenStack cluster, there's a lot of variation in CSI/Cinder drivers, their quality, and how tightly coupled they are to the underlying storage topology.
|
# ? Nov 28, 2023 12:22 |
|
kzersatz posted:Been doing Ontap off and on for the last 12 years or so, currently support 4.5 PB of it

My little 100TiB C250 is chugging along nicely. Things have (obviously) changed a lot since the N-Series days. Unrelated: I'll be going to Spectrum Protect (TSM) training and taking over all backup administration. Not looking forward to it. We're probably going to transition to something like Veeam over the next 3-5 years. In my entire time with my company, I don't remember a time when there wasn't an issue with TSM.
|
# ? Nov 28, 2023 14:15 |
|
Kaddish posted:In my entire time with my company, I don't remember a time when there wasn't an issue with TSM.
|
# ? Nov 28, 2023 16:15 |
|
Surely you mean
|
# ? Nov 28, 2023 16:56 |
|
evil_bunnY posted:Because this is suddenly relevant here, which ones are best?

evil_bunnY posted:LMAO it takes a 50% consulting FTE to keep ours from eating itself. And there's like 12 people in the (admittedly small) country who can competently manage it. When they hired another tech they had to get a fully remote Icelandic guy

Vulture Culture fucked around with this message at 17:35 on Nov 28, 2023 |
# ? Nov 28, 2023 17:31 |
|
Thanks Ants posted:Surely you mean
|
# ? Nov 28, 2023 17:41 |
|
evil_bunnY posted:LMAO it takes a 50% consulting FTE to keep ours from eating itself. And there's like 12 people in the (admittedly small) country who can competently manage it. When they hired another tech they had to get a fully remote Icelandic guy

Replication hasn't been working for at least 6 months, despite bringing in outside help to get it running again. Supposedly there is an APAR to fix this, but I don't know what it's for. Our current backup admin isn't bothering to do decoms etc; they just need someone to at least do the day-to-day maintenance. It also runs on AIX, which I'm not super familiar with, so. Yeah.
|
# ? Nov 28, 2023 22:09 |
|
Kaddish posted:Replication hasn't been working for at least 6 months, despite bringing in outside help to get it running again. Supposedly there is an APAR to fix this, but I don't know what it's for. Our current backup admin isn't bothering to do decoms etc, they just need someone to at least do the day-to-day maintenance. It also runs on AIX, which I'm not super familiar with so. Yeah.

Vulture Culture posted:That's impressive. I've seen TSM take a .5 FTE just to keep up with licensing.

Vulture Culture posted:My personal experience

evil_bunnY fucked around with this message at 11:25 on Nov 29, 2023 |
# ? Nov 29, 2023 11:22 |
|
Anyone have experience with on-prem object storage? I'm thinking Dell ECS, Scality Artesca/Ring, Cohesity. Ideally a hardware appliance I can just deploy and scale by throwing more of them in a cluster. Interested in hearing quotes people have had, just out of curiosity. It'll be for backup.
|
# ? Dec 6, 2023 11:29 |
|
Dell didn't want to talk to me unless I was spending over £200k, but this was a few years ago. If you're in the US it might be worth getting iXsystems to quote you a TrueNAS system with their full support offering, even if it's just to sense-check some pricing.
|
# ? Dec 6, 2023 13:09 |
|
I can't speak to clustering, but I have had some TrueNAS (CORE, intend to move to SCALE soon) boxes doing object store backup for ~2 years and they sit there and just work, so the iX technical implementation seems fine.
|
# ? Dec 6, 2023 14:08 |
|
TrueNAS is doing some good stuff and has decent support.
|
# ? Dec 6, 2023 15:15 |
|
Isn't TrueNAS still server-class hardware, such that OS updates require a brief downtime?
|
# ? Dec 6, 2023 16:43 |
|
It's all grown up now, you can get cluster nodes with dual controllers and everything. Object storage for backups shouldn't need hitless upgrades though and you'll save a lot by being able to schedule maintenance windows when no backup jobs are running.
|
# ? Dec 6, 2023 17:08 |
Zorak of Michigan posted:Isn't TrueNAS still server-class hardware, such that OS updates require a brief downtime?

BlankSystemDaemon fucked around with this message at 10:41 on Dec 7, 2023 |
|
# ? Dec 7, 2023 10:37 |
|
BlankSystemDaemon posted:The amount of effort required to make kernel upgrades without reboots possible utterly dwarfs the amount of effort required to use gmultipath(8), ZFS, CARP, hastd(8), and ctld(8) to get highly available storage.
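For anyone who hasn't touched that FreeBSD stack, the hastd(8) half is driven by /etc/hast.conf. A rough sketch of what the quoted setup implies (hostnames and device paths are made up for illustration; this is the shape of the config, not a tested deployment):

```
# /etc/hast.conf -- one replicated resource, mirrored between two nodes.
# hastd(8) exposes /dev/hast/shared0 on whichever node is primary;
# ZFS sits on top of that device, and ctld(8) exports it over iSCSI.
resource shared0 {
        on storage-a {
                local /dev/da2      # backing disk on this node
                remote storage-b    # peer to replicate writes to
        }
        on storage-b {
                local /dev/da2
                remote storage-a
        }
}
```

CARP then floats a service IP between the two nodes, and a failover script runs `hastctl role primary shared0` on the surviving node before importing the pool.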
|
# ? Dec 7, 2023 14:50 |
|
I could really use a hand with vendor / product line ideas for a brand new, completely greenfield research computing deployment. IOPS requirements are low, but we need decent throughput, and because the data is primarily genetic data or imaging data that is already highly compressed, we're unlikely to get much benefit at all from compression or deduplication, which certainly throws a wrench in the big storage vendors' value proposition.

The goal here is similar performance to a homegrown Ceph cluster that I've built for pennies, but can't in good conscience recommend someone else emulate or operate. That system is approx 3PB usable across 240 OSDs over 20 hosts on 10GBaseT; it delivers about 300-400MB/s read/write to each process, up to a total system throughput of a couple GB/s (3-5GB/s) across all clients. Recovery throughput is higher than that, but we are mostly sequential read and this has met our needs well for a couple years now. It's also nice as hell that we can just throw more nodes / disks in and it just grows. In a perfect world we'd be on 25GbE, but this seems fine for now. It helps that all our data is compressed, so a compute job reading this is likely decompressing domain-specific compression or zstd, so 300MB/s from the disks is plenty.

A filesystem that can be mounted from Linux hosts is a must, but we can use relaxed POSIX semantics because there's basically never more than one process writing to a given file.

The new effort has budget for something like 1PB of Pure FlashArray//C, but even that is way more performance and IOPS than we need. What should we be looking at? Does anyone exist in this space? Something like a Pure FlashBlade//E is the next step down their stack, but it looks like the minimum size for that is 4PB, which might be over budget. Who still does disk-based capacity solutions that aren't purely for backup and can meet multi-client throughput needs?
I keep hearing good things about VAST, but I think they won't even pick up the phone to talk about a 1PB solution, and they're also leaning hard on compression / dedupe.

Qwijib0 posted:I can't speak to clustering, but I have had some truenas (core, intend to move to scale soon) boxes doing object store backup for ~2 years and they sit there and just work, so the ix technical implementation seems fine.

I really want to hear from people a couple years down the line who try doing TrueNAS SCALE with horizontal scale-out, because GlusterFS looks like an absolute dead end and I wouldn't want to build on it. Red Hat has scaled down Gluster and is ending their support offering for it in 2024; it looks like Ceph is eating that whole segment, but Ceph would obviate the need for ZFS and the rest of what TrueNAS offers, which may have contributed to why they chose Gluster.
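For anyone comparing appliances against the cluster described above, the per-component numbers are easy to back out. A quick back-of-envelope sketch (all inputs are the figures from the post; 350 MB/s is just the midpoint of the quoted 300-400 range):

```python
# Back-of-envelope sizing for the Ceph cluster described above.
# All inputs come from the post; nothing here is measured.

usable_pb = 3            # usable capacity, PB
osds = 240               # total OSDs (one per disk)
hosts = 20
nic_gbps = 10            # 10GBaseT per host
per_client_mbs = 350     # midpoint of the quoted 300-400 MB/s per process
aggregate_gbs = 4        # midpoint of the quoted 3-5 GB/s system throughput

osds_per_host = osds // hosts
tb_per_osd = usable_pb * 1000 / osds          # usable TB per OSD
host_wire_mbs = nic_gbps * 1000 / 8           # rough NIC ceiling per host, MB/s
full_speed_clients = aggregate_gbs * 1000 // per_client_mbs

print(f"{osds_per_host} OSDs per host, ~{tb_per_osd:.1f} TB usable per OSD")
print(f"~{host_wire_mbs:.0f} MB/s wire ceiling per host")
print(f"~{full_speed_clients} clients at full per-process speed before the aggregate ceiling")
```

Which works out to about a dozen OSDs per host and roughly eleven full-speed readers before hitting the quoted aggregate number, so any replacement mostly has to beat the per-host 10GbE wire limit rather than do anything clever.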
|
# ? Dec 14, 2023 20:34 |
|
Vast is in scope for a PB at that performance and would be in budget, but their big trick is dedupe, so it might not be that great a fit; they do have a probe you can run against your data set that's worth checking out, though. DDN has a couple of interesting QLC platforms that might be worth looking at. There are a few Ceph-plus-glue systems out there; some people like them. I'd also check out Qumulo; it might be a good fit for a streaming workload.
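Before running a vendor's probe, you can get a crude first read on whether your data would benefit from array-side compression at all with a few lines of Python (the paths in the usage comment are hypothetical, and zlib is just a stand-in for whatever algorithm the array actually uses):

```python
# Crude compressibility check: compress a sample of a file, report the ratio.
# A ratio near 1.0 means the data is effectively already compressed
# (e.g. gzip'd FASTQ, compressed imaging) and array-side compression
# or dedupe is unlikely to buy much capacity back.
import zlib

def sample_ratio(path: str, sample_mb: int = 64) -> float:
    """Compress the first sample_mb MiB of a file; return compressed/original size."""
    with open(path, "rb") as f:
        data = f.read(sample_mb * 1024 * 1024)
    if not data:
        return 1.0
    return len(zlib.compress(data, 6)) / len(data)

# Hypothetical usage -- point it at a representative sample of your data:
#   sample_ratio("/data/sample.fastq.gz")   # near 1.0: nothing left to squeeze
#   sample_ratio("/data/raw_counts.tsv")    # well below 1.0: compression helps
```

It's not a substitute for a real probe (no dedupe, single algorithm, single sample), but it's enough to sanity-check the "our data is already compressed" assumption before a sales call.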
|
# ? Dec 14, 2023 21:03 |
|
in a well actually posted:Vast is in scope for a PB and that performance and would be in budget but their big trick is dedupe so it might not be that great; although they have a probe to run against your data set that’s worth checking out.

Thanks! Ceph should be the frontrunner here because we know it meets requirements, but my early impression from feeling it out is that RH Ceph support alone costs as much as or more than just buying a complete storage appliance with support, which is depressing. I feel like this kind of workload can't be that unusual; I'd bet that streaming video workloads look somewhat like this, and there must be a segment of storage targeted at servicing video production / archive.
|
# ? Dec 14, 2023 21:08 |
|
in a well actually posted:
Someone else mentioned them first, so now I can't be accused of being a shill. Their archive tier might meet your requirements; I use them for my primary video editing storage, and I have no complaints at all.
|
# ? Dec 14, 2023 21:40 |
|
Yeah, the Red Hat Ceph support is ridiculous. Big streaming IO is a pretty well understood pattern in HPC but those usually come with more consistency guarantees (and complexity) than you want.
|
# ? Dec 14, 2023 21:41 |
|
in a well actually posted:Yeah, the Red Hat Ceph support is ridiculous. Big streaming IO is a pretty well understood pattern in HPC but those usually come with more consistency guarantees (and complexity) than you want.

I'm not opposed to more features / better filesystems, just seeking maximum value for what is a very simple workload from the storage system's point of view. Who are the big players there? A similar system that was on GPFS is moving to https://www.panasas.com/, who I know nothing about but I guess should be in the conversation too.

Twerk from Home fucked around with this message at 21:49 on Dec 14, 2023 |
# ? Dec 14, 2023 21:47 |
|
The workloads you're describing could be easily served by Isilon capacity/hybrid nodes, but no idea where that falls price wise.
|
# ? Dec 14, 2023 22:12 |
|
Panasas is on the legacy side, but I know some folks like them. A well-designed GPFS isn't bad, but it is very network-sensitive; IBM and Lenovo have products that can be very competitive. There are a few others that build GPFS platforms as well, but they need to pay for both their platform and the GPFS licenses. Lustre is open source, and a lot of the problems people tended to have with it have been mitigated in the newer versions (2.15+), but it is pretty admin-intensive; DDN and HPE have solutions there. The development is primarily done by DDN; AWS building FSx for Lustre on top of community Lustre has kinda taken a lot of steam out of feature development, imho. Weka comes up in conversations with sales people; they like to tier out to an object store.
|
# ? Dec 14, 2023 22:16 |
|
Low IOPS but high throughput is the use case for FlashBlade. I'd at least ask Pure for a quote. We have a lot of FlashArrays and Pure has been great to us.
|
# ? Dec 15, 2023 00:49 |
|
I’ll second Pure; I've heard nothing but good things.
|
# ? Dec 15, 2023 04:16 |
|
evil_bunnY posted:The workloads you're describing could be easily served by Isilon capacity/hybrid nodes, but no idea where that falls price wise.

Yup, all day long.
|
# ? Jan 8, 2024 06:31 |