Twerk from Home
Jan 17, 2009

Has anybody ever quoted out FreeNAS certified servers? I'm doing a little consulting with an academic lab that is looking to improve their setup from "~30TB of usable storage on a single RAID controller in a single SUSE server".

They need at least 50TB of space, preferably more like 70TB. Budget is closer to a Synology or QNAP filled with 8TB WD Reds than to an all-flash NetApp. The only hard requirements are reasonable sustained read performance for fewer than 5 concurrent users, NFS and CIFS shares, and 10GbE. No need for high availability, and it will be backed up to tape regularly.

It's a bioinformatics lab needing somewhere to centralize storage of research data. No home folders stored or high IOPS needed, but I would think that one of the certified FreeNAS boxes would give them a better long-term experience than the QNAP or Synology 12-bay NASes. Any other budget options that would give a reasonable experience?
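For reference, the rough capacity math for the 12-bay route (a quick sketch; the two-disk-parity layout and the decimal-TB vs binary-TiB conversion are my assumptions):

```python
# Rough usable-capacity estimate for a 12-bay NAS full of 8TB drives.
# Assumes RAID6 / SHR-2 style double parity and no hot spare; real-world
# filesystem overhead will eat a bit more.

DRIVE_TB = 8          # marketed size, decimal terabytes
BAYS = 12
PARITY_DISKS = 2      # RAID6 / SHR-2

raw_tb = DRIVE_TB * BAYS
usable_tb = DRIVE_TB * (BAYS - PARITY_DISKS)   # decimal TB
usable_tib = usable_tb * 1e12 / 2**40          # what the OS will report

print(f"raw: {raw_tb} TB, usable: {usable_tb} TB (~{usable_tib:.1f} TiB)")
# -> raw: 96 TB, usable: 80 TB (~72.8 TiB)
```

So twelve 8TB drives with double parity lands around 72 TiB usable, which covers the 50-70TB target with some headroom.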

Twerk from Home
Jan 17, 2009

Vulture Culture posted:

What kind of bioinformatics are they doing? How many files are on their filesystem? (It sounds like a ridiculous question, but I once supported a lab that routinely filled their filesystems with a billion tiny files doing de novo assembly.)

Good question. The data I've seen involves fewer, much larger files. They do mostly GWAS and pedigree reconstruction, so the genomes come to them already assembled. Also, their worker machines have 384GB of RAM or more each, and they tend to slurp whole files into memory rather than doing random disk I/O or paging, though that might also be a way of coping with how much their current storage sucks.

This is where the data is going to be analyzed from, not an archival backup. Good point that they might not know what they want.

Twerk from Home fucked around with this message at 22:30 on Mar 28, 2017

Twerk from Home
Jan 17, 2009

Vulture Culture posted:

They may also legitimately have no idea how much their storage is costing them in terms of time to complete a run. They might just think it's supposed to be that slow, and they would never say anything unless a known quantity starts taking significantly longer to finish. It's worth doing an analysis to see the access patterns over time (thankfully easy on Linux with minimal setup).

So what's the best bang-for-the-buck way to make it faster? They're hoping to spend less than $15k to get that 50-70TB of space, hence my comment that the price range is more like "Synology with WD Reds in it".
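For the access-pattern analysis, even something this crude gives a baseline before spending any money (a minimal sketch; the device name and sampling interval are placeholders to adjust for the box in question):

```python
#!/usr/bin/env python3
"""Log read/write throughput for one block device by sampling /proc/diskstats."""
import time

DEVICE = "sda"     # placeholder: the block device behind the data array
INTERVAL = 5       # seconds between samples

def sectors(device):
    """Return (sectors_read, sectors_written); /proc/diskstats counts 512-byte sectors."""
    with open("/proc/diskstats") as f:
        for line in f:
            fields = line.split()
            if fields[2] == device:
                return int(fields[5]), int(fields[9])
    raise ValueError(f"device {device!r} not found in /proc/diskstats")

prev_r, prev_w = sectors(DEVICE)
while True:
    time.sleep(INTERVAL)
    cur_r, cur_w = sectors(DEVICE)
    read_mb = (cur_r - prev_r) * 512 / 1e6 / INTERVAL
    write_mb = (cur_w - prev_w) * 512 / 1e6 / INTERVAL
    print(f"{time.strftime('%H:%M:%S')}  read {read_mb:7.1f} MB/s  write {write_mb:7.1f} MB/s")
    prev_r, prev_w = cur_r, cur_w
```

Run that during a typical job and you at least see whether the current array is saturated or sitting mostly idle.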

Twerk from Home
Jan 17, 2009

evil_bunnY posted:

Hello friends, one of our research groups wants a (probably all-flash) ~100TB box to serve ~1GB files over 10GbE/SMB. Who should we be talking to? I'm in the EU, since that probably matters.

High random I/O with many users, or maximum throughput with a low user count? That's going to be expensive; my wife's research lab is looking at stuffing a bunch of 8TB Intel P4510s into a Supermicro 1P EPYC box to satisfy the "high throughput" use case.
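Back-of-the-envelope for that kind of box (a sketch only; the per-drive sequential figure is an assumption rather than a quoted spec):

```python
# Rough sizing for an all-NVMe ~100TB box serving ~1GB files over 10GbE.
# The per-drive throughput below is an assumed placeholder; check the
# actual datasheet for whatever drive gets picked.

TARGET_TB = 100
DRIVE_TB = 8                # e.g. 8TB U.2 NVMe drives
PER_DRIVE_SEQ_GBPS = 3.0    # assumed sequential read per drive, GB/s
LINK_GBPS = 10 / 8          # 10GbE is ~1.25 GB/s line rate, less after SMB overhead

drives = -(-TARGET_TB // DRIVE_TB)        # ceiling division
aggregate_gbps = drives * PER_DRIVE_SEQ_GBPS

print(f"{drives} drives -> ~{aggregate_gbps:.0f} GB/s from flash, "
      f"but the 10GbE link tops out around {LINK_GBPS:.2f} GB/s")
# -> 13 drives -> ~39 GB/s from flash, link tops out around 1.25 GB/s
```

In other words, once you're on flash at all, the 10GbE/SMB side is the bottleneck long before the drives are.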

Twerk from Home
Jan 17, 2009

evil_bunnY posted:

It's a small amount of storage but it's for core systems. This was supposed to be a Ceph-backed byzantine piece of poo poo and it's taken me this long for SuSE to show their whole rear end and me to convince them to hop off that train.

You were doing Ceph on SuSE?

Could you share a bit about your Ceph experience? It has a broadly good reputation at this point from what I've heard, with the caveat that you are probably paying Red Hat more than you really want to for support.

Twerk from Home
Jan 17, 2009

What are the downsides of top-loading storage chassis vs front/rear in traditional chassis design? The top-loaders offer more disk bays per dollar, and take up less rack space per disk bay.

Specifically, I'm comparing things like 36-bay front/rear capacity storage-focused traditional chassis vs 45 or 60-bay top loaders, which seem to offer a better value.

Are disk temperatures going to be significantly higher on the top loaders?
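One way to answer the temperature question empirically once a top-loader is on the bench is to just poll the drives (a sketch assuming a smartmontools version with JSON output and root access; the device list and threshold are placeholders):

```python
#!/usr/bin/env python3
"""Poll drive temperatures via smartctl and flag anything running warm."""
import json
import subprocess

DEVICES = [f"/dev/sd{c}" for c in "abcd"]   # placeholder device list
WARN_C = 45                                 # placeholder warning threshold

for dev in DEVICES:
    # smartctl -A -j prints SMART attributes as JSON (smartmontools 7.0+)
    out = subprocess.run(["smartctl", "-A", "-j", dev],
                         capture_output=True, text=True)
    try:
        temp = json.loads(out.stdout)["temperature"]["current"]
    except (json.JSONDecodeError, KeyError):
        print(f"{dev}: no temperature reported")
        continue
    flag = "  <-- warm" if temp >= WARN_C else ""
    print(f"{dev}: {temp} C{flag}")
```

Logging that under sustained load in both chassis types would settle it better than any spec sheet.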

Twerk from Home
Jan 17, 2009

I could really use a hand with vendor / product line ideas for a brand-new, completely greenfield research computing deployment. IOPS requirements are low, but we need decent throughput, and because the data is primarily genetic or imaging data that's already highly compressed, we're unlikely to get much benefit from compression or deduplication, which throws a wrench in the big storage vendors' value proposition.

The goal is similar performance to a homegrown Ceph cluster that I built for pennies but can't in good conscience recommend anyone else emulate or operate. That system is roughly 3PB usable across 240 OSDs on 20 hosts over 10GBaseT; it delivers about 300-400MB/s read/write to each process, up to an aggregate of 3-5GB/s across all clients. Recovery throughput is higher than that, but we are mostly sequential read and it has met our needs well for a couple of years now. It's also nice as hell that we can just throw more nodes / disks in and it grows. In a perfect world we'd be on 25GbE, but this seems fine for now.

It helps that all our data is compressed, so a compute job reading from this is usually decompressing domain-specific formats or zstd, which means 300MB/s from the disks is plenty. A filesystem that can be mounted from Linux hosts is a must, but we can live with relaxed POSIX semantics because there's basically never more than one process writing to a given file.
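To put a number on that last point, this is roughly how effective read rate can be checked from a client (a sketch using the zstandard package; the file path is a placeholder):

```python
# Effective throughput = disk throughput x compression ratio: the job sees
# the decompressed stream, the storage only has to deliver the compressed one.
# Requires the `zstandard` package; the input path is a placeholder.
import time
import zstandard as zstd

PATH = "sample.vcf.zst"     # placeholder compressed input file

dctx = zstd.ZstdDecompressor()
decompressed = 0
start = time.monotonic()

with open(PATH, "rb") as fh:
    reader = dctx.stream_reader(fh)
    while True:
        chunk = reader.read(8 * 1024 * 1024)
        if not chunk:
            break
        decompressed += len(chunk)
    compressed = fh.tell()   # approximate compressed bytes pulled off storage

elapsed = time.monotonic() - start
print(f"compressed read:   {compressed / 1e6 / elapsed:.0f} MB/s off storage")
print(f"decompressed feed: {decompressed / 1e6 / elapsed:.0f} MB/s into the job")
```

With any meaningful compression ratio, ~300MB/s off the disks turns into considerably more than that as seen by the job.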

The new effort has budget for something like 1PB of Pure FlashArray//C, but even that is far more performance and IOPS than we need. Something like a Pure FlashBlade//E is the next step down their stack, but the minimum size for that looks like 4PB, which might be over budget. What product lines should we be looking at? Does anyone even exist in this space? Who still does disk-based capacity solutions that aren't purely for backup and can meet multi-client throughput needs?

I keep hearing good things about VAST but I think they won't even pick up the phone to talk about a 1PB solution, and they're also leaning hard on compression / dedupe.

Qwijib0 posted:

I can't speak to clustering, but I have had some truenas (core, intend to move to scale soon) boxes doing object store backup for ~2 years and they sit there and just work, so the ix technical implementation seems fine.

I really want to hear from people a couple years down the line who try doing TrueNAS Scale with horizontal scale-out, because GlusterFS looks like an absolute dead end and I wouldn't want to build on it. Red Hat has scaled down Gluster and is ending its support offering for it in 2024, and Ceph looks like it's eating that whole segment. But Ceph would obviate the need for ZFS and the rest of what TrueNAS offers, which may be part of why they chose Gluster.

Twerk from Home
Jan 17, 2009

in a well actually posted:

VAST is in scope for a PB at that performance and would be in budget, but their big trick is dedupe so it might not be that great; although they have a probe to run against your data set that's worth checking out.

DDN has a couple of interesting QLC platforms that might be worth looking at.

There’s a few Ceph plus glue systems out there; some people like them.

I’d check out Qumulo; it might be a good fit for a streaming workload.

Thanks! Ceph should be the frontrunner here because we know it meets requirements, but my early impression from feeling it out is that RH Ceph support alone costs as much as or more than just buying a complete storage appliance with support, which is depressing.

I feel like this kind of workload can't be that unusual. I'd bet streaming video workloads look somewhat like this, and there must be a segment of storage targeted at servicing video production / archive.

Twerk from Home
Jan 17, 2009

in a well actually posted:

Yeah, the Red Hat Ceph support is ridiculous. Big streaming IO is a pretty well-understood pattern in HPC, but those systems usually come with more consistency guarantees (and complexity) than you want.

I'm not opposed to more features / better filesystems, just seeking maximum value for what is a very simple workload from the storage system's point of view. Who are the big players there?

A similar system that was on GPFS is moving to https://www.panasas.com/, which I know nothing about but which I guess should be in the conversation too.

Twerk from Home fucked around with this message at 21:49 on Dec 14, 2023

  • Reply