|
Ugh, our NetApp sales rep and reseller aren't winning any points in my book. It's not 2007 anymore, guys; this poo poo doesn't fly.
|
# ¿ May 10, 2016 20:00 |
|
|
# ¿ May 10, 2024 10:57 |
|
Vulture Culture posted:Has anyone pushed the COMSTAR iSCSI stack in Solaris/derivatives to its breaking point? How many concurrent iSCSI sessions does it tolerate on a single target before performance degradations or bad things start happening? Are there any bottlenecks I might expect to hit before tens of gigabits of network I/O becomes the relevant one? Not with iSCSI and not with actual Solaris, but the Illumos network drivers for 10 gigabit+ NICs are pretty hit or miss.
|
# ¿ May 31, 2016 01:26 |
|
Vulture Culture posted:Do you have a naughty/nice list, or is this mostly information relayed from others? I'd be willing to entertain Linux or BSD as the target server if it got real weird, but I still trust COMSTAR more than the LIO or the new BSD iSCSI native target. Intel's is surprisingly bad (at least on the X520.) Mellanox never got beyond 10G on the ConnectX-2. Chelsio was recommended elsewhere; we've been using the T520-SO-CR and it seems to work reasonably well.
|
# ¿ May 31, 2016 03:09 |
|
Vulture Culture posted:Are you using optics, or direct-connect SFP+ twinax? I'm a little touchy around twinax after I spent 3 weeks chasing Emulex firmware bugs on my last attempt, but I need to keep costs way down on this project. Optics, but the official Chelsio optics are cheap. We've used twinax elsewhere; it's OK, but it's occasionally a crapshoot going switch->host or between different vendors' switches. Also, gently caress Emulex. If it's not something that's going to get you fired if it fails, here's an obligatory plug for http://www.fs.com: $16 for a 10GBASE-SR optic.
|
# ¿ May 31, 2016 15:09 |
|
No experience with their DACs. Our networking team stocks OEM DACs but fs.com optics; I'm not sure if that's just because DACs are a low-volume product for them or a result of a specific experience with a fs.com DAC. I've had fs.com cancel an order for a QSFP active optic cable in a weird configuration because they couldn't validate it in their test lab and hadn't shipped any in that configuration, so they're thinking about compatibility.
|
# ¿ Jun 6, 2016 20:38 |
|
Anybody running OEM-rebranded E-Series enclosures? Any tips?
|
# ¿ Jun 13, 2016 18:03 |
|
H110Hawk posted:Yeah, I get paying for magic. Do you know what kind of prices we're looking at? I hope you've turned rrdcached on. (The real answer is to switch to a modern TSDB.)
|
# ¿ Jul 15, 2016 03:21 |
|
Thanks Ants posted:30TB is a poo poo ton more data than people realise.
|
# ¿ Oct 6, 2016 01:50 |
|
Whenever we've gone down the iXsystems/Nexenta/Tintri/(other ZFS-derivative) route for quotes, they always seem to be priced at 10% less than everyday NetApp instead of the commodity-plus-20% that I'd prefer to see. That may be an unrealistic expectation; YMMV.
|
# ¿ Mar 31, 2017 03:43 |
|
Vulture Culture posted:They may also legitimately have no idea how much their storage is costing them in terms of time to complete a run. They might just think it's supposed to be that slow, and they would never say anything unless a known quantity starts taking significantly longer to finish. It's worth doing an analysis to see the access patterns over time (thankfully easy on Linux with minimal setup). I'd be interested in a thumbnail sketch (which tools?) on how you are doing that analysis if you wouldn't mind sharing.
|
# ¿ Mar 31, 2017 03:49 |
|
Vulture Culture posted:I haven't worked there in a good amount of time, but sure. I'm going to ignore general stuff on managing storage performance like keeping an eye on your device queue sizes, because there's plenty of good information out there already on that. Thanks; we're already pulling per-client NFS stats into Graphite, but per-mount will be more useful; we've been looking server-side (perf, network, etc.) for usage information. Lustre's generic per-client stats aren't bad, but I want to start using the jobstats feature to tag each in-flight IO with a job ID. Brendan Gregg's book is good.
|
quote:This is also true when you're looking at Gluster, Lustre, Ceph, etc. I don't get it. The offering isn't bad, but these companies never provide you the global logistics and support of the big vendors, so people buying must either be clueless or pressing them down to discount levels way lower than I was ever able to get. In our experience, if you look at GB/s/$, Lustre is very difficult for NetApp/EMC to come close to, for sufficiently large values of GB/s. That's assuming your workload will actually run well on Lustre (or any other clustered FS); while it is POSIX-compliant, it really isn't a general-purpose file system. Highly recommend a partner that has experience with Lustre and has a contract with Intel for L2/L3 support.
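On the per-mount NFS side, the raw data lives in /proc/self/mountstats on the client; a rough sketch of scraping op counts out of it for Graphite (this assumes the statvers=1.1 per-op layout where the first field after the op name is the operation count — a hypothetical helper, so verify the field order against your kernel before trusting the numbers):

```python
# Sketch: parse /proc/self/mountstats into {mountpoint: {op: count}} for
# NFS mounts. Field layout assumed from the statvers=1.1 per-op format.
import re

def parse_mountstats(text):
    """Return {mountpoint: {op_name: op_count}} for NFS mounts only."""
    stats = {}
    mount = None
    in_ops = False
    for line in text.splitlines():
        if line.startswith('device '):
            # Only track mounts whose fstype starts with "nfs" (nfs, nfs4)
            m = re.match(r'device \S+ mounted on (\S+) with fstype nfs', line)
            mount = m.group(1) if m else None
            in_ops = False
            if mount:
                stats[mount] = {}
            continue
        if mount is None:
            continue
        if line.strip() == 'per-op statistics':
            in_ops = True
            continue
        if in_ops and ':' in line:
            op, _, rest = line.strip().partition(':')
            fields = rest.split()
            if fields:
                stats[mount][op] = int(fields[0])  # first field = op count
    return stats

# e.g. parse_mountstats(open('/proc/self/mountstats').read())
```

From there it's one metric per (mount, op) pair into carbon; the `nfsiostat`/`mountstats` tools in nfs-utils read the same file if you'd rather not roll your own.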
|
# ¿ Mar 31, 2017 15:42 |
|
Combat Pretzel posted:So, RDMA on Intel, i.e. iWARP, are there actually products that support it? Google leads me to believe that there are 10GbE products that do RDMA, but per Intel, none of their adapters supports it. Nope; I think they acquired a 10G NIC manufacturer who had cards that did it, but it didn't make it into any of the Intel mainline NICs. IIRC Chelsio's the only one still shipping iWARP. In my limited (non-iWARP) experience, they were OK? RoCEv2 will do most of what iWARP did unless you really don't like Mellanox for some reason.
|
# ¿ Jun 5, 2017 22:22 |
|
For scale out performance, Isilon is fine, if a bit pricey. If you have petabyte scale needs it’d be on my shopping list. Their primary? engineers left and founded Qumulo. Similar arch. It’s a startup, so ymmv. Panasas is fine, also kinda pricey. See them in enterprise HPC usually. It might be worthwhile to talk to IBM about Spectrum Scale; they’ve been surprisingly competitive on some projects. DDN has some options that aren’t bad. People are doing interesting stuff with Ceph but that’s still pretty green. RH’s support licensing is kinda high. Nearly all of these are built around large block IO. Most parallel file systems do poorly on metadata performance.
|
# ¿ Jun 20, 2018 02:27 |
|
NVMeoF is pretty cool, and there are some fun things being built on top of it.
|
# ¿ Jul 13, 2018 23:16 |
|
Happiness Commando posted:I have a Dell MD 3820i full of SSDs on a 10 gig network and all of my benchmarks have random writes maxing out at 45 MB/s. Two different Dell teams have looked it over and both of them say everything is configured correctly. The escalated pro support guy told me that the performance I was seeing was expected. The pro deploy guy thought that maybe my SSDs were bad. All 20 of them, I guess. What RAID config?
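For what it's worth, a fio job along these lines is a quick way to get a vendor-neutral random-write number to argue with support about (the directory path is an example; point it at a scratch volume on the array, since write tests will clobber data):

```ini
; hypothetical fio job: 4k random writes at moderate queue depth
[global]
ioengine=libaio
direct=1
bs=4k
iodepth=32
numjobs=4
runtime=60
time_based=1
group_reporting=1

[randwrite-test]
rw=randwrite
size=10g
directory=/mnt/test-lun   ; example mount point on the MD array
```

If 4k random writes at QD128 aggregate still land around 45 MB/s across 20 SSDs, the bottleneck is almost certainly the controller or RAID layout, not the drives.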
|
# ¿ Jul 19, 2018 00:23 |
|
Vulture Culture posted:It's 2019. Disks should load in like ammo magazines Xyratex had a fun bulk loader tool that would drop ten at a time from the shipping box into an enclosure. But why buy disks when you can do weird poo poo with SCM and QLC https://www.vastdata.com/
|
# ¿ Mar 31, 2019 16:03 |
|
Vulture Culture posted:Are you looking for any kind of replication? Because that's going to be an absolute shitshow on any platform with these file counts. Transitioning off of our ZFS-based snapshot delta replication onto an enterprise replication product has been an experience...
|
# ¿ Apr 21, 2019 23:06 |
|
adorai posted:It hurts me to do it, but I can recommend oracle for this sort of product. The ZFS appliances they acquired in the Sun acquisition are pretty great. And they aren't necessarily the "gently caress you" kind of pricing Oracle is known for. They RIF'd most of the remaining ZFS/Solaris devs last year (?) though.
|
# ¿ May 8, 2019 16:37 |
|
There are some hdparm (?) or SCSI commands to flip between 512- and 520-byte sector sizes. I found a blog post about it like five years ago but did not save it. You'll need at least one non-RAID controller with direct access to the SCSI devices, though.
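If memory serves, the usual tool for this is sg_format from sg3_utils rather than hdparm (an assumption worth double-checking against your drives); a sketch, with /dev/sg3 as a stand-in device path:

```shell
# Check the current logical block size first
sg_readcap /dev/sg3

# Low-level format to 512-byte sectors: destroys all data and can take
# hours per disk; the drive must sit behind an HBA, not a RAID virtual disk
sg_format --format --size=512 /dev/sg3
```

Drives pulled from NetApp/EMC shelves often come formatted 520 for per-sector checksums, which is why they show up as unusable behind commodity controllers until they're reformatted.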
|
# ¿ Sep 7, 2019 23:02 |
|
Thanks Ants posted:How do they manage to make them so deep? Very carefully. (I’m looking at a few platforms well north of 40”.)
|
# ¿ Sep 25, 2019 02:38 |
|
evil_bunnY posted:If I want a half a TB of HA NFS *delivered quickly*, who should I be talking to? Netapp?
|
# ¿ Apr 6, 2021 14:47 |
|
evil_bunnY posted:We're already talking to them and Dell (Isilon). I'd like to know if I'm missing non-obvious players with EMEA presence. Thread frequently mentions Pure Storage? I really like Vast but they’re probably out of scope for 500G.
|
# ¿ Apr 6, 2021 15:40 |
|
EFS
|
# ¿ Apr 6, 2021 21:28 |
|
There’s no right number of disks for Ceph. You either have not enough disks or hosts or have way too many.
|
# ¿ Apr 15, 2021 20:20 |
|
Qumulo is fine. It’s Isilon 2.0. There are some performance edge cases to be careful with.
|
# ¿ Apr 19, 2023 20:27 |
|
Docjowles posted:I'm a little jealous of people who get to play with cool/large storage poo poo. Feel like that's easily my biggest technical blind spot. Somehow all of my jobs have been one of Very easily, lol.
|
# ¿ Apr 19, 2023 23:34 |
|
BlankSystemDaemon posted:I've been out of the enterprise storage industry for a while, which I haven't been posting much ITT - but this 30TB 2.5" U.2 Kioxia drive did catch my attention, because 40PB/rack does sound pretty good, even if the 4KB QD64 and 16KB QD32 random IOPS aren't very good, the NVMe interface still offers 2^16 queues with 2^16B each, and the sequential IOPS is pretty alright. I think the PB/rack density is for the ruler drives; Supermicro U.2 servers top out at something like 24 drives per 1U, which leaves you at a paltry ~30PB per rack.
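The back-of-the-envelope math, assuming 30TB drives, 24 U.2 bays per 1U, and a full 42U rack (all round numbers, not any specific SKU):

```python
# Rack density estimate: 30TB U.2 drives, 24 bays/1U, 42U rack (assumptions)
TB_PER_DRIVE = 30
DRIVES_PER_RU = 24
RACK_RU = 42

pb_per_rack = TB_PER_DRIVE * DRIVES_PER_RU * RACK_RU / 1000
print(f"{pb_per_rack:.1f} PB/rack with U.2")  # vs the ~40 PB/rack ruler figure
```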
|
# ¿ Oct 20, 2023 19:42 |
|
evil_bunnY posted:Do any of you run on-prem object stores (±1PB) you're happy with? Ceph. It’s fine. Kinda FTE-intensive.
|
# ¿ Oct 31, 2023 02:46 |
|
Yeah. Vendored Ceph is expensive and a real pain in the rear end if you’re not in sync with mainline Ceph development.
|
# ¿ Oct 31, 2023 15:09 |
|
Vast is in scope for a PB at that performance and would be in budget, but their big trick is dedupe, so it might not be that great for your data; although they have a probe to run against your data set that’s worth checking out. DDN has a couple of interesting QLC platforms that might be worth looking at. There are a few Ceph-plus-glue systems out there; some people like them. I’d check out Qumulo; it might be a good fit for a streaming workload.
|
# ¿ Dec 14, 2023 21:03 |
|
Yeah, the Red Hat Ceph support is ridiculous. Big streaming IO is a pretty well understood pattern in HPC but those usually come with more consistency guarantees (and complexity) than you want.
|
# ¿ Dec 14, 2023 21:41 |
|
|
Panasas is on the legacy side but I know some folks like them. A well-designed GPFS setup isn’t bad, but it is very network-sensitive; IBM and Lenovo have products that can be very competitive. There are a few others that build GPFS platforms as well, but they need to pay for their platform and the GPFS licenses. Lustre is open source, and a lot of the problems people tended to have with it have been mitigated in the newer versions (2.15+), but it is pretty admin-intensive. DDN and HPE have solutions. The development is primarily done by DDN; AWS building FSx for Lustre on top of community Lustre has kinda taken a lot of steam out of feature development, imho. Weka comes up in conversations with sales people; they like to tier out to an object store.
|
# ¿ Dec 14, 2023 22:16 |