FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
I'm currently figuring out how to do bare-metal backups on some machines: some critical workstations, and every server (including the storage servers). And it's all being backed up to a Drobo. I want to kill myself.

But it leads me to something I've been wondering for a while. What's the backup "paradigm" when everything is virtualized? I'm used to a "backup" server with a tape drive connected to it, but that's a hard workload to virtualize (due to the physical connection to the tape drive), so I'm not sure what the proper way to do it is. Should there just be another SAN, preferably offsite, that everything gets backed up to? Is there a good way to use a tape drive in a virtualized environment? With things like ZFS snapshots and Volume Shadow Copy, should I not even worry about backups for file recovery and only worry about backups for disaster recovery?
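
For context on the file-recovery half: with ZFS, snapshots are exposed read-only under the hidden .zfs/snapshot directory, so a single-file restore is just a copy, which is why I'm tempted to stop treating that as a backup problem at all. A rough sketch (the mountpoint, snapshot name and file path are made up):

code:
import shutil
from pathlib import Path

# Restore a single file from a ZFS snapshot. Snapshots of a dataset mounted
# at /tank/projects are browsable read-only under
# /tank/projects/.zfs/snapshot/<snapshot-name>/...
# The mountpoint, snapshot name and file path below are hypothetical.
DATASET_MOUNT = Path("/tank/projects")
SNAPSHOT_NAME = "hourly-2012-11-27-0900"
LOST_FILE = "reports/q3-summary.xlsx"

snap_copy = DATASET_MOUNT / ".zfs" / "snapshot" / SNAPSHOT_NAME / LOST_FILE
live_copy = DATASET_MOUNT / LOST_FILE

shutil.copy2(snap_copy, live_copy)   # copy the old version back into place
print(f"restored {live_copy} from snapshot {SNAPSHOT_NAME}")

The snapshots live on the same pool they protect, though, so they do nothing for the building-burns-down case, hence the disaster recovery question.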

Rhymenoserous
May 23, 2008

FISHMANPET posted:

I'm currently figuring out how to do bare-metal backups on some machines: some critical workstations, and every server (including the storage servers). And it's all being backed up to a Drobo. I want to kill myself.

But it leads me to something I've been wondering for a while. What's the backup "paradigm" when everything is virtualized? I'm used to a "backup" server with a tape drive connected to it, but that's a hard workload to virtualize (due to the physical connection to the tape drive), so I'm not sure what the proper way to do it is. Should there just be another SAN, preferably offsite, that everything gets backed up to? Is there a good way to use a tape drive in a virtualized environment? With things like ZFS snapshots and Volume Shadow Copy, should I not even worry about backups for file recovery and only worry about backups for disaster recovery?

Here's how I do it. Or I should say how it's going to be done when I'm all finished.

I'm running 4 ESXi hosts over a 10GbE connection for my storage; the hosts are in a DRS/HA cluster, so if any one host goes down things automigrate around to bring everything back online. Entirely hands-off, I love it.

The 10GbE storage network links to my Nimble array, which holds all of my "on-site snaps". Depending on the storage pool I'm keeping everything from hourly snapshots to daily, and I generally keep a month or so worth lying around, again dependent on the app.
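
To give a feel for what those schedules work out to, the pruning logic behind "keep everything for a couple of days, then dailies for a month or so" boils down to something like the sketch below; the tiers and counts are just an illustration, not Nimble's actual policy engine:

code:
from datetime import datetime, timedelta

# Toy retention policy: keep every snapshot for 48 hours, then one per day
# for 30 days, prune the rest. The tiers/counts are illustrative only.
KEEP_ALL_FOR = timedelta(hours=48)
KEEP_DAILY_FOR = timedelta(days=30)

def snapshots_to_prune(snapshots, now=None):
    """snapshots: list of datetimes, one per snapshot; returns the ones to drop."""
    now = now or datetime.now()
    keep, daily_seen = set(), set()
    for ts in sorted(snapshots, reverse=True):
        age = now - ts
        if age <= KEEP_ALL_FOR:
            keep.add(ts)                                  # recent: keep everything
        elif age <= KEEP_DAILY_FOR and ts.date() not in daily_seen:
            keep.add(ts)                                  # older: newest snap per day
            daily_seen.add(ts.date())
    return [ts for ts in snapshots if ts not in keep]

# Example: hourly snapshots going back 40 days.
now = datetime(2012, 11, 27, 12, 0)
snaps = [now - timedelta(hours=h) for h in range(40 * 24)]
print(len(snapshots_to_prune(snaps, now)), "of", len(snaps), "snapshots pruned")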

Offsite I'll have a second storage appliance from Nimble that is going to be catching replicas from the primary (this is an array function), and two ESXi hosts that are basically just waiting for me to bring VMs online. I'm keeping the DR site fairly cheap though, so I'm unlikely to use DRS, or even use vCenter at all there.

I've been thinking about Veeam, but it's really getting poo poo reviews lately.

sanchez
Feb 26, 2003
If you have 2 SANs that'll do application-level snapshots and replication, I don't see the point of Veeam; you've got a much better solution already. We do the same thing with some EqualLogic PS series arrays.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

sanchez posted:

If you have 2 SANs that'll do application-level snapshots and replication, I don't see the point of Veeam; you've got a much better solution already. We do the same thing with some EqualLogic PS series arrays.
Until you hit a firmware bug that trashes your filesystem and replicates the changes downwind.

madsushi
Apr 19, 2009

Baller.
#essereFerrari

Misogynist posted:

Until you hit a firmware bug that trashes your filesystem and replicates the changes downwind.

That's why you use SnapVault, since it uses a separate file table/metadata!

GrandMaster
Aug 15, 2004
laidback

Misogynist posted:

Until you hit a firmware bug that trashes your filesystem and replicates the changes downwind.

Depends on your replication system. We run EMC RecoverPoint; it journals all of the replicated block changes and we can roll back to any point in time on the remote system.

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole
I sure hope you have a fast link if you're using replication as your backup.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

the spyder posted:

I racked one of my new internal 336TB (raw) ZFS SANs this week and realized that my field engineers are not configuring hot spares.

My question is what drive groupings would you use for such a large storage pool?
We currently use RAIDZ2 with 7-disk vdevs (16 of them). I configured this one with RAIDZ2 in 6-disk vdevs (18 of them) and 4 hot spares (one per JBOD).

How many controllers? How many CPUs? How many heads? What do the paths from controller to enclosure to enclosure look like? Do you have an SSD ZIL? What does your workload look like? What are your availability requirements? SAS or SATA?

We've run as few as 6 disks and as many as 45 disks in vdevs.

evil_bunnY
Apr 2, 2003

three posted:

I sure hope you have a fast link if you're using replication as your backup.
10Gb metro ownsownsowns.

sanchez
Feb 26, 2003

Misogynist posted:

Until you hit a firmware bug that trashes your filesystem and replicates the changes downwind.

Replication is not realtime, so if the primary SAN ate itself but was still in a state where it would try to replicate, we'd have 6 hours or so to intervene. Realtime replicas would make me nervous for the reason you mentioned.

Pile Of Garbage
May 28, 2007



the spyder posted:

I racked one of my new internal 336TB (raw) ZFS SANs this week and realized that my field engineers are not configuring hot spares.

I feel your pain. Some time ago at my previous employer I found that one of my colleagues wasn't configuring hot spares and, to make matters worse, was not configuring automatic alerting on the SAN. This feat of retardation was discovered when two HDDs in a RAID5 array on the SAN died, taking a whole LUN offline. We only discovered the failure by accident 3 months after it had happened, and miraculously no production systems were harmed (the LUN was a VMFS datastore and the VMs on it were not in production).

It shits me to tears when people just piss all over best practice. I had a few choice words to say to my colleague after that incident.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

cheese-cube posted:

I feel your pain. Some time ago at my previous employer I found that one of my colleagues wasn't configuring hot spares and, to make matters worse, was not configuring automatic alerting on the SAN. This feat of retardation was discovered when two HDDs in a RAID5 array on the SAN died, taking a whole LUN offline. We only discovered the failure by accident 3 months after it had happened, and miraculously no production systems were harmed (the LUN was a VMFS datastore and the VMs on it were not in production).

It shits me to tears when people just piss all over best practice. I had a few choice words to say to my colleague after that incident.
Good to see management also wasn't auditing the monitoring on your tier 1 systems.

Pile Of Garbage
May 28, 2007



Misogynist posted:

Good to see management also wasn't auditing the monitoring on your tier 1 systems.

Well management wasn't actually auditing anything at all which is part of the reason I left that outfit.

three
Aug 9, 2007

i fantasize about ndamukong suh licking my doodoo hole
Management... auditing? What is this madness?

Rhymenoserous
May 23, 2008

three posted:

Management... auditing? What is this madness?

Auditing? But I pay my taxes...

bort
Mar 13, 2003

Sardines whatsley?

the spyder
Feb 18, 2011

evil_bunnY posted:

Max recommended vdev size is 9, isn't it?

I personally have been sticking to Gea's recommendations.

4 data disks + parity (1 or 2 disks, depending on RAIDZ1 or RAIDZ2)
8 data disks + parity
16 data disks + parity

the spyder
Feb 18, 2011

PCjr sidecar posted:

How many controllers? How many CPUs? How many heads? What do the paths from controller to enclosure to enclosure look like? Do you have an SSD ZIL? What does your workload look like? What are your availability requirements? SAS or SATA?

We've run as few as 6 disks and as many as 45 disks in vdevs.

One controller: dual X5650s, 192GB RAM, Mellanox dual-port 40Gb IB. A second cold spare is racked with an identical config. Two LSI HBAs (2 external ports each), with four cables run to four JBODs of 28 disks each (despite having dual expander backplanes, we don't use them). No SSD ZIL. (I know, I know. Not my design; changing it on the next revision to include Fusion-io write/ZIL SSDs.)

Workload is: giant NAS. Used for storage of large (1TB) datasets comprising thousands of ~30MB files until they're moved to processing, then storing the processed info for ~XX days and rinse/repeat. Currently have ~20 of these deployed.

Uptime is not written into any of our contracts, but having a box go down is not good for our field guys.
SATA due to price.

Thinking of switching for now to a RAIDZ2 8+2 config with 2 hot spares. I have not had these deployed long enough to get lifespan figures, but our stuff gets beat up quickly due to the environment it is in.
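
For a rough comparison of the layouts in play, here's the back-of-the-envelope on a 112-disk (4 x 28-bay) box; the 3TB drive size is an assumption picked to make the 336TB raw figure work out:

code:
# Rough capacity comparison for a 112-disk (4 x 28-bay JBOD) ZFS box.
# 3TB drives are assumed so that 112 x 3TB = 336TB raw, matching the above.
DISK_TB = 3
TOTAL_DISKS = 112

layouts = {
    "16 x 7-disk RAIDZ2, no spares":       dict(vdevs=16, width=7,  parity=2, spares=0),
    "18 x 6-disk RAIDZ2, 4 spares":        dict(vdevs=18, width=6,  parity=2, spares=4),
    "11 x 10-disk RAIDZ2 (8+2), 2 spares": dict(vdevs=11, width=10, parity=2, spares=2),
}

for name, l in layouts.items():
    disks_used = l["vdevs"] * l["width"] + l["spares"]
    data_disks = l["vdevs"] * (l["width"] - l["parity"])
    usable_tb = data_disks * DISK_TB   # before ZFS metadata/slop overhead
    print(f"{name}: {disks_used}/{TOTAL_DISKS} disks, ~{usable_tb}TB usable, "
          f"{l['vdevs']} vdevs of random IOPS")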

szlevi
Sep 10, 2010

[[ POKE 65535,0 ]]

Misogynist posted:

The Windows support for NFS serving is incredibly poo poo, though, and good luck doing access control unless you're doing a full Centrify implementation.

THIS. And I'd add that unless you go with Server 2012 and all that it brings to the table (e.g. not being able to run ANY 2012 RSAT tool on Windows 7), your SMB/CIFS speed will also be a pile of poo poo compared to a dedicated/non-Windows unit.

szlevi
Sep 10, 2010

[[ POKE 65535,0 ]]

three posted:

I sure hope you have a fast link if you're using replication as your backup.

Semi-related: people using replication are saying the new EqualLogic 6.0 firmware (the one that brought us synchronous replication) has indeed fixed the bandwidth bug and they are seeing 100Mb/1Gb links saturated... can anyone here confirm this?

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

three posted:

I sure hope you have a fast link if you're using replication as your backup.

Well, fast is all relative, but with block level change replication, deduplication, and compression you can get a lot of backup into relatively little pipe. Especially if your RPO isn't that aggressive.

On the subject of backing up virtual environments, the option to go to tape is still there with NDMP, though this is pretty space-inefficient. Some backup products can natively understand array-level snapshots for certain storage vendors and back up only the changes, which makes for an excellent space-efficient long-term backup solution. Of course, in the case of NetApp that vendor is Syncsort, and I'm not a fan of their product. Not sure what's out there for other vendors.

One option regarding backup of virtual environments is just to not do it. I don't mean stop backing up data entirely, but rather build out your applications in such a way that backup is not required for them, or that less backup is required. With Exchange 2010 you can do this with multiple DB copies, lag DBs and the recycle bin, provided you have enough storage, which is something we are investigating doing at my customer.

With SQL 2012 also moving towards a DAG model, my guess is that SharePoint will eventually allow for the same. I'm not incredibly familiar with Oracle, but I wonder if it would be possible to leverage Data Guard and the Oracle recycle bin to minimize the amount of backup required.

If you can utilize application level features to get your backup requirement down to once a week or even once a month that's a pretty big deal. Even with modern SANs doing array level snapshots basically for free, the less time spent doing backups the better.
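
To put rough numbers on "a lot of backup into relatively little pipe", a back-of-the-envelope for sizing the replication link against an RPO; the change rate and data-reduction ratio below are made-up examples, not anyone's real environment:

code:
# Hypothetical sizing: how much WAN is needed to replicate changes within
# an RPO window, given block-level change tracking plus dedupe/compression
# on the wire? All inputs below are invented for illustration.
daily_change_gb = 200      # assumed changed data per day
reduction_ratio = 3.0      # assumed combined dedupe + compression (3:1)
rpo_hours = 6              # replicate at least every 6 hours

gb_per_cycle = daily_change_gb / (24 / rpo_hours) / reduction_ratio
required_mbps = gb_per_cycle * 8 * 1000 / (rpo_hours * 3600)  # GB -> megabits/s

print(f"~{gb_per_cycle:.1f} GB per {rpo_hours}-hour cycle, "
      f"roughly {required_mbps:.0f} Mb/s sustained")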

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

the spyder posted:

One controller: dual X5650s, 192GB RAM, Mellanox dual-port 40Gb IB. A second cold spare is racked with an identical config. Two LSI HBAs (2 external ports each), with four cables run to four JBODs of 28 disks each (despite having dual expander backplanes, we don't use them). No SSD ZIL. (I know, I know. Not my design; changing it on the next revision to include Fusion-io write/ZIL SSDs.)

Workload is: giant NAS. Used for storage of large (1TB) datasets comprising thousands of ~30MB files until they're moved to processing, then storing the processed info for ~XX days and rinse/repeat. Currently have ~20 of these deployed.

Uptime is not written into any of our contracts, but having a box go down is not good for our field guys.
SATA due to price.

Thinking of switching for now to a RAIDZ2 8+2 config with 2 hot spares. I have not had these deployed long enough to get lifespan figures, but our stuff gets beat up quickly due to the environment it is in.

8+2 w/ 2 hot spares seems reasonable; you may want to keep a few cold spares on hand. You may want to stripe each vdev across the controllers. For the ZIL, we're switching to ZeusRAM on our new servers.

Syano
Jul 13, 2005

NippleFloss posted:

Well, fast is all relative, but with block level change replication, deduplication, and compression you can get a lot of backup into relatively little pipe. Especially if your RPO isn't that aggressive.

On the subject of backing up virtual environments, the option to go to tape is still there with NDMP, though this is pretty space-inefficient. Some backup products can natively understand array-level snapshots for certain storage vendors and back up only the changes, which makes for an excellent space-efficient long-term backup solution. Of course, in the case of NetApp that vendor is Syncsort, and I'm not a fan of their product. Not sure what's out there for other vendors.

One option regarding backup of virtual environments is just to not do it. I don't mean stop backing up data entirely, but rather build out your applications in such a way that backup is not required for them, or that less backup is required. With Exchange 2010 you can do this with multiple DB copies, lag DBs and the recycle bin, provided you have enough storage, which is something we are investigating doing at my customer.

With SQL 2012 also moving towards a DAG model, my guess is that SharePoint will eventually allow for the same. I'm not incredibly familiar with Oracle, but I wonder if it would be possible to leverage Data Guard and the Oracle recycle bin to minimize the amount of backup required.

If you can utilize application level features to get your backup requirement down to once a week or even once a month that's a pretty big deal. Even with modern SANs doing array level snapshots basically for free, the less time spent doing backups the better.

It can be confusing sometimes to sort this out. Like a lot of SMBs that grew very rapidly, we are currently trying to wrap our heads around a new backup and recovery paradigm. When I started at my current gig we had 3 guys and no centralized storage. Now we have doubled the size of our team and we find ourselves with a hodgepodge of storage including HP LeftHand, Dell PowerVault and even Openfiler. We still do all our backups using what I call the old-school method: nightly Backup Exec jobs that move through our systems and grab incrementals, with full backups on the weekends, then we spin weeklies and monthlies off to LTO tape. We played with Veeam for a while but found their support to be horrible and abandoned it pretty fast.

We are trying to figure out how exactly to move into a new way of thinking about things. Options we have considered include building a new Hyper-V cluster on 2012 and replicating VMs to other Hyper-V hosts. We have also considered consolidating to a single storage vendor like EqualLogic or NetApp and taking advantage of snapshot and replication features for backup purposes, but since none of us have used kit like that before we are sort of at a loss as to exactly how it would work. Since 90 percent of our workloads are virtualized, we keep going back to giving Veeam another shot and really building out the infrastructure for it correctly, but then I see some of you mention never needing something like Veeam and I wonder how exactly you are doing what you are doing. Anyway, on that subject, if some of you didn't mind putting in some input on the way you are doing things, I would appreciate it.

the spyder
Feb 18, 2011

PCjr sidecar posted:

8+2 w/ 2 hot spares seems reasonable; you may want to keep a few cold spares on hand. You may want to stripe each vdev across the controllers. For the ZIL, we're switching to ZeusRAM on our new servers.

We keep 10 cold spares per 336TB system. Interesting, I will look into them, thanks!
I use eMLC on my home box for ZIL.

evil_bunnY
Apr 2, 2003

Our backups are all through the university's Tivoli system. You point it at your poo poo and it does the rest, mostly (it's someone else's problem once the bits are offsite).

The problems are:

- doesn't like big (10+TB) jobs.
- expensive as gently caress, charged per GB and we're using around 50TB
- no dedup on the backend exacerbates problem #2.

I've priced a replication target for our SAN, and it's half what we're paying now over 3 years (including power/cooling), even less over 5. And my pals in the department where it'll live can add a shelf and use it for production.
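
The math behind that decision is simple enough to sketch; the per-GB rate and array costs below are invented placeholders, not the university's actual pricing:

code:
# Hypothetical comparison: per-GB backup service vs. buying a replication
# target. All prices below are invented placeholders, not real quotes.
backed_up_tb = 50
service_rate_per_gb_year = 0.80   # assumed $/GB/year charge, pre-dedupe
array_capex = 45_000              # assumed purchase price of the target array
array_opex_year = 5_000           # assumed power/cooling/support per year

for years in (3, 5):
    service_cost = backed_up_tb * 1000 * service_rate_per_gb_year * years
    array_cost = array_capex + array_opex_year * years
    print(f"{years} years: service ${service_cost:,.0f} vs. own target ${array_cost:,.0f}")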

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

evil_bunnY posted:

doesn't like big (10+TB) jobs.
We've got a 300 TB GPFS filesystem backing up without issue. There are client settings regarding memory utilization that you'll need to change if you want to back up a single fileset at that scale.

evil_bunnY posted:

expensive as gently caress, charged per GB and we're using around 50TB
Are you using TSM to back up student laptops and stuff? If not, you're probably licensed wrong. You should be licensed per server using PVUs instead.

evil_bunnY posted:

no dedup on the backend exacerbates problem #2.
TSM does dedupe already regardless of what your backend supports. Are you on 6.3?

evil_bunnY
Apr 2, 2003

Backups aren't really my thing, so I haven't looked into why the jobs were failing; I was only told they did until they got split. I suppose this is what you're talking about. I'll check when I get a minute.

The expense comes because this is a service we get from the central university IT dept. A pox on their house. I don't deal with students.

TSM may do dedup, but we sure are getting charged for data usage before dedup. It's awesome. I'm getting a replication filer greenlit next week, hopefully.

We are indeed on 6.3.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Syano posted:

...backup stuff...
Obviously a lot of this depends on your budget and RPO/RTO requirements, but my general recommendations for virtual environments are:

1) Pick a single storage vendor and stick with them. It makes management easier if your admins only have to learn to manage one technology and most vendors do things like array-to-array replication that can be used to enable DR or off-site backup.

2) Try to pick storage that has enough application integration to meet your backup needs. Most vendors that do array-based snapshots have software that integrates with things like VMware, Exchange, SQL, Oracle, and other applications to provide application-consistent backups using array-level snapshots. These backups will generally be very fast, they won't put extra load on the systems, and they won't impact array performance (YMMV depending on your vendor).

3) Do backups to tape only as often as required to meet your retention requirements. Tape backups create contention for disk because they stream data at a high rate. In a virtual environment this can be a problem since your VMs are all generally sharing the same disk so backing up one host causes contention for all other hosts sharing that disk. It also causes contention for resources on the ESX host. So tape backup should generally be a last resort. Rather than dumping things to tape try to buy a second array and use your vendor's replication features to create long term or off-site backups. Array level replication is generally going to be much more efficient than tape backup regarding how much data needs to be read from disk.

4) I like unified SANs a lot because they allow you to consolidate file services directly onto the device and decommission servers. On some vendors you can consolidate CIFS shares onto the device, take array snapshots, and then restore from those snapshots using previous versions functionality in Windows, or by drilling down into a hidden directory. Ditto for NFS. This is great because it allows users to initiate their own restores and you don't have file servers to back up. The fact that unstructured user data usually deduplicates well is another benefit to consolidating it onto your storage.

As more and more data protection features make their way onto storage arrays, it makes less and less sense to maintain an expensive and fickle tape backup environment. Disk-to-disk backup is becoming cheaper and easier, and it's comforting to know that your backups are protected by RAID, block checksums, scrubs, and the other nice protection and integrity features provided by your array. And if you have to validate backups as a standard procedure, it's much faster doing so from disk than tape.
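
On point 2, the moving parts behind "application-consistent backups using array-level snapshots" generally boil down to the sequence below. This is a generic sketch with stubbed-out calls, not any particular vendor's integration software:

code:
from datetime import datetime

# Generic orchestration sketch for an application-consistent, array-based
# VM backup. The three-step sequence is the general pattern; the helpers
# below are stand-ins for whatever your hypervisor and array actually expose.

def quiesce_vms(datastore):
    """Take quiesced hypervisor snapshots of the VMs on a datastore so guest
    filesystems/applications are in a consistent state (stub)."""
    print(f"[{datetime.now()}] quiescing VMs on {datastore}")

def array_snapshot(volume):
    """Snapshot the backing volume/LUN on the array itself (stub); in real
    life this is a vendor API or CLI call."""
    print(f"[{datetime.now()}] array snapshot of {volume}")

def release_vm_snapshots(datastore):
    """Delete the temporary hypervisor snapshots so VMs don't keep accruing
    delta files (stub)."""
    print(f"[{datetime.now()}] releasing VM snapshots on {datastore}")

def consistent_backup(datastore, volume):
    quiesce_vms(datastore)                   # 1. get app/filesystem consistency
    try:
        array_snapshot(volume)               # 2. near-instant array-level snap
    finally:
        release_vm_snapshots(datastore)      # 3. always clean up, even on error

consistent_backup("datastore-prod-01", "vol_prod_01")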

szlevi
Sep 10, 2010

[[ POKE 65535,0 ]]
Interesting rumor: http://www.theregister.co.uk/2012/11/19/dell_afa/

Thoughts? Despite my reservations about Nimbus (see my comments there), I'd say it is still by far the best price/IP ratio for Dell; the question is whether they want to commit enough money to make it fully integrated into their storage product line (a.k.a. enterprise-ready)...

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

szlevi posted:

Interesting rumor: http://www.theregister.co.uk/2012/11/19/dell_afa/

Thoughts? Despite my reservations about Nimbus (see my comments there), I'd say it is still by far the best price/IP ratio for Dell; the question is whether they want to commit enough money to make it fully integrated into their storage product line (a.k.a. enterprise-ready)...

I'm not really sold on the need for an all-flash array for 99.9% of consolidated storage customers. Is anyone using this? For what? What workloads out there do you have that need 1 million random read IOPS with a total capacity of like 5TB, at a price many times that of a traditional storage array with 10 times the capacity?

This is a sincere question. SSD doesn't provide much benefit over HDD for throughput-based workloads, if you already acknowledge writes in NVRAM then it won't accelerate writes, and most vendors have some sort of flash-based read caching that serves hot blocks or segments from flash... so where does an all-flash array fit?

madsushi
Apr 19, 2009

Baller.
#essereFerrari

NippleFloss posted:

I'm not really sold on the need for an all-flash array for 99.9% of consolidated storage customers. Is anyone using this? For what? What workloads out there do you have that need 1 million random read IOPS with a total capacity of like 5TB, at a price many times that of a traditional storage array with 10 times the capacity?

This is a sincere question. SSD doesn't provide much benefit over HDD for throughput-based workloads, if you already acknowledge writes in NVRAM then it won't accelerate writes, and most vendors have some sort of flash-based read caching that serves hot blocks or segments from flash... so where does an all-flash array fit?

I have been asking myself the same question. I think that your primary use-case is going to be someone with a relatively small but intensely used database that's too big to fit in a FlashCache card (multiple TB).

Essentially, when looking at your disk types, you're actually looking at a chart of IOPS/GB. Your 450GB 15K drive is going to give you like 0.38 IOPS/GB, while your 1TB 7.2K SATA drive will give you much less, around 0.05 IOPS/GB. SSD/flash arrays are going to give you a massive IOPS/GB value, but that has to be compared against the IOPS/GB that the app/environment requires. The only time I see IOPS/GB needing to scale past 0.4 is with intensely used databases.
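
Spelled out, the arithmetic looks like this; the per-drive IOPS figures are rough rules of thumb rather than vendor specs, and the SSD and workload numbers are just illustrations:

code:
# IOPS-per-GB back-of-the-envelope. Per-drive IOPS are rough rules of thumb,
# and the SSD and workload figures are only illustrations.
drives = {
    "450GB 15K SAS":  dict(iops=175,   gb=450),
    "1TB 7.2K SATA":  dict(iops=75,    gb=1000),
    "200GB eMLC SSD": dict(iops=20000, gb=200),
}

for name, d in drives.items():
    print(f"{name}: {d['iops'] / d['gb']:.2f} IOPS/GB")

# Flip it around: a 5TB database that needs ~4000 IOPS works out to
# 4000 / 5000 = 0.8 IOPS/GB, which spinning disk can only reach by buying
# far more capacity than the data actually needs.
print(f"workload: {4000 / 5000:.2f} IOPS/GB required")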

szlevi
Sep 10, 2010

[[ POKE 65535,0 ]]

NippleFloss posted:

I'm not really sold on the need for an all-flash array for 99.9% of consolidated storage customers. Is anyone using this? For what? What workloads out there do you have that need 1 million random read IOPS with a total capacity of like 5TB, at a price many times that of a traditional storage array with 10 times the capacity?

This is a sincere question. SSD doesn't provide much benefit over HDD for throughput-based workloads, if you already acknowledge writes in NVRAM then it won't accelerate writes, and most vendors have some sort of flash-based read caching that serves hot blocks or segments from flash... so where does an all-flash array fit?

madsushi posted:

I have been asking myself the same question. I think that your primary use-case is going to be someone with a relatively small but intensely used database that's too big to fit in a FlashCache card (multiple TB).

Essentially, when looking at your disk types, you're actually looking at a chart of IOPS/GB. Your 450GB 15K drive is going to give you like 0.38 IOPS/GB, while your 1TB 7.2K SATA drive will give you much less, around 0.05 IOPS/GB. SSD/flash arrays are going to give you a massive IOPS/GB value, but that has to be compared against the IOPS/GB that the app/environment requires. The only time I see IOPS/GB needing to scale past 0.4 is with intensely used databases.

Our workload (high-end medical visualization, compositing, working with 4K and higher datasets, volume rendering, etc.) is a classic example... frame sizes start around 16MB and regularly reach 128MB or even higher, and I need at least 10-15 fps (over ~1GB/s) per workstation, so that's at least one of Nimbus' boxes IF it is indeed able to deliver 10-12GB/s per box. The latter needs to be tested; we are talking about the shittiest protocol (CIFS/SMB 2.x), and real life is always worse than specs on paper.
That being said, their initial $25k-30k/box wasn't bad; it's just that their CEO's pushy, arrogant sales-guy style blocked the demo. The paperwork said I would have to pay full price, even in case of a failed test, if the box got back to their office in CA after the 15th day (we are in NYC, just think about shipping times and cost!), not to mention mandating a purchase if the test results were judged good... When I asked him to remove these ridiculous terms, because I cannot commit to anything before I *KNOW* it is the right solution (I explained to him that once I know, I will sit down with my CFO and CEO and request the budget for the whole upgrade, including his boxes), he immediately pulled out of the deal, citing (in an entertainingly patronizing tone) 'difficulties to justify the cost of supporting the evaluation', essentially admitting they do not have money to float a couple of demo units around (and ours would've been an OLDER, smaller unit).

szlevi fucked around with this message at 19:45 on Nov 27, 2012

szlevi
Sep 10, 2010

[[ POKE 65535,0 ]]

cheese-cube posted:

Oh I see what they're saying, despite the stupid names they've used (although that's probably just my completely irrational Dell hatred talking). I've worked with IBM SVC (SAN Volume Controller) before, which does the same sort of thing (they call it "external storage virtualisation").

edit: vvv ahahahaha love it

This has nothing to do with Dell; every storage vendor I've talked to in the past 3-4 years has called the old 'controller-plus-shelves-of-disk' setups "frame-based".
Similarly every vendor called SAN systems that scale horizontally by simply multiplying the number of boxes (each with its own subsystem including controller/CPU/disks/whatever) "scale-out".
And no, SVC is not scale-out at all, it's a storage virtualization product.

In short it's just you, not Dell.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

szlevi posted:

Our workload (high-end medical visualization, compositing, working with 4K and higher datasets, volume rendering, etc.) is a classic example... frame sizes start around 16MB and regularly reach 128MB or even higher, and I need at least 10-15 fps (over ~1GB/s) per workstation, so that's at least one of Nimbus' boxes IF it is indeed able to deliver 10-12GB/s per box. The latter needs to be tested; we are talking about the shittiest protocol (CIFS/SMB 2.x), and real life is always worse than specs on paper.
That being said, their initial $25k-30k/box wasn't bad; it's just that their CEO's pushy, arrogant sales-guy style blocked the demo. The paperwork said I would have to pay full price, even in case of a failed test, if the box got back to their office in CA after the 15th day (we are in NYC, just think about shipping times and cost!), not to mention mandating a purchase if the test results were judged good... When I asked him to remove these ridiculous terms, because I cannot commit to anything before I *KNOW* it is the right solution (I explained to him that once I know, I will sit down with my CFO and CEO and request the budget for the whole upgrade, including his boxes), he immediately pulled out of the deal, citing (in an entertainingly patronizing tone) 'difficulties to justify the cost of supporting the evaluation', essentially admitting they do not have money to float a couple of demo units around (and ours would've been an OLDER, smaller unit).

It still sounds like this is a heavily sequential workload, which doesn't utilize the particular strengths of flash. And with frame sizes that large I'd imagine capacity would be a concern unless you're immediately vaulting most of this data onto slower storage after it's viewed once or twice. It's not my forte, but NetApp E-series is built for almost exactly this. It does very high ingest in a small form factor, and does it on spinning platters so you don't end up at a huge $/TB multiple.

That's not a sales pitch; I can't recommend E-series since I've never worked with it. I'm just pointing out that in throughput-bound workloads there's really nothing particularly compelling about flash or SSD. You're going to get *maybe* twice the throughput per drive that you would from spinning platters, and you're going to pay a lot more for less capacity and likely less usability, given how new these all-flash vendors are.

madsushi posted:

I have been asking myself the same question. I think that your primary use-case is going to be someone with a relatively small but intensely used database that's too big to fit in a FlashCache card (multiple TB).

Essentially, when looking at your disk types, you're actually looking at a chart of IOPS/GB. Your 450GB 15K drive is going to give you like 0.38 IOPS/GB, while your 1TB 7.2K SATA drive will give you much less, around 0.05 IOPS/GB. SSD/flash arrays are going to give you a massive IOPS/GB value, but that has to be compared against the IOPS/GB that the app/environment requires. The only time I see IOPS/GB needing to scale past 0.4 is with intensely used databases.

That's one scenario, though it can be really easy to screw this up. People assume that if they throw flash at a problem they don't really need to do proper sizing, so you'll see customers who end up throwing their database on flash and assuming it will work well. When it performs like crap they are really confused, until an engineer comes in and tells them that flash isn't magical, and they can't put the heavily sequential transaction logging for their very busy database on only 6 flash drives and expect it to perform well. And that would have to be a drat big database not to fit into the terabytes of cache that you can throw into most SANs.

Honestly, though, I think the big appeal is that people think flash obviates the need for proper sizing. "It can do 1 million IOPS, so no need to figure out what your actual workload is, because this can handle it no matter what".

szlevi
Sep 10, 2010

[[ POKE 65535,0 ]]

Bitch Stewie posted:

It's iSCSI not NFS but at a basic level a commodity Dell/HP running ESXi and a HP P4000 VSA doesn't sound a million miles off, though it only scales so far.

Well, the VSA (a ProLiant server, that is) could also run the hypervisor, giving us an almighty all-in-one building block... imagine deploying 10-15 of these cheap boxes, using a laptop to run the orchestration software (which can later also run in a VM)...

szlevi
Sep 10, 2010

[[ POKE 65535,0 ]]

NippleFloss posted:

It still sounds like this is a heavily sequential workload, which doesn't utilize the particular strengths of flash. And with frame sizes that large I'd imagine capacity would be a concern unless you're immediately vaulting most of this data onto slower storage after it's viewed once or twice. It's not my forte, but NetApp E-series is built for almost exactly this. It does very high ingest in a small form factor, and does it on spinning platters so you don't end up at a huge $/TB multiple.

That's not a sales pitch; I can't recommend E-series since I've never worked with it. I'm just pointing out that in throughput-bound workloads there's really nothing particularly compelling about flash or SSD. You're going to get *maybe* twice the throughput per drive that you would from spinning platters, and you're going to pay a lot more for less capacity and likely less usability, given how new these all-flash vendors are.

Never seen anything economically feasible that would support this kind of bandwidth built on spinning disks (and I run an older 80-disk DDN S2A9550 here, topping out at ~1.3GB/s, heh)...

...but Nimbus' building block is a 24-drive SSD-only unit without the usual backend limitations (again, they design & manufacture their own drives, cutting out the controller/protocol fat), so we're talking about a LOT of bandwidth in a 2U box, giving me 5-10TB per box for $30k and up. Say I only get half of the bandwidth per box, around 5GB/s; that means 10GB/s will cost me the price of 2 boxes, or, heck, calculate with 3 boxes...

...in comparison, what do you think: how many spinning disks would you need to sell me for that kind of bandwidth, and at what price? :)

Nomex
Jul 17, 2002

Flame retarded.
With NetApp you can have a large aggregate of spinning disk with some SSDs mixed in for cache. Blocks are moved up into the Flash Pool when they get busy, then get de-staged when they stop being accessed. It works kinda like their PAM cards, only it's read/write and aggregate-specific rather than system-wide.

Another way you might want to approach it is to use something like an HP DL980 with 14 Fusion-io cards and whatever your favorite flavor of virtual storage appliance is.

How big is your active dataset per workstation?

Nomex fucked around with this message at 01:46 on Nov 28, 2012

Vanilla
Feb 24, 2002

Hay guys what's going on in th

szlevi posted:

Never seen anything economically feasible that would support this kind of bandwidth built on spinning disks (and I run an older 80-disk DDN S2A9550 here, topping out at ~1.3GB/s, heh)...

...but Nimbus' building block is a 24-drive SSD-only unit without the usual backend limitations (again, they design & manufacture their own drives, cutting out the controller/protocol fat), so we're talking about a LOT of bandwidth in a 2U box, giving me 5-10TB per box for $30k and up. Say I only get half of the bandwidth per box, around 5GB/s; that means 10GB/s will cost me the price of 2 boxes, or, heck, calculate with 3 boxes...

...in comparison, what do you think: how many spinning disks would you need to sell me for that kind of bandwidth, and at what price? :)

I recall you mentioning isilon at some point in the past. Is it the cost that rules them out or a technical reason?

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

szlevi posted:

Never seen anything economically feasible that would support this kind of bandwidth built on spinning disks (and I run an older 80-disk DDN S2A9550 here, topping out at ~1.3GB/s, heh)...

...but Nimbus' building block is a 24-drive SSD-only unit without the usual backend limitations (again, they design & manufacture their own drives, cutting out the controller/protocol fat), so we're talking about a LOT of bandwidth in a 2U box, giving me 5-10TB per box for $30k and up. Say I only get half of the bandwidth per box, around 5GB/s; that means 10GB/s will cost me the price of 2 boxes, or, heck, calculate with 3 boxes...

...in comparison, what do you think: how many spinning disks would you need to sell me for that kind of bandwidth, and at what price? :)

I'd be absolutely shocked if they can get their stated 12GB/s out of only 24 SSDs. Assuming the data is protected at all, you're losing at least a couple of those to parity drives, which means each SSD is doing more than 500 MB/s. You might get that out of a consumer-quality SSD, but enterprise-class SSDs only hit around 300 to 350 MB/s. There just aren't enough disks to get you to those throughput numbers with reliable hardware. It's possible that they are using consumer-grade drives, but that would worry me, especially with a new vendor with no real track record. They make a lot of claims that border on magic, like 10 years of 1.2PB of writes a week without any performance loss, which just doesn't fit with the characteristics of SSDs as they exist right now, especially not consumer-grade drives. Could be true, but will they be around in 10 years to verify?

I know an E5460 box from NetApp can do about 3GB/s (this may be higher theoretically; this is just the number I've seen when sizing for Lustre with a certain block size and stream count) in a 4U enclosure that includes redundant controllers and 60 7.2K 2TB/3TB NL-SAS drives. That'll get you around 80TB, give or take, with RAID-6 and 2TB drives. I've got no idea on price, though, since, as I said, I don't support these at all. Could be cheap or very expensive. It's probably less than the $8/GB raw that Nimbus gear lists at, but whether you need the extra capacity is another matter.
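
Written out, the per-drive arithmetic behind that skepticism looks like this; the parity layout is an assumption, and the per-drive figure is the ~300-350 MB/s enterprise-SSD range mentioned above:

code:
# Sanity check: can 24 SSDs really deliver a claimed 12 GB/s?
# The parity layout is an assumption; the per-drive figure is the
# ~300-350 MB/s enterprise-SSD range mentioned above.
claimed_gbs = 12.0           # vendor's stated throughput, GB/s
total_drives = 24
parity_drives = 4            # assumed, e.g. two RAID-6 groups of 12
data_drives = total_drives - parity_drives

per_drive_mbs = claimed_gbs * 1000 / data_drives
print(f"each data drive must sustain ~{per_drive_mbs:.0f} MB/s")

enterprise_ssd_mbs = 325
ceiling_gbs = data_drives * enterprise_ssd_mbs / 1000
print(f"at ~{enterprise_ssd_mbs} MB/s per drive, the ceiling is ~{ceiling_gbs:.1f} GB/s")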

Anyway, my point wasn't that all-flash arrays are bad, I'm just trying to understand who is using them and why. If your requirements are for a very high-throughput, low-capacity solution in a small footprint then it might be the right move for you. SSD throughput is only about double spinning-drive throughput, at best, but that might be enough of a difference to get to your magic number.

Vanilla posted:


I recall you mentioning isilon at some point in the past. Is it the cost that rules them out or a technical reason?

My guess is that Isilon would be way too expensive and require too much gear to get to the 5 GB/s number he mentioned. I'd guess he'd be looking at 10 nodes, at minimum, to get to that number.

YOLOsubmarine fucked around with this message at 02:48 on Nov 28, 2012


szlevi
Sep 10, 2010

[[ POKE 65535,0 ]]

Nomex posted:

With NetApp you can have a large aggregate of spinning disk with some SSDs mixed in for cache. Blocks are moved up into the Flash Pool when they get busy, then get de-staged when they stop being accessed. It works kinda like their PAM cards, only it's read/write and aggregate-specific rather than system-wide.

Another way you might want to approach it is to use something like an HP DL980 with 14 Fusion-io cards and whatever your favorite flavor of virtual storage appliance is.

How big is your active dataset per workstation?

Yeah, that 'moving blocks'/tiering approach never worked and never will for this type of thing, I can tell you that already. :)
As for being sequential: it's only sequential per file, but you have several users opening and closing plenty of different files, etc., so it's far from the usual video-editing load.
Size can vary from a few gigabytes to 50-100GB per file (think 4K and higher, RGBA fp32, etc.). We've developed our own raw format and wrote plugins for Fusion, Max, etc., so we're actually pretty flexible if it comes down to that...

FWIW I have 2 Fusion-io Duo cards; they were very fast when I bought them for $15k apiece, now they are just fast. But the issue from day 1 is Windows CIFS: up to 2008 R2 (SMB 2.1), CIFS is an utter piece of poo poo; it simply chops everything up into 1-3KB pieces, so it pretty much destroys any chance of taking advantage of the cards' bandwidth.
Just upgraded my NAS boxes (Dell NX3000s) to Server 2012, so I'll test SMB 3.0 with direct Fusion-io shares again. I'm sure it's gotten better, but I doubt it's gotten that much better...

Since going with an established big name would be very expensive (10GB/s!), as I see it I have to choose between two approaches:
1. Building my own Server 2012-based boxes, e.g. plugging in 3-4 PCIe storage cards capable of 2GB/s or more, most likely running the whole shebang as a file-sharing cluster (2012 got a new active-active scale-out mode), and hoping SMB 3.0 got massively better.
2. Going with some new solution from a new, small firm, hoping they will be around or get bought up, and only, of course, after acquiring a demo unit to see real performance.

I can also wait until Dell etc. buys up a company or rolls out something new, but who knows when they will have affordable 10GB/s...?
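
Either way I'll measure it rather than trust the spec sheet; a crude client-side sequential-read check over SMB looks like this (the UNC path and test file are placeholders, and the file needs to be much bigger than RAM on both ends so caches don't flatter the number):

code:
import time

# Crude sequential-read throughput check against an SMB share. The UNC path
# and test file are placeholders; use a file much larger than RAM on both
# client and server so caches don't flatter the number.
TEST_FILE = r"\\nas01\share\testdata\big_frame_sequence.raw"
CHUNK = 8 * 1024 * 1024   # 8MB reads

start = time.time()
total = 0
with open(TEST_FILE, "rb", buffering=0) as f:
    while True:
        block = f.read(CHUNK)
        if not block:
            break
        total += len(block)

elapsed = time.time() - start
print(f"read {total / 1e9:.1f} GB in {elapsed:.1f} s = {total / 1e9 / elapsed:.2f} GB/s")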

szlevi fucked around with this message at 21:52 on Nov 28, 2012
