evil_bunnY
Apr 2, 2003

Any of you boys and girls run Nexenta?

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

evil_bunnY posted:

Any of you boys and girls run Nexenta?
I tried it once, but it was just really expensive compared to running your own ZFS at slightly more effort. I wish the pricing structure had been saner.

evil_bunnY
Apr 2, 2003

Vulture Culture posted:

I tried it once, but it was just really expensive compared to running your own ZFS at slightly more effort. I wish the pricing structure had been saner.
That's what I was afraid of but their (first) quote is actually super reasonable. They're scheduling a demo of the vmware integration and mgmt tools for us so I can report on that in a little bit. How'd you like it?

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Number19 posted:

https://kb.netapp.com/support/index?page=content&id=9010080

So NetApp stuff will need patches.


It sounded like an RCE against SMB, which would have been super exploitable in awful ways. Even if these systems aren't internet facing, it's easy for a malware author to drop something inside your network and use access to them to launch ransomware attacks or exfiltrate data. As it is, the MITM can be used to change file permissions on Samba servers (which most NAS storage vendors implement), so some very bad things can still happen from this.

For Windows this is boring. For Samba integrators? It might be a lot worse.

NetApp doesn't use Samba; they actually license some of the CIFS code from MS, so it's not purely a Samba thing.
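If you're running Samba boxes behind this, the usual hardening against that MITM is to refuse unsigned sessions. A minimal smb.conf sketch (these are standard Samba option names, but check your version's docs before relying on this):

```ini
# smb.conf -- hardening sketch, not a complete config
[global]
    # Reject any client session that isn't signed, so a
    # man-in-the-middle can't tamper with traffic in flight
    # (e.g. flipping ACLs/permissions on the wire)
    server signing = mandatory
    # Likewise require signing when this host acts as a client
    client signing = mandatory
```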

YOLOsubmarine
Oct 19, 2004

evil_bunnY posted:

Our current stuff is a pair of Netapp 2240’s with 5TB of 15k drives for VM datastores and 40TB of nearline.

We need ~200TB of nearline and ~15TB for VM datastores, NAS head(s) for at least SMB (current datastore is on NFS, but I’m not married to that), on 10GbE. Our current solution is mainly bottlenecked at the controllers (WAFL writes/CIFS-SMB CPU usage). We ingest a bunch of raw sequencing data on the regular, so being able to scale out to around a PB before EOL would be cool.

I like how the Netapp boxes can just serve all the protocols we need, but I hate how CPU bottlenecked we are on WAFL writes. Also I'm a grumpy fart and know nothing of CDOT.

Anyone in particular we should talk to? Netapp will show us their plan, Dell is coming soon.

Tegile could do what we want but I don't know how sustainable they are.
Pure is flash only.
Nimble is iSCSI only.

You're CPU bottlenecked in part because the 2240s are dual-core systems. Get on an 8000 with some newer code and you'll be fine.

Nimble would be your best bet out of those for data ingest rate at a reasonable price, though you'd obviously need to front it with a Windows server for CIFS. Tegile's CIFS implementation is problematic and they still don't have SMB3 support fully baked. They also lose a lot of drives to redundancy, since mirroring rather than RAID-5/6 is basically required for any performance, so total throughput is limited on the base array since you only have a handful of drives actively writing.
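The mirroring-vs-parity point above is just arithmetic: with mirroring every data drive has a twin, while a single RAID-6 group only gives up two drives to parity. A rough sketch (the drive counts and spare policy here are made up for illustration, not Tegile's actual layout):

```python
def usable_drives(total_drives, layout, spares=2):
    """Rough data-drive count for a small hybrid array.

    Illustrative assumptions only: mirroring pairs drives 1:1,
    and RAID-6 is modeled as one group losing 2 drives to parity.
    """
    data = total_drives - spares
    if layout == "mirror":
        return data // 2        # half the drives hold copies
    if layout == "raid6":
        return data - 2         # two parity drives per group
    raise ValueError(layout)

# A hypothetical 16-drive shelf: mirroring leaves 7 drives actively
# holding data, RAID-6 leaves 12.
print(usable_drives(16, "mirror"))  # 7
print(usable_drives(16, "raid6"))   # 12
```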

CDOT is different, but once it's up and running you won't really notice much difference between running it and running a 7-mode system.

evil_bunnY
Apr 2, 2003

I totally believe throwing bigass controllers at my shelves would fix the issues, we were kind of forced into a worst case scenario wrt ingestion rates.

The reason I'm looking at other options is that by the time I replace my EoL controllers and shelves, I might as well swap the whole thing if Netapp doesn't want to play ball on pricing. The only thing I don't like about it is the management tools.

Thanks for the input dudes, you two are always v insightful.

YOLOsubmarine
Oct 19, 2004

evil_bunnY posted:

The only thing I don't like about it is the management tools.

In 8.3, System Manager is on-box and HTML5, so no downloading a client and dealing with Java, at least. It's still not as simple as Nimble or Tegile to manage, but it's hard to get there with NetApp given all of the capabilities.

Another knock against Tegile is they really have no scale out capability to speak of, which means you're going to end up managing a handful of arrays independently if you do need to scale to 1PB eventually.

Nimble can scale out to four arrays with volumes striped across them, and NetApp can scale out the cluster to 8 nodes, though no striped volumes at this point.

Aquila
Jan 24, 2003

https://sadlock.org/

Unrelated: Anyone have any horror stories for medium to high end IBM sans? Or tricks to make them less horrible?

evil_bunnY
Apr 2, 2003

Central IT's v7k were down for 3-4 days a couple months after initial rollout; that was fun (I didn't give a poo poo at the time, now half our mission critical infra is managed by the same people).

I love my employer.

Vulture Culture
Jul 14, 2003

Aquila posted:

https://sadlock.org/

Unrelated: Anyone have any horror stories for medium to high end IBM sans? Or tricks to make them less horrible?
I had nothing but good things to say about V7000, but gently caress SONAS right in the rear end in a top hat. The monitoring endpoints that it exposed were awful, and one of my engineers ended up reverse-engineering the protocol used by their custom monitoring software.

Pile Of Garbage
May 28, 2007



Aquila posted:

https://sadlock.org/

Unrelated: Anyone have any horror stories for medium to high end IBM sans? Or tricks to make them less horrible?

I posted ages ago about a V7000 which locked up in a weird way due to some obscure bug: http://forums.somethingawful.com/showthread.php?threadid=2943669&userid=117560#post404083868.

Also this post was regarding a DS3500: http://forums.somethingawful.com/showthread.php?threadid=2943669&userid=117560#post409789618 (The moral of that story is to configure e-mail alerts god drat it)

I miss working with SVC/V7000 kit :(

Aquila
Jan 24, 2003

cheese-cube posted:

I miss working with SVC/V7000 kit :(

I guess it's a V9000; I don't come in contact with it at all (I'm a Hadoop admin now!). The stories I hear about it make me miss working on Hitachi SANs though.

Thanks Ants
May 21, 2004

#essereFerrari


Having a bit of a trial-by-fire at the moment with a new NetApp filer that has arrived before the training. Am I right to assume that the maximum CIFS volume size can only be as large as the largest amount of free space on a single aggregate? So if I have 20TB split across two nodes (controllers in a 2554) the largest CIFS volume I can make is going to be 10TB unless I move all the disks onto one node?

I assume best practice here is to balance disks between nodes and just create multiple volumes?

Mr-Spain
Aug 27, 2003

Bullshit... you can be mine.
We are looking into our first SAN/array. We will need about 30-40TB to start off with and scale from there. So far I have a couple of quotes: Nimble, Tegile, EMC and Dell/Compellent.

I've got some general pricing back already, most of the offerings seem pretty good, but so far the most bang for the buck has been the Compellent systems. Is there any real reason to stay away from them as a vendor? I can post up specs and pricing. I think a lot of it has to do with the space and who they are competing against. The disks in their solutions are mostly 1.8TB 10K offerings. Any thoughts?

Estimated data would be 10TB video (hardly ever accessed), 10TB call center recordings, which after being written would be randomly accessed. The last 10 or so would be VM and file serving, with a few 16GB or so SQL databases. Thanks!

bigmandan
Sep 11, 2001

lol internet
College Slice

Mr-Spain posted:

We are looking into our first SAN/array. We will need about 30-40TB to start off with and scale from there. So far I have a couple of quotes: Nimble, Tegile, EMC and Dell/Compellent.

I've got some general pricing back already, most of the offerings seem pretty good, but so far the most bang for the buck has been the Compellent systems. Is there any real reason to stay away from them as a vendor? I can post up specs and pricing. I think a lot of it has to do with the space and who they are competing against. The disks in their solutions are mostly 1.8TB 10K offerings. Any thoughts?

Estimated data would be 10TB video (hardly ever accessed), 10TB call center recordings, which after being written would be randomly accessed. The last 10 or so would be VM and file serving, with a few 16GB or so SQL databases. Thanks!

We have two SC4020s (~25TB each) that we got about a year and a half ago, and they have been serving us very well. Our usage is mostly VM storage, mail storage (ISP, so lots of accounts) and various databases (some as small as a few GBs and some in the 50GB range). We've only had one disk failure so far, and it seems it was due to faulty firmware on the drive. We have 3 tiers set up and overall the performance has been pretty good. Our Dell rep was also very aggressive with discounts, but we were also buying servers and switches at the same time, so your mileage may vary there.

evil_bunnY
Apr 2, 2003

Mr-Spain posted:

Estimated data would be 10TB video (hardly ever accessed), 10TB call center recordings, which after being written would be randomly accessed. The last 10 or so would be VM and file serving, with a few 16GB or so SQL databases. Thanks!
You need to know what kind of loads these generate.

Mr-Spain
Aug 27, 2003

evil_bunnY posted:

You need to know what kind of loads these generate.

The DPACK run on my current VMs and servers (including databases) showed small IO: spikes to 3-5k but 99% under 1,000. Not a huge load. The videos are recorded to camera, then dropped on the storage. If they are accessed again, they are pulled to a local PC. Not a huge deal. The only one I'm not sure of (and I'm looking it up) is the Zoom call recording.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Thanks Ants posted:

Having a bit of a trial-by-fire at the moment with a new NetApp filer that has arrived before the training. Am I right to assume that the maximum CIFS volume size can only be as large as the largest amount of free space on a single aggregate? So if I have 20TB split across two nodes (controllers in a 2554) the largest CIFS volume I can make is going to be 10TB unless I move all the disks onto one node?

I assume best practice here is to balance disks between nodes and just create multiple volumes?
You can thin provision it as large as you want, but it can only hold as much data as there is space on the aggregate.
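On the cDOT CLI that looks roughly like the following. `svm1`, `aggr1`, and `cifs_vol` are placeholder names, and exact flags vary by ONTAP release, so treat this as a sketch rather than copy-paste:

```
# See how much free space each aggregate actually has
storage aggregate show -fields availsize

# Create a 20TB volume on an aggregate with less than 20TB free;
# -space-guarantee none makes it thin provisioned, so nothing is
# reserved up front and it can only fill to the aggregate's real space
volume create -vserver svm1 -volume cifs_vol -aggregate aggr1 -size 20TB -space-guarantee none -junction-path /cifs_vol
```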

YOLOsubmarine
Oct 19, 2004

Thanks Ants posted:

I assume best practice here is to balance disks between nodes and just create multiple volumes?

Active/passive, with one node owning all data disks, is perfectly viable on smaller platforms.

Thanks Ants
May 21, 2004

Thanks for the responses, my head-scratching wasn't in vain trying to work out how I could get this thing set up in a way that matched the requirements we sent over. Turns out that the guy who installed the thing basically ignored all the information that our distributor had passed them and put it into this position, so they are going to flatten the thing and start over.

Thanks Ants
May 21, 2004

For people with Nimble appliances and Veeam:

https://www.veeam.com/blog/integration-nimble-storage-veeam-availability-suite.html

YOLOsubmarine
Oct 19, 2004

Just in time for us to put our partnership with Nimble on hold. Timing!

Assuming this functions identically to the NetApp and HP functionality, this is actually a killer feature, because you can do file-level restores of data from within a VM directly from Nimble snapshots (i.e. scheduled directly from the controller and not through Veeam) using the free version of Veeam. It's great for customers that want to use snapshots as backups, because otherwise doing file-level restores is a bit of a chore.

adorai
Nov 2, 2002

Can anyone confirm or deny whether it is supported to transition volumes from a fas in 7 mode to a cdot virtual appliance?

Wicaeed
Feb 8, 2005

Super pumped about this on one hand, but really hoping our backup project budget will fit in with the pricing for Veeam :(

NippleFloss posted:

Just in time for us to put our partnership with Nimble on hold. Timing!

Assuming this functions identically to the NetApp and HP functionality, this is actually a killer feature, because you can do file-level restores of data from within a VM directly from Nimble snapshots (i.e. scheduled directly from the controller and not through Veeam) using the free version of Veeam. It's great for customers that want to use snapshots as backups, because otherwise doing file-level restores is a bit of a chore.

Curious to know why you guys are leaving Nimble, is it to do with their recent company performance or a technical reason?

YOLOsubmarine
Oct 19, 2004

Wicaeed posted:

Curious to know why you guys are leaving Nimble, is it to do with their recent company performance or a technical reason?

We will still sell and support them, but we aren't leading with them anymore and won't be devoting technical or marketing resources to them. The reasons are basically that the interest in them here has dwindled significantly, they don't have any distinguishing features, they don't seem to have a compelling technical direction, and their company performance has been poor of late, in a market that is super competitive and also shrinking globally. The storage market is packed and we already have too many storage partners, so a block-only platform with no standout features was an obvious one to set aside, because we've got other offerings that can do what Nimble does and more interesting products to lead with in competitive situations.

There's nothing wrong with it, but it's the Toyota of storage at this point. It'll work just fine, but no one is getting excited about it.

Internet Explorer
Jun 1, 2005

What storage are people getting excited about these days? I'm still happily chugging along with my EqualLogics, but my needs are fairly pedestrian. I like simple storage. Fancy storage has only been a pain for me.

Mr Shiny Pants
Nov 12, 2012

Internet Explorer posted:

What storage are people getting excited about these days? I'm still happily chugging along with my EqualLogics, but my needs are fairly pedestrian. I like simple storage. Fancy storage has only been a pain for me.

I was just wondering this. I'd rather have a Toyota that trucks along than a Viper that breaks down every two miles...

Could just be me though.

YOLOsubmarine
Oct 19, 2004

I work for a VAR, excitement about technology is a big factor in driving sales. Things like Pure or Solidfire or Nutanix where you can wow people with a technology presentation or a whiteboard or have an engineer explain all of the cool and unique stuff that they are doing generate excitement from the technical folks in the room, and those are often the ones making the recommendations or controlling the direction the purchasing conversation takes. And those people go out and evangelize to other people that they know in the industry and suddenly they want to know about whatever the cool technology of the month is and they want to buy it when their next purchase cycle comes around. Their actual needs are often a much lower priority than being sold on cool technology, or features they don't actually need.

The fundamental problem that Nimble has right now is that their pitch is "it's simple and has flash!", which is true of a whole lot of other things out there now, and those things might also do file or object protocols, or have a much more polished ecosystem, or simply seem more interesting from a technical standpoint. We had a customer that picked Tegile over Tintri (probably my favorite storage array out there, and it's radically simple and low touch) BECAUSE Tintri didn't have enough knobs and dials. They wanted something they could tinker with and "tune" rather than something that they could plug in and promptly forget the password to because they'd never need to touch it until it was time to retire it. Some people don't want a "thing that just works", they want something *interesting*.

If you compare them feature to feature against competing products Nimble is lacking. No deduplication, inline or post-process. No file protocols. Limited scale out. Active/Passive design. No QoS. Read cache only on SSD. Years late to the all flash game. Limited 3rd party integration.

The fact that it works really well at doing the basic job of a storage array, serving up storage in a consistent and performant manner and protecting data integrity, doesn't really show up on a comparison sheet. This is partly because it's hard to market "it actually works", because everyone says that, and also because for the most part all of their competitors also work. I think Nimble is, dollar for dollar, more performant than any other hybrid array out there (Tintri is really fast, but starts at a higher price point), but any of them can perform well enough for any customer, and pricing can be massaged to make it competitive, generally. So what ultimately wins the sale is the whiz-bang factor, or the feature list, or the comfort level of going with an established provider that isn't a startup with a $7/share stock price. Five years ago, when Nimble was founded, they were a unique offering, but they haven't changed much in five years, and the industry has. Hybrid storage isn't disruptive, it's normal. Tens of thousands of IOPS in a small array isn't exceptional, it's normal, because SSD is really cheap.

And, like I said earlier, the enterprise storage market is shrinking. There are fewer deals out there and more competitors and Nimble just isn't well positioned to win a lot of those deals. They NEED to make money to stay afloat, which is something they've never done, and they've got to do it in an incredibly competitive and shrinking market where they don't have the most interesting technology and they have less money than all of their publicly traded competitors. They're in a bad spot, as a company. That's no reflection on the quality of the arrays, which are very solid, but sometimes good products still fail.

YOLOsubmarine fucked around with this message at 19:09 on May 3, 2016

evil_bunnY
Apr 2, 2003

Mr Shiny Pants posted:

I was just wondering this. I'd rather have a Toyota that trucks along than a Viper that breaks down every two miles...

Could just be me though.
Not so many value-adds when you're reselling/implementing basic options.

Vulture Culture
Jul 14, 2003

Enterprise Storage Megathread: IOPS Isn't the Plural of IOP

Internet Explorer
Jun 1, 2005

NippleFloss, thanks for the detailed response. I am definitely familiar with the "need something to wow the customers" aspect of a VAR.

I'll have to look more into Pure or Solidfire. It's always nice to be aware of the newer offerings. I remember when this discussion was more along the lines of "Pure is cool but they are a small company and probably won't last long." On Nutanix, I had actually looked into them but unless you are starting from scratch or replacing everything at once, I don't see how you can swing it. I spoke with them about it and they didn't seem to have any great plan on solving that problem. "Just use it for part of your environment!" But because of their integrations, that's easier said than done. It would be like having two very different, distinct environments.

When the time comes to look at new storage I'll probably look at Nimble. Simple storage that works well without babysitting fits the bill for the smaller environments I work in these days. If they "fail," I'm sure they'll be bought out by someone, even if just for the support contracts.

YOLOsubmarine
Oct 19, 2004

evil_bunnY posted:

Not so many value-adds when you're reselling/implementing basic options.

Honestly, I'd much rather sell storage that takes a half day to install and that I never hear about again from the customer. The margins on storage are still good enough that we don't need to drive a bunch of services to make money on them, and it's incredibly boring installing yet another storage array. I'd rather work higher up the stack on network virtualization, DR, automation, or something more interesting. We just haven't had much luck selling Nimble when we lead with it, so we're no longer leading with it. The Nimble team here also doesn't help matters much on their end.

Vulture Culture posted:

Enterprise Storage Megathread: IOPS Isn't the Plural of IOP

I'm sorry, I know how this triggers you.

Internet Explorer posted:

NippleFloss, thanks for the detailed response. I am definitely familiar with the "need something to wow the customers" aspect of a VAR.

I'll have to look more into Pure or Solidfire. It's always nice to be aware of the newer offerings. I remember when this discussion was more along the lines of "Pure is cool but they are a small company and probably won't last long." On Nutanix, I had actually looked into them but unless you are starting from scratch or replacing everything at once, I don't see how you can swing it. I spoke with them about it and they didn't seem to have any great plan on solving that problem. "Just use it for part of your environment!" But because of their integrations, that's easier said than done. It would be like having two very different, distinct environments.

When the time comes to look at new storage I'll probably look at Nimble. Simple storage that works well without babysitting fits the bill for the smaller environments I work in these days. If they "fail," I'm sure they'll be bought out by someone, even if just for the support contracts.

Nimble is good. Coming from EQL you will love it. It does what it's supposed to do in a mostly unobtrusive way.

Nutanix often finds its way in through a one-off project, usually VDI. Customers will have a separate project-based budget, and for something like VDI a separate environment makes sense, so they go with Nutanix to have a self-contained VDI footprint. Then they like it and either expand it or replace the rest of their environment come refresh time. It's generally so low touch on the storage side that having two separate environments isn't hugely problematic. They've also got their own hypervisor offering, which helps them stand apart from some of the other players in the hyper-converged and storage spaces.

I'd suggest looking at Tintri as well as Nimble. If your budget allows it then it is very good and as simple as can be. Per VM QoS, snapshots, and replication are great features.

Potato Salad
Oct 23, 2014

nobody cares


NippleFloss posted:

Nimble stuff <snip>

This is a good loving post.

goobernoodles
May 28, 2011

Wayne Leonard Kirby.

Orioles Magician.

NippleFloss posted:

I work for a VAR, excitement about technology is a big factor in driving sales. Things like Pure or Solidfire or Nutanix where you can wow people with a technology presentation or a whiteboard or have an engineer explain all of the cool and unique stuff that they are doing generate excitement from the technical folks in the room, and those are often the ones making the recommendations or controlling the direction the purchasing conversation takes. And those people go out and evangelize to other people that they know in the industry and suddenly they want to know about whatever the cool technology of the month is and they want to buy it when their next purchase cycle comes around. Their actual needs are often a much lower priority than being sold on cool technology, or features they don't actually need.

The fundamental problem that Nimble has right now is that their pitch is "it's simple and has flash!", which is true of a whole lot of other things out there now, and those things might also do file or object protocols, or have a much more polished ecosystem, or simply seem more interesting from a technical standpoint. We had a customer that picked Tegile over Tintri (probably my favorite storage array out there, and it's radically simple and low touch) BECAUSE Tintri didn't have enough knobs and dials. They wanted something they could tinker with and "tune" rather than something that they could plug in and promptly forget the password to because they'd never need to touch it until it was time to retire it. Some people don't want a "thing that just works", they want something *interesting*.

If you compare them feature to feature against competing products Nimble is lacking. No deduplication, inline or post-process. No file protocols. Limited scale out. Active/Passive design. No QoS. Read cache only on SSD. Years late to the all flash game. Limited 3rd party integration.

The fact that it works really well at doing the basic job of a storage array, serving up storage in a consistent and performant manner and protecting data integrity, doesn't really show up on a comparison sheet. This is partly because it's hard to market "it actually works", because everyone says that, and also because for the most part all of their competitors also work. I think Nimble is, dollar for dollar, more performant than any other hybrid array out there (Tintri is really fast, but starts at a higher price point), but any of them can perform well enough for any customer, and pricing can be massaged to make it competitive, generally. So what ultimately wins the sale is the whiz-bang factor, or the feature list, or the comfort level of going with an established provider that isn't a startup with a $7/share stock price. Five years ago, when Nimble was founded, they were a unique offering, but they haven't changed much in five years, and the industry has. Hybrid storage isn't disruptive, it's normal. Tens of thousands of IOPS in a small array isn't exceptional, it's normal, because SSD is really cheap.

And, like I said earlier, the enterprise storage market is shrinking. There are fewer deals out there and more competitors and Nimble just isn't well positioned to win a lot of those deals. They NEED to make money to stay afloat, which is something they've never done, and they've got to do it in an incredibly competitive and shrinking market where they don't have the most interesting technology and they have less money than all of their publicly traded competitors. They're in a bad spot, as a company. That's no reflection on the quality of the arrays, which are very solid, but sometimes good products still fail.
Great post, and I'm in the middle of trying to pick the best route to take for our infrastructure. Made a post two weeks ago at the end of a day when my brain was fried:

goobernoodles posted:

Anyone have any strong opinions on which is “best” out of these options?

Option 1 (58k) – HP DL360 G9 paired with an HP MSA 2040 connected directly w/ 12Gbps SAS cables. 384GB, 48 physical cores, 2x 10Gb NICs and 2x SAS HBAs per server. 14TB usable capacity, 2x 400GB SSDs for read cache. I guess you can pay for a license to get auto data tiering, which enables write caching to the SSDs.

Option 2 (66k, probably more like 70-80k) – 3-node Nutanix solution with 24TB raw capacity, 384GB memory, 36 physical cores. They claim 50% usable capacity, which puts us at 12TB usable, which is cutting it too close. This was quoted at $65k and it sounded like they were going to drop it further. Sounds like if I wanted to get one for Portland, I might be able to get two for around 100k. Waiting on a quote for the next bundle up w/ 32TB raw, 8-core procs and more memory, and expect it to be about 70-80k hopefully.

Option 3 (64k) – HP DL360s mentioned above but paired with a Nimble CS235 with 24TB raw. They claim 24TB usable due to compression.

I'm going to dedicate some time to reading into all of these options (I was also quoted a NetApp SAN for around the cost of a Nimble) but as of right now, it seems like both the Nutanix and HP/Nimble options are pretty compelling. Nutanix's pricing is apparently the lowest it's ever been due to their quarter ending at the end of the month, and it's the last quarter before they go public. Both sound great from a support standpoint, although Nutanix has a leg up there considering it's just one vendor. That's a huge plus for me since it's currently just me here and I'm getting buried alive. If you want to continue reading for more info on the environment, feel free;
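For what it's worth, the dollars-per-usable-TB math on those three quotes looks like this, taking the vendors' usable-capacity claims at face value (a big assumption for the Nutanix 50% figure and the Nimble compression claim):

```python
# Quote comparison -- prices and capacities copied from the options
# above; "usable" figures are the vendors' own claims, not measurements.
quotes = {
    "HP DL360 + MSA 2040":     {"price_usd": 58_000, "usable_tb": 14},
    "Nutanix 3-node":          {"price_usd": 66_000, "usable_tb": 12},  # 50% of 24TB raw
    "HP DL360 + Nimble CS235": {"price_usd": 64_000, "usable_tb": 24},  # compression claim
}

for name, q in quotes.items():
    per_tb = q["price_usd"] / q["usable_tb"]
    print(f"{name}: ${per_tb:,.0f}/usable TB")
```

On these claimed numbers the Nimble option is by far the cheapest per usable TB, and the Nutanix option the most expensive, which is why the compression claim is worth pressure-testing before deciding.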

Commence rambling:

I've been hoping to replace my company's primary servers and storage for years, but due to budget constraints, a poo poo ton of major issues that needed to be ironed out, as well as long overdue big projects, it's been continually pushed back. I'm also completely overloaded and actively trying to find a helpdesk guy to shield me from being interrupted every goddamn 5 minutes. Looks like it's finally going to happen here pretty soon. The company has a little over 200 total employees and around 110 workstation users. The Seattle office has 3 IBM x3650 servers (original, M2 and M3 versions) in production, paired with an IBM DS3300 iSCSI SAN with 7.1TB usable capacity. 28 physical cores with 172GB of memory. ~30 VMs and Veeam backups to a Quantum dedupe appliance (DXi4601). The Portland office has a single, ancient IBM x3550 M2 and Veeam backups to another DXi. CPU isn't a concern; we're typically at about 25-35% utilization. Memory is 90-95% all day every day. I'm using more storage than I have on our SAN, in production, only because I've shifted some low-importance production servers and test machines, etc. to FreeNAS storage. I have about 9TB of crap storage I can use in a pinch in each office.

For the sake of background info, we recently got a 1Gbps Comcast layer 2 fiber connection to connect the two offices, along with a 50Mb fiber internet connection in Seattle and a microwave connection that should be upgraded within the month to be capable of bursting to 1Gbps. I also got a Comcast business coax connection installed recently in Portland for web traffic and as an additional backup connection that I could turn up a "RED tunnel" or IPsec tunnel on if the fiber were to go down. Network hardware consists of Sophos UTMs at each site and HP ProCurve switches. Seattle has a 5412R and Portland a 2920, iirc. UniFi APs.

Seattle

• Web app for accounting, cost projections, AP, HR, etc. SQL based, Tomcat front-end. Max 21 concurrent users.
• Another server that runs reports off of said SQL db.
• Exchange 2010 with a mailbox database that just surpassed 1TB. That number should drop massively once I have the storage to create new mailbox DBs to migrate mailboxes into. No quotas were ever implemented and now the database is too big to offline defrag without taking forever.
• Another SQL based application server – max 10 users.
• Handful of other small application servers with 5-10 users each.
• DC’s, DHCP, Print, a couple file servers, WDS/MDT server, WSUS..
• RDS gateway and broker server and a separate session host server. I’m currently piloting this and holding out until we have the new hardware to open it up to all staff. If usage takes off, it could be a resource hog.

Portland is a babby.

They have a couple of dinky application servers, but I fully plan on consolidating those into servers in Seattle as I migrate the Seattle applications to 2012 R2. No real need to keep them segregated when we have a 1Gbps connection to Portland. I also plan on combining the Portland file server with the Seattle one, which adds about 1.5TB to the capacity needs before taking into account any sort of compression or deduping. Basically, in the long run, there isn't going to be much in Portland other than supporting servers: a DC, DHCP, a DFS copy of the Seattle file server, downstream WDS, WSUS and (maybe) Kaspersky servers, and a Veeam proxy server if it makes sense.

Basically I'm looking for any different perspectives. The potential for implementing a similar solution in both offices (such as two Nutanix setups) is there, and I need to at least pitch the idea. I didn't really think it was a possibility until I had an informal 30 minute chat with the CFO about where I am with looking into this poo poo. I haven't really thought too much about what exactly that would get us. A DR site; what else? The original plan was to move our existing IBM garbage to Portland after implementing in Seattle.

...Jesus christ. I really need to go home now.
I'm not sure if I caught the CFO on a good day today or what, but I got basically a blank check to pull the trigger on any of the 3 solutions we've been looking at - for both offices. I know this year has been a fairly good year for us and she's got some budget to work with. Sounds like if I keep it under 150k she'll pretty much sign off on the purchase.

As of this morning, I was leaning towards Tegile or Nutanix. Tegile seems to have a leg up on Nimble, at least in my mind right now, due to far more usable capacity for a bit less money. When you factor in the additional protocol support that I initially didn't put much weight on, it really looks like a pretty flexible option. That would mean I could do odd-ball backup jobs, like our email archive server, which requires an SMB share, directly to the SAN. I'm thinking I could vastly decrease the RPO by using snapshots for day-to-day backups, as well as decrease the RTO for those scenarios where we need to revert a VM or recover files. Right now, that's a bit of a chore just due to having to wait 1-5 minutes for Veeam to mount a backup before I can recover a file. Also, I'm hoping that whatever solution we move forward with will let me eliminate the support costs for our Quantum DXi4601s that currently act as our Veeam storage targets. We have two, with Veeam backup replication jobs taking care of the replication between our two sites. Support is over 8k annually, which blows my goddamn mind. I need to confirm they're not going to just turn into bricks or something if out of support, but I figure I can relegate them to much longer term backups with site-to-site replication. This project has quickly blown up into one vastly larger than I anticipated now that the CFO is open to opening up the wallet for both sites, and now I'm scrambling to make sure that I'm not shooting myself in the foot with regards to BCDR with any of these options.

We can do storage level replication with any of the 3, Veeam replication, VMware replication... is there a reason that decision should be made before going with any of these? Sounds like there are some relatively minor differences between the options as far as granularity, but at least from the storage side of things they're all effectively pretty similar. I could ramble on incoherently for a while about all of the potential other things we could do with each solution, but I simply don't have the time to figure out every single possibility and what's the best fit for us. Unless I'm missing something, the introduction of flash has really made finding the "best" solution less about sheer by-the-book performance numbers, since there's no real way to know how a proprietary file system will work for any given workload, no? Going down that rabbit hole thus far has led to a circle-jerk of counter-arguments, usually coming down to "WELL OUR FILE SYSTEM IS BETTER, YOU'LL SEE" :smugbird:

I'm waiting on Nimble to quote a CS300 with around 36TB of usable capacity, even though that's way more storage than I was aiming for originally. I was thinking 15-20TB would be a good place to start.

I can put a Nutanix cluster with 96 logical cores, 768GB RAM, and ~18TB usable in place in Seattle, alongside a smaller cluster with 72 cores, 384GB RAM, and ~12TB usable in Portland. The comparable solutions with HP servers and Nimble/Tegile arrays are, combined, roughly 10-15k less. When you factor in the potential for increased consulting costs with any of the server/SAN options, it's pretty much a wash as far as cost goes.

The #1 question that I have no real idea how to answer is which one is fundamentally the strongest from a storage perspective. Logically, it seems like the Nutanix approach of trying to localize data to the host of the VM may produce the best performance, since most of the reads/writes are going to direct-attached storage, eliminating a lot of "hops." While the main argument other vendors make is that you've got to carve out CPU and memory from the nodes for use by the virtual storage controller, it does give us a great deal of flexibility to increase RAM/CPU if necessary. The biggest question mark for me is whether or not a virtual storage controller hitting direct-attached storage will perform better than the SANs. That, and Nutanix has SATA drives whereas the Tegile and Nimble are... SAS? Not sure on those; I just shot those questions off to the vendors. It's a poo poo-ton of money to just throw a dart at a wall.

It's hilarious that I'm posting an increasingly similar (incoherent) post to my last one at nearly the exact same time, but I need to leave to go to the space needle or something. Holy gently caress, trying to write this entire post while people are sanding a conference room table with a sander made for floors was a bad idea.

Thanks Ants
May 21, 2004

#essereFerrari


Is it normal for a vendor to claim they have a unified appliance with dual controllers that can run active/active for SMB3 and then quote a failover time of between 30 and 60 seconds? That doesn't sound active/active to me.

evil_bunnY
Apr 2, 2003

Thanks Ants posted:

Is it normal for a vendor to claim they have a unified appliance with dual controllers that can run active/active for SMB3 and then quote a failover time of between 30 and 60 seconds? That doesn't sound active/active to me.
Active/active can just mean both controllers can serve data in normal operations, and take over the failed head's load when in FO mode. This does mean you have to size them like an active/passive, but you get extra performance in normal mode.
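That sizing rule can be sanity-checked with a back-of-the-envelope calculation. All IOPS figures below are made-up illustrations, not vendor numbers; the function name is mine:

```python
# In an active/active pair sized to survive failover, one controller must be
# able to absorb BOTH workloads, so steady-state load per head should stay at
# or below roughly 50% of a single controller's capacity.

def survives_failover(load_a: float, load_b: float, controller_capacity: float) -> bool:
    """True if a single controller can carry both workloads after a failover."""
    return (load_a + load_b) <= controller_capacity

# Two heads each rated for 50,000 IOPS, currently pushing 20k and 25k:
print(survives_failover(20_000, 25_000, 50_000))  # True: 45k fits on one head
# Run both heads hot and a failover overloads the survivor:
print(survives_failover(30_000, 35_000, 50_000))  # False: 65k > 50k
```

Which is why "active/active" on a dual-controller midrange box buys you extra headroom in normal operation, not extra total capacity you can safely consume.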

Thanks Ants
May 21, 2004

#essereFerrari


That makes sense, thanks. I was conflating how failover works in a file protocol with multipathing on a block protocol. From a bit more reading, a sub-minute failover time for file services isn't terrible, and SMB3 is supposed to know about it and keep the connection alive during that time anyway.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

goobernoodles posted:

The #1 question that I have no real idea how to answer is which one is fundamentally the strongest from a storage perspective. Logically, it seems like the Nutanix approach of trying to localize data to the host of the VM may produce the best performance, since most of the reads/writes are going to direct-attached storage, eliminating a lot of "hops." While the main argument other vendors make is that you've got to carve out CPU and memory from the nodes for use by the virtual storage controller, it does give us a great deal of flexibility to increase RAM/CPU if necessary. The biggest question mark for me is whether or not a virtual storage controller hitting direct-attached storage will perform better than the SANs. That, and Nutanix has SATA drives whereas the Tegile and Nimble are... SAS? Not sure on those; I just shot those questions off to the vendors. It's a poo poo-ton of money to just throw a dart at a wall.

Which is best from a storage perspective depends on what your main criteria are for data storage. They are all better in some areas than others.

Nutanix - No storage management. Once it's up and running (which takes a few hours at most) you aren't managing storage at all, you're simply creating VMs on a datastore. There's no provisioning LUNs or managing VMFS filesystems. Of course, you can still take snapshots and configure replication, but datastore management is non-existent. You also have per-VM granularity for those operations, so per-VM snapshots and replication, which can be really useful when replicating for DR since you don't have to replicate a whole set of VMs that reside in a volume just so you can turn one of them on at the remote site. The downside of Nutanix that we've seen is that it can struggle with monolithic workloads (think large SQL servers, i.e. anything that drives a lot of IO from a single VM), and the cache really needs to be sized correctly to ensure that you have a sufficient amount for hot data. Data locality (having cache on the servers) is nice in theory, but in practice the interconnect is only a small portion of overall latency in an IO, so having the data directly on the node is only noticeable for very high IO operations, where the difference between a quarter millisecond and a half millisecond is tens of thousands of IOPS. The IO also still has to traverse the drive bus, which is slower than NVMe or direct memory access.
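To put rough numbers on that latency argument: achievable IOPS against a single datastore is approximately queue depth divided by per-IO latency. The latencies and queue depths below are illustrative assumptions, not measurements from any of these platforms:

```python
# Why shaving fractions of a millisecond off per-IO latency only matters
# for very busy VMs: IOPS ~= queue_depth / latency. Figures are illustrative.

def iops(queue_depth: int, latency_s: float) -> float:
    """Approximate steady-state IOPS for a given queue depth and per-IO latency."""
    return queue_depth / latency_s

local = iops(32, 0.00025)   # ~0.25 ms per IO with data local to the node
remote = iops(32, 0.00050)  # ~0.50 ms per IO with one extra interconnect hop
print(round(local), round(remote), round(local - remote))
# At queue depth 32 the locality win is tens of thousands of IOPS; at queue
# depth 1 (a lightly loaded VM) the same latency change is only ~2,000 IOPS.
```

So for a shop with many moderately busy VMs, the interconnect hop mostly washes out; it's the single monolithic heavy hitter where it shows up.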

Nimble - Rock solid, very performant, and a good ratio of raw to usable storage before data reduction. If you're comfortable with iSCSI already it's a very good option that's pretty low touch and that you probably won't ever have to worry about too much. It is limited from a features perspective, though, so if you want to run file protocols, or you like the per-VM management features of the Nutanix, it's going to leave you wanting more. But the basic feature set works very well, and it now has Veeam support, which is big if you want to use Veeam. In-place controller and cache upgrades are very easy on anything above the CS215. You get more usable cache since it isn't protected, but if you lose a cache drive you lose all of that data and performance can go south very quickly. Double-edged sword.

Tegile - Very flexible, lots of features, fairly easy to manage. We've seen some performance issues here as well, and it's very sensitive to proper cache sizing, especially the metadata cache. It tends to fall over if you overrun the metadata cache, since that's also used to store the inline deduplication table. If your performance requirements are moderate and you want to be able to do block and file from the array, it's a good option. It will be more complicated to manage than either of the other two, largely due to the additional features, but also because they hide a lot of the complexity behind professional services, which will occasionally become obvious if you need to engage them for a case. Tegile also doesn't support in-place controller upgrades at this time, so if you need to refresh down the road you're looking at a forklift upgrade. That may change in the future.

Were it me I'd probably do Nutanix if my workload was mostly distributed across a number of moderately busy VMs, and Nimble if my workload was a few large, heavy hitting VMs. Tegile only if I really liked the multi-protocol feature set. But you'll probably be fine and happy with any of them, because most people are.

Moey
Oct 22, 2010

I LIKE TO MOVE IT

NippleFloss posted:

In place controller and cache upgrades are very easy on anything above the CS215.

I did a couple of CS240 to CS300 controller upgrades; they required a full outage. Since the different models use different families of CPU, you couldn't just do an easy failover and back.


YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Thanks Ants posted:

That makes sense, thanks. I was conflating how failover works in a file protocol with multipathing on a block protocol. From a bit more reading, a sub-minute failover time for file services isn't terrible, and SMB3 is supposed to know about it and keep the connection alive during that time anyway.

Basically anything will survive a sub-minute failover. Even in an FC environment you have a timeout value before a SCSI command will be retried, and a number of retry attempts before a path will be marked dead and the IO will be sent down a different path, so you may have an IO delay of a minute or more depending on the configuration. A controller failure, even on a true active/active array (VMAX, Hitachi VSP, not much else), will cause a path state change that can delay IO for a while. FWIW, installing VMware Tools increases the SCSI timeout in guests up to 180 seconds.
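The worst-case delay before IO moves to another path works out to roughly the per-command timeout multiplied by the number of attempts. The 20-second timeout and retry counts below are illustrative values for the arithmetic, not defaults pulled from any specific HBA, OS, or array documentation:

```python
# Worst-case IO delay before a path is declared dead and IO is re-routed:
# approximately timeout * (retries + 1). All values are illustrative.

def worst_case_path_failover(timeout_s: int, retries: int) -> int:
    """Seconds of potential IO delay before multipathing fails the path over."""
    return timeout_s * (retries + 1)

# A 20s SCSI command timeout with 2 retries before marking the path dead:
print(worst_case_path_failover(20, 2))  # 60: already a full minute of delay
# A 60s guest timeout with 2 retries:
print(worst_case_path_failover(60, 2))  # 180: matching the guest-side ceiling above
```

Which is why a 30-60 second controller failover is generally survivable: the guest-side timeouts are sized to outlast it.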
