qutius
Apr 2, 2003
NO PARTIES

NippleFloss posted:

They're likely using the C series as standard rack mount boxes, and not part of a UCS managed deployment. C series can happily run as standard dumb servers without UCS Manager or fabric interconnects.

Exactly this. We get amazing pricing from Cisco, so it makes sense and is much easier/faster to order up a C series box when we need a server. Wouldn't be surprised if the same is the case in that situation as well.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

qutius posted:

Exactly this. We get amazing pricing from Cisco, so it makes sense and is much easier/faster to order up a C series box when we need a server. Wouldn't be surprised if the same is the case in that situation as well.
We are in the same boat. I can get a C-series rackmount for a price that is competitive with HP or Dell, unless HP or Dell are running a good promo at the time.

Mr-Spain
Aug 27, 2003

Bullshit... you can be mine.

goobernoodles posted:

Welp, pulled the trigger on 3x Cisco UCS C220 M4 servers w/ 2630v3s, 1.2TB of RAM and a Tegile T3530, along with 2x servers and a T3100 array for our 2nd office. Cha-ching.

e: Well, that and an HP 5406R for the 2nd office and 10Gb SFP+ modules for both offices. Should be quite the upgrade from our 8-year-old x3650s and DS3300.

That's the switch I rolled for my backend, how did you get it configured?

goobernoodles
May 28, 2011

Wayne Leonard Kirby.

Orioles Magician.

Docjowles posted:

Why the Cisco C series vs rando rack mount boxes from HP/Dell, out of curiosity? Did they come in with a super competitive price? UCS is cool but the extreme flexibility seems kinda wasted, and like it's just adding complexity/points of failure on a three node deployment.

Not trying to be a nitpicky rear end in a top hat; genuinely interested :)
Waaaaay better pricing than the HPs we were looking at. It was something like 5 grand less, maybe more, for an apples-to-apples comparison with 3x hosts and 256GB of memory. They offered a 3rd-party memory option for cheap which could have reduced the price further. Regardless, I was able to bump up to 368GB of memory per host and still stay lower than HP. (I think)

I started talking to this VAR for network consulting. Since they're also in the server/storage market, I heard them out even though I was pretty far into the conversation with Nimble and Nutanix. It turned out the owner of the company is a client of ours - and he owes us a bit of money. While our CEO definitely gave me a little hint that he wanted me to go with these guys if everything was apples to apples, I was hesitant until they came back with a T3530 with a 3.5Tb all-flash tray. That pretty much took performance out of the equation.

Mr-Spain posted:

That's the switch I rolled for my backend, how did you get it configured?
Are you just talking about the physical configuration? Dual management controllers and PSU's, one 8x SFP+ for servers and storage and 3x 20-port 1Gb port modules with 2x SFP+. Got another 8x SFP+ and a few more of the 20 port, 2x SFP+ modules to put into our 5412R at the main office as well. I wanted some more 1G ports in both offices as well as a few more SFP+ modules than I need right now for flexibility down the road. I think all of that came out to 14k - the switch is HP renew. Forget if the modules and whatnot are new or not, but I guess it doesn't really matter with the whole lifetime warranty thing. There's a company I go through in Seattle that has consistently had the best pricing on HP gear that I can share if anyone is interested.

goobernoodles fucked around with this message at 17:32 on Jun 13, 2016

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Anybody running OEM-rebranded E-Series enclosures? Any tips?

Mr-Spain
Aug 27, 2003

Bullshit... you can be mine.

goobernoodles posted:


Are you just talking about the physical configuration? Dual management controllers and PSU's, one 8x SFP+ for servers and storage and 3x 20-port 1Gb port modules with 2x SFP+. Got another 8x SFP+ and a few more of the 20 port, 2x SFP+ modules to put into our 5412R at the main office as well. I wanted some more 1G ports in both offices as well as a few more SFP+ modules than I need right now for flexibility down the road. I think all of that came out to 14k - the switch is HP renew. Forget if the modules and whatnot are new or not, but I guess it doesn't really matter with the whole lifetime warranty thing. There's a company I go through in Seattle that has consistently had the best pricing on HP gear that I can share if anyone is interested.

Sure, I'll get plat later and PM you. I ended up getting a separate 5406 for only the storage network and used dual controllers, power, and two of the 8-port 10GBase-T cards for 10Gb iSCSI. Seems to be working pretty well so far.

bull3964
Nov 18, 2000

DO YOU HEAR THAT? THAT'S THE SOUND OF ME PATTING MYSELF ON THE BACK.


Really regretting the purchase of our FAS2554 arrays, mainly due to Netapp's Byzantine support system.

If you have a system that auto-connects every day and after an incident, why do I have to go on a scavenger hunt running commands (with little context given as to how I run them) for you to investigate the incident?

With Pure, I press a button and the support team can get in and have access to anything they need to make a diagnosis and possibly a fix.

With Netapp, it's 5 hours of back and forth with someone who gives every indication that they are barely more aware of how the thing functions than me.

Maneki Neko
Oct 27, 2000

bull3964 posted:

Really regretting the purchase of our FAS2554 arrays, mainly due to Netapp's Byzantine support system.

If you have a system that auto-connects every day and after an incident, why do I have to go on a scavenger hunt running commands (with little context given as to how I run them) for you to investigate the incident?

With Pure, I press a button and the support team can get in and have access to anything they need to make a diagnosis and possibly a fix.

With Netapp, it's 5 hours of back and forth with someone who gives every indication that they are barely more aware of how the thing functions than me.

Netapp support used to be good! We moved over to Tegile last year and have been happy support wise.

Potato Salad
Oct 23, 2014

nobody cares


We get NetApp through a pretty good local VAR who has always pointed us in the right direction for initiating support.

Thanks Ants
May 21, 2004

#essereFerrari


We're making our first forays into NetApp, and when I've gone to set something up, read the documentation, followed it, and had something unexpected happen, I figured I could ask support. Except they are very quick to ask questions to establish that it's an issue with doing something, as opposed to something breaking, and then suggest professional services.

I kind of understand where they are coming from, but I'm not asking to be hand-held through something where I couldn't be bothered to learn about the product. Your support isn't cheap; I don't understand why "hey can I run this past you guys" is such a frowned upon request.

the spyder
Feb 18, 2011
Does anyone have long-term Isilon maintenance costs for a 6-10 node cluster they would be willing to share (privately)? 4-5 years or more ideally, but anything would help.

On a semi-related note, has anyone used third-party warranty companies like Pivot on EMC hardware?

Docjowles
Apr 9, 2009

Potato Salad posted:

We get NetApp through a pretty good local VAR who has always pointed us in the right direction for initiating support.

The fact that it's easy to initiate support the *wrong* way is kind of indicative of a problem. I don't mean that as a knock on NetApp in particular, though. Any sufficiently large company runs into that issue. We've had good success with NetApp support but yeah, it came down to having contact info for someone who could bypass the "hurr durr is it plugged in? can you email us 500000 pages of boilerplate diagnostics we already get every day from autosupport?" stage.

Super easy support is a way the smaller vendors can differentiate themselves, for sure.

Potato Salad
Oct 23, 2014

nobody cares


Thanks Ants posted:

I don't understand why "hey can I run this past you guys" is such a frowned upon request.

Liability.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Support is the wrong place to ask operational questions because a lot of the time they won't actually know. They aren't trained to run a NetApp, they are trained to fix something that is broken, and because NetApp is huge and has a pretty wide array of features and products, they are also often fairly narrowly focused.

The correct place for questions like "how do I do thing X" is your VAR SE or NetApp SE. God help you if those people are morons though. Or if you just don't know who they are because the NetApp sales team dropped the ball.

Smaller companies like Pure or Tegile can handle those sorts of questions through support because their support staff is much smaller and they can afford to hire and train very good people to staff those limited positions. As you get bigger that's less feasible. But even with them those questions *should* go through an SE or Professional Services contact.

Autosupport feels Byzantine because it's old as hell relative to the newer vendor offerings which mostly just looked at how NetApp did things with AutoSupport and improved them, while NetApp is stuck trying to graft new functionality on top of legacy code and infrastructure because rebuilding it from the ground up would be a massively expensive engineering exercise. Which is a good lens through which to view most of NetApp's issues compared to newer vendors. Supporting legacy solutions is expensive in time and money and the longer you've been around the more there is to support.

I still do a lot of NetApp services work, so if you're looking for an answer to a "how to do this" question you can always ask here or PM me. Or just google. One benefit NetApp does have over startup competitors is a huge user base that blogs and posts a lot about their issues and resolutions.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

NippleFloss posted:

I still do a lot of NetApp services work, so if you're looking for an answer to a "how to do this" question you can always ask here or PM me. Or just google. One benefit NetApp does have over startup competitors is a huge user base that blogs and posts a lot about their issues and resolutions.
This is a big deal. It's similar to how easy it is to find the answer to a cisco question on the internet, because the userbase is huge.

bull3964
Nov 18, 2000

DO YOU HEAR THAT? THAT'S THE SOUND OF ME PATTING MYSELF ON THE BACK.


My thing wasn't world ending or even that huge, but that's part of what made it so annoying.

Controller reboots at 8:30 this morning and causes a takeover. Autosupport logs it. I get into the office and see the event in my email along with the requisite case.

Now though, the very first thing that happened was "We don't have any current contact info for anyone, can you confirm?" I just had a controller replaced back in September, so I don't know why you'd say you don't have contact info.

So, the first thing I do is confirm that and I wait. I get acknowledgement of the account info confirmation and then get an email from autosupport saying if there's no input it will close after a certain period of time.

The case also said "awaiting customer update" when nothing thus far asked for anything. So, I put a note on it and asked what the next steps were for the ticket. I hear back from someone asking me to trigger autosupport again and then get the output of various service processor commands.

I go to upload.netapp.com as instructed to upload the outputs (can't batch-select files BTW, one at a time) and then I put a note on the case indicating that I had uploaded the repeated info, that currently the array was reporting as healthy (aside from some LIFs not being on their home controller), and that it wasn't causing a current production issue (since they asked). I also asked if there was any action I should be taking.

After some time I get a response back that the watchdog rebooted the controller and that it was a transient problem that I shouldn't worry about, just keep an eye on it if it happens again.

So, nothing dramatic, no downtime, and resolution within a day. So, why was I annoyed? The inefficiency of the process. We don't have dedicated storage admins. Even then, all told, this probably took about an hour of my day (if that), so it wasn't even a huge time sink. But it felt like every step of the way I had to prompt them to action, and that if I got pulled away into something else for a few minutes nothing would end up progressing in the case.

Part of this is because I got slammed with a bunch of items all at once and this was an added annoyance that I didn't plan for in my day, and part of this is colored by a god-awful 16-hour controller replacement debacle back in September; I dreaded having to potentially do another hardware swap on it. That was a whole other instance where support failed. We had 4-hour support on the thing, but it sat for over 8 hours with a dead controller until I called our VAR about it and got the thing moving (NetApp screwed something up with the account initialization, so things weren't entering the support queue at the right priority). The tech that NetApp sent out was mostly familiar with IBM-branded versions of the arrays, and he spent 3 hours on the phone with support to properly do the swap, which was complicated by the fact that even the SP on the old controller was fried and they didn't have a procedure for it. After all that, I contacted support about how to verify that the licenses were all transferred over properly, only to hear back "Oh, I'm sure the tech would have done it." Fast forward 80 days and I'm getting warnings from NetApp's portal that the new controller ID wasn't linked up with the licenses.

I also noticed this round that somehow they have our datacenter listed as the organization on the contact info which is wrong and wasn't like that before. I'm sure that will cause problems at some point.

Contrast that to Pure. I was noticing some latency spikes on our array and emailed support about it. No complicated portals, just emailed support. Within an hour they requested I enable remote support so they could take a look. They quickly discovered that one SSD in the array was occasionally resetting. They told me about it and said they could disable the SSD until a replacement arrived. I told them to go ahead and they switched off the disk and the array rebuilt parity around the remaining disks in 30 minutes. The whole interaction was about 10 minutes that I did purely by email and in less than an hour we were at full operating capability (with just a little loss in capacity) and I didn't have to do anything.

I understand the scale issues and realize it's a challenge to maintain that as you scale. At the same time, these newer products are just so DEAD SIMPLE to use (and yes, I know it's a bit different since Netapp is multi-protocol, but still.) I need to update the firmware of our Pure array, I contact support and hand it over to them. In less than an hour with zero interruption, it's updated. I looked into what was necessary to update ONTAP and it's like a 40 page planning guide. I just don't have time for that poo poo and there's no reason why these things need to be this hard at this point. So, I'm faced with spending money with our VAR for implementation support and having them do it when it's included with a different hardware vendor.

It's all going to be moot for me in the long run since we'll be converging infrastructure with our parent org sometime next year and hardware will be out of my hands. In some ways, I'm going to miss screwing around with this stuff. However, on days like today, I can't wait to hand it off to someone else. I'm not going to miss the stomach drop when some core bit of infrastructure starts behaving in odd ways and you're never quite sure if the other shoe is going to drop.

evil_bunnY
Apr 2, 2003

With all due respect it sounds like you need to have a VAR help you out and you're not willing to accept/pay for it.

Thanks Ants
May 21, 2004

#essereFerrari


code:
Record 182: Wed Jun 22 13:06:26 2016 [SP.critical]: Filer Reboots
Record 183: Wed Jun 22 13:06:31 2016 [Boot Loader.warning]: Corrupted or missing Primary AND Backup env files. Use default.
Record 184: Wed Jun 22 13:06:36 2016 [Boot Loader.critical]: Missing kernel primary.
:stare:

Pile Of Garbage
May 28, 2007



The deadliest combination: verbose and terrifying.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Never watch a NetApp boot. If it comes up in the end then it's usually fine. If it doesn't then you know something's wrong. Watching the flood of dmesg-like output will just give you a coronary event.

Thanks Ants
May 21, 2004

#essereFerrari


This one kernel panicked, dumped memory, rebooted, and now can't find its OS. Should be fun.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

Thanks Ants posted:

This one kernel panicked, dumped memory, rebooted, and now can't find its OS. Should be fun.

Probably a failed CF card. It's not really too bad a fix. If it is a failed card you'll just replace it, install the proper ONTAP version onto it, and then it should boot. All of the important data is stored on the disks; the controllers are (mostly) stateless.

qutius
Apr 2, 2003
NO PARTIES

NippleFloss posted:

Never watch a NetApp boot. If it comes up in the end then it's usually fine. If it doesn't then you know something's wrong. Watching the flood of dmesg-like output will just give you a coronary event.

I like watching them boot :ohdear:

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

qutius posted:

I like watching them boot :ohdear:

Every time I sit with a customer during an upgrade or any work that requires service processor monitoring I find myself constantly saying "nope, that's fine....that too....yup, it's still fine...no, not any longer than usual...and there's your prompt".

Thanks Ants
May 21, 2004

#essereFerrari


NippleFloss posted:

Probably a failed CF card. It's not really too bad a fix. If it is a failed card you'll just replace it, install the proper ONTAP version onto it, and then it should boot. All of the important data is stored on the disks; the controllers are (mostly) stateless.

That's good to hear. I'm not hugely confident that the support people who picked the case up know what to look for - we're 4 hours in without any idea of what happened.

Edit: Seems to be an 8.3.2 problem that is resolved in P2.

Thanks Ants fucked around with this message at 14:14 on Jun 23, 2016

stevewm
May 10, 2005
We have been looking at setting up a 3 host Hyper-V cluster.

The VMs to be run on these hosts are 85% read, 15% write. 500 IOPS average (over a 48-hour period) with some spikes to 2200. 1.2TB of total used space, and not growing very fast.
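As a sanity check on those numbers against any of the quotes below, the classic RAID write-penalty arithmetic is easy to run. A rough sketch in Python; the per-spindle IOPS figure and the penalty values are generic rules of thumb I'm assuming, not vendor numbers:

code:
# Rough spindle check for the workload above: 2200 IOPS peak, 85% read / 15% write.
# Per-disk IOPS and RAID write penalties are generic assumptions, not vendor specs.

def backend_iops(front_end_iops, read_pct, write_penalty):
    """Convert front-end IOPS to back-end disk IOPS for a given RAID write penalty."""
    reads = front_end_iops * read_pct
    writes = front_end_iops * (1.0 - read_pct)
    return reads + writes * write_penalty

PEAK_IOPS = 2200
READ_PCT = 0.85
DISK_IOPS_10K = 140   # assumed per-spindle figure for a 10k SAS drive

for raid, penalty in {"RAID10": 2, "RAID5": 4, "RAID6": 6}.items():
    be = backend_iops(PEAK_IOPS, READ_PCT, penalty)
    print(f"{raid}: ~{be:.0f} back-end IOPS -> ~{be / DISK_IOPS_10K:.0f} x 10k spindles")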

Been looking into a few options and have received several quotes: a Dell VRTX, 3x Dell servers with an MD3220i, HP servers with an HP MSA1040, and Lenovo servers with a Lenovo S3200 SAN... The Lenovo one caught my eye. On top of being one of the cheaper options, it looks decent on paper. (https://lenovopress.com/tips1299-lenovo-storage-s3200) But it hasn't been on the market that long and I really can't find much about it.

Anyone seen these things in person? (S2200 or S3200)


Strangely the Dell VRTX was the most expensive of the quotes received thus far. I had expected it to be cheaper....

skipdogg
Nov 29, 2004
Resident SRT-4 Expert

Lots of changes at work the last 6 months or so; long story short, storage has fallen on my shoulders.

We have a ton of CIFS shares on our EMC VNX systems that I need to migrate from one domain to another. We have various 3rd-party tools that can handle the actual data, but I have no idea how to handle the backend CIFS server/datamover stuff. Anyone done this before who can offer some tips? We have support, though support might say it's out of scope. A consultant isn't out of the question, but I don't have $$$ budgeted for it, so I'd have to make a good case for why I can't figure it out myself.

The company that acquired us is all NetApp so the EMC gear is a bit foreign to them and they don't want to touch it.

Rhymenoserous
May 23, 2008

the spyder posted:

Our leadership is going to have a heart attack when they see the cost.

Meh. gently caress 'em. If anyone bitches about how much my infrastructure costs in SAN, network and VM Servers I'll gladly get quotes for bare metal and attached storage, then when they see that poo poo costs almost twice as much I'll spring the power bill and air conditioning costs on them.

evil_bunnY
Apr 2, 2003

Oh god why are volume size marketing restrictions a thing in TYOOL2016

H110Hawk
Dec 28, 2006
The dark day has arrived, and I am pricing out directly (or nearly directly) attached storage for a project which can only scale vertically but is critical to our infrastructure. We exceeded the capabilities of a single server worth of disk storage back in October and have been half-assing solutions ever since and wondering why they don't work. I also found out that whomever set these up did them as a concatenated jbod "because it was so fast!" Since then we have spent basically one FTE of a senior system administrator hobbling this thing along, including at 3am when it dies. It is considered a P2 incident for one of these metrics servers to be down. This is a system where they insist on 0 or nearly 0 metrics lost when it dies. I have been half involved with this project, but my proposals have all been blocked. The blockers have all left the company. There, that is my rant.

I am trying to propose a full-rear end solution based on half-rear end data to stop us from spending so much time and energy on hobbling the project along.

The problem space: 5 sites. One site is handily twice as large as the balance of the sites and is the focus of my concerns. Depending on the solution it may expand to the other 4 sites but at smaller IOPS values. Each site has N semi-identical servers in a multicast group reading metrics and writing them to locally stored RRD files. These files are also read by a number of monitoring and graphing services making the I/O fairly random, but the scales are tipped towards writes.

Disk: 1.2TB used today, projected to double in one year.
I/O throughput: 60MBytes/s
IOPS: 15,000, expected to at least double. I am targeting a system no smaller than 50,000 IOPS.
Budget: Nothing specific, I don't know what this stuff costs. It must work, or it will be my head on a pike if I suggest a bank-breaking solution which fails. To date suggesting a SAN for production (non-IS/IT) has been a fireable offense.
Connectivity: You tell me. Back in my day the kids liked FC and iSCSI.

We're doing a bake off, and others want to do Ceph and Gluster as well. I have low expectations for this particular use case. I've called our VAR and they say HDS. We have a meeting on Tuesday to discuss it. I intend to do a demo or try-and-buy of whatever system we're considering. If the manufacturer won't do that, then I have no interest.

I need something where when any one component of this system croaks it is quick and easy to get back up and running. My ideal solution would be two completely standalone systems which are only barely internally redundant, but when they go down we have an easy way to "catch up" the disks to a point in time with a specific known amount of data loss.

What am I forgetting? How dumb of an idea is this?
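For reference, here's how those numbers extrapolate if the doubling keeps up past the first year (that continuation is my assumption; the figures above only project one year out):

code:
# Extrapolate the stated figures: 1.2 TB used and ~15k IOPS today, both expected
# to at least double; 50k IOPS is the stated target floor. Doubling every year
# beyond year one is an assumption for illustration.

CAPACITY_TB = 1.2
IOPS_NOW = 15_000
TARGET_IOPS = 50_000
ANNUAL_GROWTH = 2.0

for year in range(4):
    cap = CAPACITY_TB * ANNUAL_GROWTH ** year
    iops = IOPS_NOW * ANNUAL_GROWTH ** year
    status = "exceeds" if iops > TARGET_IOPS else "within"
    print(f"year {year}: ~{cap:.1f} TB, ~{iops:,.0f} IOPS ({status} the 50k target)")

Even at that rate the capacity stays in the single-digit TB range for years; the IOPS target is the harder constraint.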

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

H110Hawk posted:

The dark day has arrived, and I am pricing out directly (or nearly directly) attached storage for a project which can only scale vertically but is critical to our infrastructure. We exceeded the capabilities of a single server worth of disk storage back in October and have been half-assing solutions ever since and wondering why they don't work. I also found out that whomever set these up did them as a concatenated jbod "because it was so fast!" Since then we have spent basically one FTE of a senior system administrator hobbling this thing along, including at 3am when it dies. It is considered a P2 incident for one of these metrics servers to be down. This is a system where they insist on 0 or nearly 0 metrics lost when it dies. I have been half involved with this project, but my proposals have all been blocked. The blockers have all left the company. There, that is my rant.

I am trying to propose a full-rear end solution based on half-rear end data to stop us from spending so much time and energy on hobbling the project along.

The problem space: 5 sites. One site is handily twice as large as the balance of the sites and is the focus of my concerns. Depending on the solution it may expand to the other 4 sites but at smaller IOPS values. Each site has N semi-identical servers in a multicast group reading metrics and writing them to locally stored RRD files. These files are also read by a number of monitoring and graphing services making the I/O fairly random, but the scales are tipped towards writes.

Disk: 1.2TB used today, projected to double in one year.
I/O throughput: 60MBytes/s
IOPS: 15,000, expected to at least double. I am targeting a system no smaller than 50,000 IOPS.
Budget: Nothing specific, I don't know what this stuff costs. It must work, or it will be my head on a pike if I suggest a bank-breaking solution which fails. To date suggesting a SAN for production (non-IS/IT) has been a fireable offense.
Connectivity: You tell me. Back in my day the kids liked FC and iSCSI.

We're doing a bake off, and others want to do Ceph and Gluster as well. I have low expectations for this particular use case. I've called our VAR and they say HDS. We have a meeting on Tuesday to discuss it. I intend to do a demo or try-and-buy of whatever system we're considering. If the manufacturer won't do that, then I have no interest.

I need something where when any one component of this system croaks it is quick and easy to get back up and running. My ideal solution would be two completely standalone systems which are only barely internally redundant, but when they go down we have an easy way to "catch up" the disks to a point in time with a specific known amount of data loss.

What am I forgetting? How dumb of an idea is this?

Your data footprint is tiny and your IO requirements aren't all that high. Buy an all flash array (Nimble, Pure, whatever) and just directly attach it via FC or iSCSI if you really just need to service a single server. It will end up being cheaper and more resilient than attempting to hack together some mirrored solution that provides less redundancy and RPO and probably costs more and breaks more often.

Thanks Ants
May 21, 2004

#essereFerrari


Can you expand a bit more on the application that it needs to support? And also confirm you meant 1.2TB and not 1.2PB?

Quoting throughput figures and wanting vertical scale, and then considering object stores has also confused me.

evil_bunnY
Apr 2, 2003

NippleFloss posted:

Your data footprint is tiny and your IO requirements aren't all that high. Buy an all flash array (Nimble, Pure, whatever) and just directly attach it via FC or iSCSI if you really just need to service a single server. It will end up being cheaper and more resilient than attempting to hack together some mirrored solution that provides less redundancy and RPO and probably costs more and breaks more often.
This. In the age of all flash you can write pretty much whatever your links can spit out. Single-digit TBs is literally fuckall.

Thanks Ants posted:

Can you expand a bit more on the application that it needs to support? And also confirm you meant 1.2TB and not 1.2PB?

Quoting throughput figures and wanting vertical scale, and then considering object stores has also confused me.
I'm suspecting few of the people involved in previous efforts recognized their level of unawareness.

H110Hawk
Dec 28, 2006

NippleFloss posted:

Your data footprint is tiny and your IO requirements aren't all that high. Buy an all flash array (Nimble, Pure, whatever) and just directly attach it via FC or iSCSI if you really just need to service a single server. It will end up being cheaper and more resilient than attempting to hack together some mirrored solution that provides less redundancy and RPO and probably costs more and breaks more often.

evil_bunnY posted:

This. In the age of all flash you can write pretty much whatever your links can spit out. Single-digit TBs is literally fuckall.

This is pretty much my thought as well. I know this isn't much data/many IOPS in the day of flash storage. When it comes down to it we refuse to shard it, other software packages seem to be missing core things we rely on with RRD, etc. This leaves us with an ever growing single point of data aggregation. I am hoping that we see movement to a permanent solution.

Say I do FC, what are the kids using for an FC HBA? We are using latest gen Intel Xeon chipsets.

What is the strategy to "rsync" the data when one of the servers goes down unexpectedly for a few hours / needs to be replaced? Literally rsync? (Since I assume I can't do magical block copies from the directly attached flash array.)

Thanks Ants posted:

Can you expand a bit more on the application that it needs to support? And also confirm you meant 1.2TB and not 1.2PB?

Quoting throughput figures and wanting vertical scale, and then considering object stores has also confused me.

Yes T-Tera. Our PB stores are HDFS backed.

quote:

Configured Capacity : 5.09 PB
Configured Capacity : 14.01 PB


evil_bunnY posted:

I'm suspecting few of the people involved in previous efforts recognized their level of unawareness.

You hit the nail on the head. :suicide:

Thanks for your help SAN thread.

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

H110Hawk posted:

Say I do FC, what are the kids using for an FC HBA? We are using latest gen Intel Xeon chipsets.

My preference is qlogic.

quote:

What is the strategy to "rsync" the data when one of the servers goes down unexpectedly for a few hours / needs to be replaced? Literally rsync? (Since I assume I can't do magical block copies from the directly attached flash array.)

This is the big appeal of shared storage. If you buy ONE array and share it between all of the hosts you could instantly create a clone of that LUN that consumes zero space and make it available to any other attached host. Or you could buy an array that does NFS (NetApp, probably) and simply create one shared NFS filesystem or multiple exports and just have a new host mount the old host's exports if it needs to.

If you refuse to do shared storage you could still use array-based replication to move all of the data sets around to other arrays with a relatively low RPO/RTO and do it more efficiently than rsync.
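If it does end up being literal rsync between standalone boxes, the catch-up loop is about as simple as it sounds. A minimal sketch, where the hostname, paths, and interval are all hypothetical, and the interval is effectively your RPO:

code:
# Periodically mirror the RRD directory to a warm standby so a dead node can be
# brought back with a known, bounded amount of data loss. Paths and hostname are
# hypothetical; the sync interval is effectively the RPO.
import subprocess
import time

SOURCE = "/var/lib/metrics/rrd/"
STANDBY = "standby01:/var/lib/metrics/rrd/"
INTERVAL_S = 300  # 5-minute RPO

def catch_up():
    # -a preserves attributes, --delete keeps the standby an exact mirror
    subprocess.run(["rsync", "-a", "--delete", SOURCE, STANDBY], check=True)

if __name__ == "__main__":
    while True:
        catch_up()
        time.sleep(INTERVAL_S)

Array-based replication does the same job below the filesystem and handles in-flight writes far more gracefully, which is the point above.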

H110Hawk
Dec 28, 2006

NippleFloss posted:

This is the big appeal of shared storage. If you buy ONE array and share it between all of the hosts you could instantly create a clone of that LUN that consumes zero space and make it available to any other attached host. Or you could buy an array that does NFS (NetApp, probably) and simply create one shared NFS filesystem or multiple exports and just have a new host mount the old host's exports if it needs to.

If you refuse to do shared storage you could still use array-based replication to move all of the data sets around to other arrays with a relatively low RPO/RTO and do it more efficiently than rsync.

It's certainly something to weigh. I've had really good luck with off-lease NetApp in the past (thanks Zerowait!) but for some reason I don't know if I want to deal with shared anything. It's not necessarily rational. We'll see what the VAR has to say next week.

Thanks Ants
May 21, 2004

#essereFerrari


Even if you think you will never need to use shared storage, the features you gain from a SAN (mirroring, LUN cloning etc) make it worth having even for a single host. If you use a clustered file system on your SAN LUNs or a NAS then you can have multiple servers as well for a bit of redundancy at the host hardware level.

GrandMaster
Aug 15, 2004
laidback
Does anyone here know 3PAR well?
We've just had a bunch of new disks installed, and I figure it's probably not best practice to mix different-capacity disks in the same CPG, as the performance profiles will be different (450GB & 1.8TB, both 10K). Our R5 CPG is picking up all disks, as the filter is just type=fc.

My plan is to create another CPG with a disk type filter, modifying the disk filter on the existing CPG and then using AO to tier.

T1 - CPG_R5_450
T2 - CPG_R5_1800

Is this a bad idea? Can I modify the filter on the existing CPG even though VV data will then be on disks that aren't included? I figure a tune/system maintenance job will get data onto the right CPG disks or something?

H110Hawk
Dec 28, 2006

Thanks Ants posted:

Even if you think you will never need to use shared storage, the features you gain from a SAN (mirroring, LUN cloning etc) make it worth having even for a single host. If you use a clustered file system on your SAN LUNs or a NAS then you can have multiple servers as well for a bit of redundancy at the host hardware level.

Yeah, I get paying for magic. Do you know what kind of prices we're looking at?

I realized I never expanded on the application: the multicast group is receiving metrics from several thousand servers, hundreds of metrics per second per server. These are stored both on a per-server basis and across several aggregated dimensions, all in RRDs. (So 1000 servers in a role emit say 300 metrics/second, which are stored both in 3,000 RRDs and as 300 RRD aggregates (times some number of dimensions, depending).)
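To put rough numbers on the write side of that, here's the kind of arithmetic involved. A sketch only: the per-role figures are the ones above, but the dimension count and the cache flush window are assumptions of mine:

code:
# Rough write-load arithmetic for the RRD layout above. 300 metrics/s, 3,000 RRDs
# and 300 aggregates come from the description; the dimension multiplier and the
# flush window for a caching daemon are assumptions.

METRICS_PER_SEC = 300          # incoming rate for the role
PER_SERVER_RRDS = 3_000
AGGREGATE_RRDS = 300
DIMENSIONS = 3                 # assumed multiplier on the aggregates
FLUSH_WINDOW_S = 300           # assumed write-back interval for a caching daemon

active_files = PER_SERVER_RRDS + AGGREGATE_RRDS * DIMENSIONS
updates_per_sec = METRICS_PER_SEC * (1 + DIMENSIONS)  # per-server copy plus aggregates

# Worst case: every update is its own small random write to an RRD file.
unbatched_write_iops = updates_per_sec

# With batching, each active file gets roughly one coalesced write per flush window.
batched_write_iops = active_files / FLUSH_WINDOW_S

print(f"~{updates_per_sec} RRD updates/s across ~{active_files} files")
print(f"write IOPS: ~{unbatched_write_iops} unbatched vs ~{batched_write_iops:.0f} batched")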

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

H110Hawk posted:

Yeah, I get paying for magic. Do you know what kind of prices we're looking at?

I realized I never expanded on the application: the multicast group is receiving metrics from several thousand servers, hundreds of metrics per second per server. These are stored both on a per-server basis and across several aggregated dimensions, all in RRDs. (So 1000 servers in a role emit say 300 metrics/second, which are stored both in 3,000 RRDs and as 300 RRD aggregates (times some number of dimensions, depending).)

I hope you've turned rrdcached on. (The real answer is to switch to a modern TSDB.)
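If it isn't on already, the updates have to be pointed at the daemon explicitly. A minimal sketch of what that looks like via rrdtool's --daemon flag; the socket path, RRD path, and value are hypothetical:

code:
# Route an RRD update through rrdcached instead of hitting the file directly.
# Socket path, RRD path, and the value are placeholders.
import subprocess

RRDCACHED_SOCK = "unix:/var/run/rrdcached.sock"

def push_metric(rrd_path, value):
    # "N" means "now"; rrdcached buffers the update and flushes writes in batches.
    subprocess.run(
        ["rrdtool", "update", rrd_path, "--daemon", RRDCACHED_SOCK, f"N:{value}"],
        check=True,
    )

push_metric("/var/lib/metrics/rrd/web01/load_one.rrd", 0.42)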
