|
Corvettefisher posted:Just throwing this out there but, HP because everything else is HP and that's what our VAR recommended we get.
|
# ? Jun 8, 2013 10:29 |
|
NetApp for our applications + most of our VMware. We love all the snap management products and the performance is great. It is, however, very expensive compared to...

Oracle/Sun for our VDI and dead data that we have to retain for some period. We love the price and the performance. Management is pretty simple, though it does not have the cool software that NetApp does.
|
# ? Jun 8, 2013 13:52 |
|
Corvettefisher posted:Just throwing this out there but, HP Storevirtual. Mainly because you can do VMware certified metro clustering with it insanely cheaply.
|
# ? Jun 8, 2013 14:57 |
|
My research group wants a NAS or file server for about 12T of existing data, ideally extensible up to ~20T, on the cheap. We already have a Synology Diskstation with some 'prosumer' level drives for local storage (4x2T drives in RAID 5), but we also have been using a university hosted storage/backup solution and we're finding it too expensive, so we want to take the 12T we have there and move it to our own file server. What can we do that's a little more robust than just buying a bigger NAS, but won't drain our grant budget forever? Keep in mind this group used to store its old data in a literal cardboard box of loose hard drives, so high storage costs make the boss think, "what the hell, I could just get a bunch of drives from Best Buy!" On the other hand, we have an opportunity to get some funding for lab equipment in the near future, so we'd probably consider options up to maybe $15k, though the pressure will be on to keep it under half that (which I think is the level at which you'd be worried about trusting your porn collection to that hardware, never mind years of critical research data you are in theory obligated to provide upon request). I believe the funding is for capital equipment only, so Amazon S3 et al. are out. Ideally whatever we do can also more or less run itself, since we have no dedicated IT staff and the local 'computer people' are both leaving within 6 months.
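For back-of-envelope sizing of the options above, a rough usable-capacity sketch (the 4x2T RAID 5 box is from the post; the larger configurations are hypothetical examples, not recommendations):

```python
def usable_tb(drives, size_tb, level):
    """Rough usable capacity; ignores filesystem overhead, hot spares,
    and base-10 vs base-2 marketing sizes."""
    if level == "raid5":
        return (drives - 1) * size_tb
    if level == "raid6":
        return (drives - 2) * size_tb
    if level == "raid10":
        return drives * size_tb // 2
    raise ValueError(f"unknown level: {level}")

print(usable_tb(4, 2, "raid5"))   # the existing Diskstation: 6T usable
print(usable_tb(8, 4, "raid6"))   # e.g. a hypothetical 8-bay of 4T drives: 24T
```

Anything sized for ~20T of data should also leave headroom, since arrays get slow and fragile when nearly full.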
|
# ? Jun 9, 2013 04:04 |
|
Pretty sure the quotes I had for lower-end NetApp were around that figure about 8 months ago, before any haggling on price. Edit: the $15k figure, not the half-that option. As mentioned above, there's nothing technically wrong with Synology units, but they don't offer any sort of SLA, and I'm not sure I'd want critical data on a box that can also be an iTunes server. Thanks Ants fucked around with this message at 04:25 on Jun 9, 2013 |
# ? Jun 9, 2013 04:22 |
|
Aradekasta posted:we'd probably consider options up to maybe $15k, though the pressure will be on to keep it under half that (which I think is the level at which you'd be worried about trusting your porn collection to that hardware, never mind years of critical research data you are in theory obligated to provide upon request).
|
# ? Jun 9, 2013 05:11 |
|
Dell is pretty much giving away MD's now, might be worth checking into.
|
# ? Jun 9, 2013 05:29 |
|
What is it with faculty and storage, it's like they're all morons about it.
|
# ? Jun 9, 2013 17:15 |
|
I think they believe they are able to make the best choice for themselves for anything. The information they have is how much it costs them to go to Costco and buy an external 2TB drive; anything bigger should be proportionally more expensive, right? Since they have a superior ability to think critically, why listen to anyone else? Don't understand the decision? Who cares, take your ball and go home. It's not like working together for a common good has ever helped anyone.
|
# ? Jun 9, 2013 19:57 |
|
Thanks for the ideas, guys. In this case the cost of the storage we're using through the school really is excessive as a proportion of our lab's operating costs, which is a bad thing given how squeezed science budgets are right now. It's just that the prof has no sense of what's available between the drive you get from Best Buy and the $3k/T/year managed solution. There are also cash flow issues due to the grant system. With the sequester, some grants have been funded but not disbursed for months after the original target date. Purchases of large shared equipment usually go through a different funding mechanism than normal lab expenses. The combined incentive is for everyone to buy their own cheap-rear end hardware rather than sharing or using a third-party service.

adorai posted:if you are replacing a prosumer unit, I would probably just roll my own

When we bought the Diskstation I thought 'next time I'm just going to build one myself', but now that it's next time, I'm reconsidering. I'm leaving in the next 3-6 months and my 'replacement' on the research side just met Linux a month ago. It's possible the funding won't even come through till after I'm gone, and he's definitely not ready to get a pile of parts on his desk and turn them into a functioning file server. Even if it comes earlier, I don't want to spend valuable time on it.
|
# ? Jun 9, 2013 22:08 |
|
Aradekasta posted:$3k/T/year managed solution
|
# ? Jun 10, 2013 03:51 |
|
I know. It's their top-tier storage - snapshots, offsite tape backups, etc. That's the price for our allocation on an Isilon system. Their lower-tier storage is much cheaper - $700/T/year, IIRC - but that's not backed up, and also not accessible from the compute nodes of the cluster, which is annoying. Currently we're split between these two, in addition to our local NAS. The pricing is a little bit silly, because it's all ultimately just the university billing itself, so above-market rates for things like storage derive from the equally fictitious university-defined price for the physical data center space. This place is a total bureaucratic black hole and everything might as well be Monopoly money until, with luck and possibly a ritual sacrifice, it emerges from the death grip of some cranky accountant to actually pay a vendor. In other words, if faculty are morons about storage, it's because everything in their environment has trained them to be.
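As a sanity check on those rates (a sketch; the 15T and the $15k hardware ceiling come from the earlier posts, the per-TB prices from this one):

```python
data_tb = 15
top_tier_per_year = 3000 * data_tb   # $3k/T/year Isilon tier with backups
low_tier_per_year = 700 * data_tb    # $700/T/year tier, no backups
capital_budget = 15000               # one-time hardware budget ceiling

print(f"top tier: ${top_tier_per_year:,}/year")   # $45,000/year
print(f"low tier: ${low_tier_per_year:,}/year")   # $10,500/year
# Months of top-tier service the entire hardware budget would buy:
print(round(capital_budget / (top_tier_per_year / 12)))  # 4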
|
# ? Jun 10, 2013 05:34 |
|
God, my university just gives space away on its Isilon. My department also provides its own storage, but because of grant fuckery the only way we can sell it to faculty is to have them buy individual drives for our Compellent boxes, and then the department as a whole just eats the cost of enclosures and parity disks.
|
# ? Jun 10, 2013 05:50 |
|
FISHMANPET posted:but because of grant fuckery the only way we can sell it to faculty is to have them buy individual drives for our Compellent boxes

This is so dumb I can't even form the words to describe it. Only some sort of hyper-dimensional chart could do it.
|
# ? Jun 10, 2013 05:57 |
|
Welcome to higher ed, where everything's made up and the points don't matter.
|
# ? Jun 10, 2013 06:11 |
|
NippleFloss posted:This is so dumb I can't even form the words to describe it. Only some sort of hyper-dimensional chart could do it.
|
# ? Jun 10, 2013 06:41 |
|
I think it might actually have to do with how the university itself is set up; it's really hard for one account to bill another account for something. We have to set up special accounts to do it, and they require special justification from the CFO of the university or something, and also we're all pants on head retarded. Thankfully that's levels above me, so I just work within the constraints I'm given and call it a day.
|
# ? Jun 10, 2013 06:54 |
|
FISHMANPET posted:grant fuckery

FISHMANPET posted:we're all pants on head retarded.

Sounds about right.
|
# ? Jun 10, 2013 07:00 |
|
Yup, that's higher ed. They are specifically structured so that each school/department has total freedom. The system is set up to punish people for doing the right thing. Most of the IT money is split up and divided out to each school, which gets to make its own decisions with it. Without charging itself, there would be no way to fund large central projects. Your central rates are very reasonable. Have you sat down with your contact there and explained your budget issue? They might be able to cut you a break to "help you do the right thing"?
|
# ? Jun 10, 2013 14:45 |
|
parid posted:Your central rates are very reasonable.
|
# ? Jun 10, 2013 14:57 |
|
Misogynist posted:It gives you some redundancy from all of the most common physical damages to a datacenter -- flood, fire, electrical --
|
# ? Jun 10, 2013 15:29 |
|
Misogynist posted:Try explaining this to the 25 different research groups that are already using Dropbox for Business while IT has absolutely no clue.

That is my job 8-5. It's not easy. There are still people willing to look through that and try to work together. That's where I put most of my time.
|
# ? Jun 10, 2013 15:35 |
|
Just as an addendum to my previous post about synchronous replication: I set up a test environment on the same switch in the same room with two EqualLogic PS4000s and ended up with the following results from Iometer:
|
# ? Jun 11, 2013 17:13 |
|
You have to wait for both devices to acknowledge so of course it'll kill your writes. Increasing queue depth might help, but it's all synthetic anyway.
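The queue-depth point can be made concrete with Little's law. A sketch with made-up, purely illustrative latencies (none of these numbers are from the tests above): if synchronous acknowledgment multiplies per-write latency, throughput at a fixed queue depth drops by the same factor, and a deeper queue claws some back.

```python
def throughput_mb_s(outstanding_ios, latency_ms, block_kb):
    """Little's law: sustained IOPS ~= outstanding IOs / per-IO latency."""
    iops = outstanding_ios * 1000 / latency_ms
    return iops * block_kb / 1024

# Hypothetical: local ack in 1 ms vs 6 ms once the sync partner must ack too.
print(throughput_mb_s(8, 1, 32))    # 250.0 MB/s
print(throughput_mb_s(8, 6, 32))    # ~41.7 MB/s
print(throughput_mb_s(32, 6, 32))   # ~166.7 MB/s at a deeper queue
```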
|
# ? Jun 11, 2013 18:18 |
|
Does your remote target perform better when you random-write to it directly?
|
# ? Jun 11, 2013 18:29 |
|
Misogynist posted:Does your remote target perform better when you random-write to it directly?
|
# ? Jun 11, 2013 19:23 |
|
evil_bunnY posted:That's a good question to ask. Also why the max latency was a full second.

Boss man saw the results over my shoulder, so I will never know: I am moving to rep sets as of 20 minutes ago. I might just delay a bit so I can test some things out, since it is not quite in production yet.
|
# ? Jun 11, 2013 19:41 |
|
demonachizer posted:Boss man saw the results over my shoulder so I will never know I am moving to rep sets as of 20 minutes ago. I might just delay a bit just so I can test some things out since it is not quite in production yet.

What size blocks are you using for testing?
|
# ? Jun 11, 2013 23:25 |
|
NippleFloss posted:What size blocks are you using for testing?

32KB. Got it all figured out: both SANs behave the same separately, but I get the same horrible results when doing the sync rep. EDIT2: Just got word that it is possibly due to the fact that we are using a Cisco 2248TP, which is a fabric extender, so all traffic goes to the Nexus and back to the FEX module instead of staying in place. It sounds like the recommendations we got for network devices may have been a little iffy. Demonachizer fucked around with this message at 20:36 on Jun 12, 2013 |
# ? Jun 12, 2013 15:26 |
|
demonachizer posted:EDIT2: Just got word that it is possibly due to the fact that we are using a Cisco 2248TP which is a fabric extender and so all traffic goes to the nexus and back to the FEX module instead of staying in place.
|
# ? Jun 12, 2013 20:58 |
|
evil_bunnY posted:This is true, but unless your nexus pegs its cpu during the random test it's not the root cause. And it doing fine with seq r/w points that way too.

Yeah, I found a white paper on that specific FEX module and best practices for integration with the PS6000 (ours is a PS4000, but I think a lot of the considerations should be the same) that I am going to have the network guy look at. He is pretty drat knowledgeable, especially with Cisco technology, so I think we might come up with something. We are comfortable switching away from SyncRep (the EqualLogic nomenclature for their synchronous replication), as it wasn't part of our initial design spec anyway. It just seemed like we might have everything in place for it, so why the heck not.
|
# ? Jun 12, 2013 21:09 |
|
Like evil_bunnY said, I'm not really sure how it could end up being a switching issue given that it is only affecting random workloads, which are going to be lower bandwidth and stress the switches much less. If your switching infrastructure was introducing serious delays, I'd expect to see that affect higher-bandwidth sequential write loads just as much as the random ones. I'd expect to see some serious packet loss or congestion pushing TCP windows way, way down if the network were the cause, and that should be immediately apparent looking at the port stats on the controllers and switches.
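One way to put a number on the "look for retransmits" advice is to watch the retransmit-to-output ratio on a test host. A sketch that parses Linux `/proc/net/snmp`-style TCP counters (the EqualLogic controllers expose their own stats through their UI/CLI instead, and the "worth a look" threshold here is an assumption, not a vendor figure):

```python
def tcp_retrans_ratio(snmp_text):
    """Ratio of retransmitted to total outbound TCP segments,
    parsed from /proc/net/snmp-style text."""
    tcp_lines = [line.split() for line in snmp_text.splitlines()
                 if line.startswith("Tcp:")]
    header, values = tcp_lines  # first Tcp: line is field names, second is counts
    stats = dict(zip(header[1:], map(int, values[1:])))
    return stats["RetransSegs"] / max(stats["OutSegs"], 1)

sample = "Tcp: OutSegs RetransSegs\nTcp: 100000 2500"
print(f"{tcp_retrans_ratio(sample):.1%}")  # 2.5% -- far above the sub-0.1% you'd want
```

On a real host you would pass in the contents of `/proc/net/snmp` and sample it before and after a test run.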
|
# ? Jun 12, 2013 21:15 |
|
NippleFloss posted:Like evil_bunnY said, I'm not really sure how it could end up being a switching issue given that it is only affecting random workloads, which are going to be lower bandwidth and stress the switches much less. If your switching infrastructure was introducing serious delays, I'd expect to see that affect higher-bandwidth sequential write loads just as much as the random ones. I'd expect to see some serious packet loss or congestion pushing TCP windows way, way down if the network were the cause, and that should be immediately apparent looking at the port stats on the controllers and switches.

Any idea what we should be looking at then? When I do random writes to each SAN individually everything seems fine.
|
# ? Jun 12, 2013 21:40 |
|
demonachizer posted:Any idea what we should be looking at then? When I do random writes to each SAN individually everything seems fine.

Without knowing how SyncRep works under the covers it's hard to say what could be causing it, but it sounds like a protocol problem rather than a configuration problem. Have you looked at the port stats on the controllers (not sure what is available on EQL, but stuff like errors, re-transmits, send-q, window sizes, and pause frames would be what I would look at) and switches to see if things look good on the network side? Your network admin could do a packet trace from the switch and help you trace how long it is taking for your SyncAlternate controller to ack an IO. I know we have some EQL users in this thread, so maybe they can shed some more light on how SyncRep actually handles writing to the secondary. YOLOsubmarine fucked around with this message at 02:07 on Jun 13, 2013 |
# ? Jun 13, 2013 01:46 |
|
NippleFloss posted:Without knowing how SyncRep works under the covers it's hard to say what could be causing it, but it sounds like a protocol problem rather than a configuration problem. Have you looked at the port stats on the controllers and switches to see if things look good on the network side? Your network admin could do a packet trace from the switch and help you trace how long it is taking for your SyncAlternate controller to ack an IO.

Talked to a dude on the Dell forums, and he suggested investigating QoS on the switch side to make sure jumbo frames are lossless, since that could cause the behavior as well. I will meet with the network dude in a couple days and work him hard. I guess poo poo like this is why my job shines in some ways: it is very low pressure. In another environment I would be drinking myself to sleep by now, but here I can kind of just figure the poo poo out, then make it live when I am ready. I am also looking at logging methods etc. for the SANs in parallel, to make sure all bases are covered and to see whether I can glean anything from them.
|
# ? Jun 13, 2013 05:01 |
|
The plan's changed a bit, since the new funding won't come through till next year. We need the cheapest NAS possible that will hold onto ~15T of data for six months without crapping out. I'm not in love with the Synology software, but the Diskstation we have works well enough. Any obvious reason not to just pick up a bigger one?

parid posted:Your central rates are very reasonable. Have you sat down with your contact there and explained your budget issue? They might be able to cut you a break to "help you do the right thing"?
|
# ? Jun 13, 2013 06:20 |
|
demonachizer posted:Talked to a dude on Dell forums and he suggested investigating QoS on the switch side to make sure jumbo frames are lossless since that could cause the behavior as well. I will meet with the network dude in a couple days and work him hard.

Have you attempted to run a separate benchmark against another LUN on the primary or secondary while also running it against a SyncRep LUN? If you get controller-wide performance problems whenever SyncRep is turned on, that should pretty much rule out the network as an issue (at least assuming you have dedicated links for replication traffic and client traffic).
|
# ? Jun 13, 2013 06:23 |
|
NippleFloss posted:Have you attempted to run a separate benchmark against another LUN on the primary or secondary while also running it against a SyncRep LUN? If you get controller-wide performance problems whenever SyncRep is turned on, that should pretty much rule out the network as an issue (at least assuming you have dedicated links for replication traffic and client traffic).

I just ran a test with all three at the same time:

Worker 1 -> 100% random to SAN A, SyncRep-enabled volume: 5-10 MB/s
Worker 2 -> 100% random to SAN A, non-SyncRep volume: 50-60 MB/s
Worker 3 -> 100% random to SAN B, non-SyncRep volume: 50-60 MB/s

I think a decrease in performance is probably expected when writing to multiple volumes at the same time, so I am not sure that the 50% decrease on Workers 2 and 3 is a problem. I am not sure that I am able to isolate replication traffic from other SAN traffic on the EqualLogic PS4000. I can't think of a way to tag the traffic at the switch level, but I am not a network guy. I will ask him if we can define routes based on source and target IP addresses for testing.
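Taking the midpoints of those ranges, a trivial arithmetic sketch puts the SyncRep penalty at roughly 7x relative to the concurrent non-replicated volumes, well beyond the ~50% contention effect on Workers 2 and 3:

```python
syncrep_mb_s = (5 + 10) / 2    # Worker 1: SyncRep-enabled volume
plain_mb_s   = (50 + 60) / 2   # Workers 2 and 3: non-replicated volumes
slowdown = plain_mb_s / syncrep_mb_s
print(f"SyncRep volume is ~{slowdown:.1f}x slower")  # ~7.3x
```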
|
# ? Jun 13, 2013 15:41 |
|
You need to test on different disk groups.
|
# ? Jun 13, 2013 16:12 |
|
evil_bunnY posted:You need to test on different disk groups.

Not sure I understand what that means in terms of the PS4000. I can set up volumes, but I am not sure I can assign specific disks to service a volume. The organization is by group, which is a collection of SANs; you define storage pools within the group, and then volumes within the storage pools. Do you mean that I need a new group? I am only able to do synchronous replication within a group. I have the pools defined as the entirety of each SAN: one is Primary, one is Rep. Within the Primary pool I have two test volumes, one set to replicate and one not. In the Rep pool I have two volumes: a standalone non-replicated volume and the replication target volume, which is created automatically. Demonachizer fucked around with this message at 17:57 on Jun 13, 2013 |
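To make the layout described here easier to follow, a toy model (Python; the names are hypothetical, and this only mirrors the group -> pool -> volume nesting described in the post, not actual EqualLogic behavior):

```python
from dataclasses import dataclass, field

@dataclass
class Volume:
    name: str
    syncrep: bool = False

@dataclass
class Pool:
    name: str
    volumes: list = field(default_factory=list)

@dataclass
class Group:
    """A group is a collection of SANs; SyncRep only works within one group."""
    name: str
    pools: list = field(default_factory=list)

# One pool per PS4000 member, as described above.
group = Group("lab-group", pools=[
    Pool("Primary", [Volume("test-rep", syncrep=True), Volume("test-plain")]),
    Pool("Rep", [Volume("standalone"), Volume("test-rep-target")]),
])
print([p.name for p in group.pools])  # ['Primary', 'Rep']
```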
# ? Jun 13, 2013 17:54 |