parid
Mar 18, 2004
June was a bad month for my VMware clusters. We had a series (at least 3) of network outages that prevented many of my hosts from talking to their storage (all NFS). Every time this happens, I have to take time to get vCenter back up before we can dig into fixing the rest of the VMs. In one case, that was an extra hour of delay.

I'd love to fix the root of the problem (the network), but that's in another team's hands. What I do have control over is how we have vCenter implemented. Right now the vCenter/SSO server is physical; I had plans to virtualize it very soon. The SQL server for it is already a VM.

Making vCenter a VM is fine for many failure scenarios, but the kind of disruptive network failure we're seeing more of would be challenging to deal with. I would like to find a way to make it multi-site, or at least have some kind of cold standby if the app in the primary datacenter fails.

What is everyone doing to ensure the availability of their vCenter? It looks like VMware is dropping vCenter Server Heartbeat soon. I see vCenter now supports SQL clustering and Microsoft Cluster Services. Is this additional headache worth considering?

mAlfunkti0n
May 19, 2004
Fallen Rib

parid posted:

June was a bad month for my VMware clusters. We had a series (at least 3) of network outages that prevented many of my hosts from talking to their storage (all NFS). Every time this happens, I have to take time to get vCenter back up before we can dig into fixing the rest of the VMs. In one case, that was an extra hour of delay.

I'd love to fix the root of the problem (the network), but that's in another team's hands. What I do have control over is how we have vCenter implemented. Right now the vCenter/SSO server is physical; I had plans to virtualize it very soon. The SQL server for it is already a VM.

Making vCenter a VM is fine for many failure scenarios, but the kind of disruptive network failure we're seeing more of would be challenging to deal with. I would like to find a way to make it multi-site, or at least have some kind of cold standby if the app in the primary datacenter fails.

What is everyone doing to ensure the availability of their vCenter? It looks like VMware is dropping vCenter Server Heartbeat soon. I see vCenter now supports SQL clustering and Microsoft Cluster Services. Is this additional headache worth considering?

Ignore me .. I can't read apparently :(

mAlfunkti0n fucked around with this message at 17:49 on Jul 2, 2014

Pile Of Garbage
May 28, 2007



Welcome back Dilbert As gently caress.

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

I have a contractor that is looking to provide hardware/software for an industrial control system. They're picking VMware as the platform, but with standalone free ESXi hosts and no support contracts. My suspicion is that they have a support contract on their test lab, re-create issues encountered there, and then pass the results down to their customers. Is there any possible way this is not violating the terms of their VMware contract? They'd essentially be reselling their support contract to their customers, and it's suspicious as hell to me.

mayodreams
Jul 4, 2003


Hello darkness,
my old friend

parid posted:

June was a bad month for my VMware clusters. We had a series (at least 3) of network outages that prevented many of my hosts from talking to their storage (all NFS). Every time this happens, I have to take time to get vCenter back up before we can dig into fixing the rest of the VMs. In one case, that was an extra hour of delay.

I'd love to fix the root of the problem (the network), but that's in another team's hands. What I do have control over is how we have vCenter implemented. Right now the vCenter/SSO server is physical; I had plans to virtualize it very soon. The SQL server for it is already a VM.

Making vCenter a VM is fine for many failure scenarios, but the kind of disruptive network failure we're seeing more of would be challenging to deal with. I would like to find a way to make it multi-site, or at least have some kind of cold standby if the app in the primary datacenter fails.

What is everyone doing to ensure the availability of their vCenter? It looks like VMware is dropping vCenter Server Heartbeat soon. I see vCenter now supports SQL clustering and Microsoft Cluster Services. Is this additional headache worth considering?

We just migrated our vCenter from a VM to a physical due to performance issues with our growing environment. Something to consider if you are looking to go VM.

mAlfunkti0n
May 19, 2004
Fallen Rib

mayodreams posted:

We just migrated our vCenter from a VM to a physical due to performance issues with our growing environment. Something to consider if you are looking to go VM.

What kind of issues were you seeing?

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

If you are expecting any kind of resource contention on your cluster, then you should be setting up resource pools with CPU/memory/disk shares set higher for the super-critical stuff like DCs and your vCenter VM. A minimum of two pools, probably three: one high-priority pool for the really important stuff, one with a normal amount of shares for the regular stuff, and a low-priority one for dev servers so memory/CPU pressure gets applied there first.
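If you'd rather script that layout than click through the client, here is a rough pyVmomi sketch of the same high/normal/low split. The vCenter address, credentials, cluster name, and pool names are placeholders for illustration only (nothing from this thread), and note that disk shares live on VMs/datastores rather than on the pool, so only CPU and memory shares are set here.

code:
# Rough sketch: create high/normal/low resource pools on a cluster with pyVmomi.
# 'vcenter.example.com' and 'Prod-Cluster' are placeholders, not real systems.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

def make_alloc(level):
    """Build a CPU/memory allocation that only sets the share level."""
    alloc = vim.ResourceAllocationInfo()
    alloc.shares = vim.SharesInfo(level=level)   # 'high', 'normal', or 'low'
    alloc.reservation = 0
    alloc.expandableReservation = True
    alloc.limit = -1                             # unlimited
    return alloc

def make_pool_spec(level):
    spec = vim.ResourceConfigSpec()
    spec.cpuAllocation = make_alloc(level)
    spec.memoryAllocation = make_alloc(level)
    return spec

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="secret", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Prod-Cluster")
    root = cluster.resourcePool
    # One pool per priority tier, as described above.
    for name, level in [("Critical", "high"), ("Normal", "normal"), ("Dev", "low")]:
        root.CreateResourcePool(name=name, spec=make_pool_spec(level))
finally:
    Disconnect(si)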

mayodreams
Jul 4, 2003


Hello darkness,
my old friend

mAlfunkti0n posted:

What kind of issues were you seeing?

A lot of lag, slow console response, a lot of load on the host it was residing on. We have multiple datacenters at different locations, and it was becoming difficult to admin them. Moving to a physical with more RAM and procs, and with fast local storage, helped out a lot. We are considering moving the DB to a physical too at some point. This is a 2008 R2 server with vCenter installed, not the appliance. We have around 400 or so VMs for reference.

mAlfunkti0n
May 19, 2004
Fallen Rib

mayodreams posted:

A lot of lag, slow console response, a lot of load on the host it was residing on. We have multiple datacenters at different locations, and it was becoming difficult to admin them. Moving to a physical with more RAM and procs, and with fast local storage, helped out a lot. We are considering moving the DB to a physical too at some point. This is a 2008 R2 server with vCenter installed, not the appliance. We have around 400 or so VMs for reference.

I understand your reason for moving to a physical, but I don't know why it would cause alarm for someone considering moving to a VM. The same would be the case if your physical server hosting vCenter had resource contention issues caused by an application running on it. We have close to 1000 VMs on 30 hosts and vCenter runs just fine, but it stays in a small cluster which only handles management duties (vCenter, SSO, DB, vCOps, etc.).

luminalflux
May 27, 2005



I just applied for the beta. Is there a good guide to setting up a smallish VMware cluster on a single host for testing purposes?
And is there a small form-factor machine (Mac mini or Intel NUC sized) I can easily set up a lab on?

Dilbert As FUCK
Sep 8, 2007

by Cowcaster
Pillbug

BangersInMyKnickers posted:

I have a contractor that is looking to provide hardware/software for an industrial control system. They're picking VMware as the platform, but with standalone free ESXi hosts and no support contracts. My suspicion is that they have a support contract on their test lab, re-create issues encountered there, and then pass the results down to their customers. Is there any possible way this is not violating the terms of their VMware contract? They'd essentially be reselling their support contract to their customers, and it's suspicious as hell to me.

I don't think there is anything concrete stating that you can't do that, but it's sketchy as hell and odd.

Is the $500 Essentials kit too much to ask for?

luminalflux posted:

I just applied for the beta. Is there a good guide to setting up a smallish VMware cluster on a single host for testing purposes?
And is there a small form-factor machine (Mac mini or Intel NUC sized) I can easily set up a lab on?


When you get accepted, they have a special forum with how-tos and some stuff to test out.

Dilbert As FUCK fucked around with this message at 19:09 on Jul 2, 2014

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

Dilbert As gently caress posted:

I don't think there is anything concrete stating that you can't do that, but it's sketchy as hell and odd.

Is the $500 Essentials kit too much to ask for?

Apparently: yes. And I'm not thrilled with the idea of using these guys as a middleman if we encounter an issue that is clearly in VMware's court. I don't care how much "validation" they do in their lab.

evol262
Nov 30, 2010
#!/usr/bin/perl

Dilbert As gently caress posted:

I don't think there is anything concrete stating that you can't do that, but it's sketchy as hell and odd.

IANAL, but this is against the terms of every support agreement I've ever seen, including ours. Their license and support agreement specifically bans 'X'-aaS and transferral of the license, which includes "reselling their support".

If they can reproduce the issue on their own hardware, they can certainly open a support case. But you can bet that they'll get their license terminated if VMware finds out they're doing that.

Moey
Oct 22, 2010

I LIKE TO MOVE IT

mayodreams posted:

A lot of lag, slow console response, a lot of load on the host it was residing on. We have multiple datacenters at different locations, and it was becoming difficult to admin them. Moving to a physical with more RAM and procs, and with fast local storage, helped out a lot. We are considering moving the DB to a physical too at some point. This is a 2008 R2 server with vCenter installed, not the appliance. We have around 400 or so VMs for reference.

Eh, I am running about the same sized environment with no issues. Virtualized vCenter and SQL.

Erwin
Feb 17, 2006

mayodreams posted:

A lot of lag, slow console response, a lot of load on the host it was residing on. We have multiple datacenters at different locations, and it was becoming difficult to admin them. Moving to a physical with more RAM and procs, and with fast local storage, helped out a lot. We are considering moving the DB to a physical too at some point. This is a 2008 R2 server with vCenter installed, not the appliance. We have around 400 or so VMs for reference.

Are you sure the original vCenter VM hadn't been upgraded a couple of times, say 4.1 to 5 to 5.1 to 5.5, and the new physical one was built from scratch?

Nitr0
Aug 17, 2005

IT'S FREE REAL ESTATE

mayodreams posted:

A lot of lag, slow console response, a lot of load on the host it was residing on. We have multiple datacenters at different locations, and it was becoming difficult to admin them. Moving to a physical with more RAM and procs, and with fast local storage, helped out a lot. We are considering moving the DB to a physical too at some point. This is a 2008 R2 server with vCenter installed, not the appliance. We have around 400 or so VMs for reference.

Your problems sound like they're storage related.

Docjowles
Apr 9, 2009

Nitr0 posted:

Your problems sound like they're storage related.

The Something Awful Forums > Discussion > Serious Hardware / Software Crap > Your problems sound like they're storage related.

Nice, you've condensed half of the posts in SH/SC into one thread.

luminalflux
May 27, 2005



Docjowles posted:

The Something Awful Forums > Discussion > Serious Hardware / Software Crap > Your problems sound like they're storage related.

Nice, you've condensed half of the posts in SH/SC into one thread.

The other half is "blame the network"

mAlfunkti0n
May 19, 2004
Fallen Rib

Nitr0 posted:

Your problems sound like they're storage related.

Not sure if you're just being silly or what.

Either way, he/she stated the host the VM resided on was quite busy, and in those cases slowness can come from many places. Was the server busy waiting on CPU scheduling? Was memory ballooning a problem? More often than not, though, I hear "it must be a network problem!" rather than storage getting the blame.

luminalflux posted:

The other half is "blame the network"

Yup. I feel so bad for our network team. Even the team I am part of sends some completely stupid things to them. We are a Cisco shop and have CDP enabled, yet the guys frequently send requests over with no switch/port info, AND IT'S SO EASY TO GET...

Honestly I don't know how they survive sometimes.
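Case in point, here is a rough pyVmomi sketch that pulls the CDP-advertised switch and port for every physical NIC on every host, so there's no excuse for leaving it out of a ticket. The host/vCenter address and credentials are placeholders, and it assumes CDP (not LLDP) is what the switches speak; none of that comes from this thread.

code:
# Rough sketch: print CDP switch/port info for each pNIC on every ESXi host.
# 'vcenter.example.com' and the credentials are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="secret", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.HostSystem], True)
    for host in view.view:
        net_sys = host.configManager.networkSystem
        # QueryNetworkHint returns what the upstream switch advertises (CDP here).
        for hint in net_sys.QueryNetworkHint():
            cdp = hint.connectedSwitchPort
            if cdp:
                print(f"{host.name} {hint.device}: switch={cdp.devId} port={cdp.portId}")
            else:
                print(f"{host.name} {hint.device}: no CDP info seen")
finally:
    Disconnect(si)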

Nitr0
Aug 17, 2005

IT'S FREE REAL ESTATE
vCenter takes so little RAM and CPU that if you are running into CPU contention and RAM ballooning, then you've got way bigger problems.

Storage! Storage! Storage!

mAlfunkti0n
May 19, 2004
Fallen Rib

Nitr0 posted:

vCenter takes so little RAM and CPU that if you are running into CPU contention and RAM ballooning, then you've got way bigger problems.

Storage! Storage! Storage!

How big of an environment do you manage? I've seen CPU and RAM usage go plenty high numerous times. Also, "you have way bigger problems": sure, they just might, which is one of the things I pointed out in my earlier post.

Nitr0
Aug 17, 2005

IT'S FREE REAL ESTATE
1000 VMs, 40+ hosts

mAlfunkti0n
May 19, 2004
Fallen Rib
We're around similar numbers then. I suppose if things are run right (is there such a place?) vCenter won't go nuts from time to time. Sadly, that's not a place I have ever worked at. :(

YOLOsubmarine
Oct 19, 2004

When asked which Pokemon he evolved into, Kamara pauses.

"Motherfucking, what's that big dragon shit? That orange motherfucker. Charizard."

If your vCenter VM is performing badly, it's because you don't have enough host or storage resources, not because vCenter runs badly as a VM. Capacity planning is important!

mayodreams
Jul 4, 2003


Hello darkness,
my old friend
I don't know the full history, but I know it was upgraded from 4.1 -> 5.1 -> 5.5.

We only have 6 good hosts (DL380 G8) and 2 older G6s we are looking to replace with G8s. We have a ton of aging hardware and ballooning needs for servers and QA testing, so we are overwhelming our infrastructure faster than we can build it. Storage isn't bad, but we are using a Nexenta-based system from Pogo rather than a NetApp or similar. Still 4Gb fiber x2 to the hosts, though. It was previously iSCSI on gigabit, which was really, REALLY loving ugly.

tl;dr: the company doesn't want to really commit to upgrading ancient hardware and software, which is evidenced by our continued deployment of new desktops with XP and Novell authentication with eDirectory on NetWare. I wish I were kidding. :suicide:

Nitr0
Aug 17, 2005

IT'S FREE REAL ESTATE
lol, I went and looked that up, and it's basically a whitebox system. I configured one and by default it stuck in a bunch of slow 7200 RPM disks.

So depending on how much money your company spent, your storage probably sucks.

WHO'S GONNA DISAGREE WITH ME NOW, HUH?! THAT'S RIGHT.

Nitr0
Aug 17, 2005

IT'S FREE REAL ESTATE
Run a benchmark of that thing, I want to see!

Sickening
Jul 16, 2007

Black summer was the best summer.

mayodreams posted:

I don't know the full history, but I know it was upgraded from 4.1 -> 5.1 -> 5.5.

We only have 6 good hosts (DL380 G8) and 2 older G6s we are looking to replace with G8s. We have a ton of aging hardware and ballooning needs for servers and QA testing, so we are overwhelming our infrastructure faster than we can build it. Storage isn't bad, but we are using a Nexenta-based system from Pogo rather than a NetApp or similar. Still 4Gb fiber x2 to the hosts, though. It was previously iSCSI on gigabit, which was really, REALLY loving ugly.

tl;dr: the company doesn't want to really commit to upgrading ancient hardware and software, which is evidenced by our continued deployment of new desktops with XP and Novell authentication with eDirectory on NetWare. I wish I were kidding. :suicide:

In case anybody is wondering, one of these is probably the storage being discussed.

http://www.pogostorage.com/products/nexenta_appliances

8 hosts with %mystery% number of VMs is probably not a good sign that Nitr0 is wrong.

mAlfunkti0n
May 19, 2004
Fallen Rib

Nitr0 posted:


WHO'S GONNA DISAGREE WITH ME NOW, HUH?! THAT'S RIGHT.

I'LL ARGUE WITH YOU JUST TO ARGUE! :smuggo:

But anyways, yes, if it's just a bank of 7200s... I pity him.

mayodreams, tell me you do follow the rightsizing concept... right? At least get the stuff you can fix out of the way.

mAlfunkti0n fucked around with this message at 13:36 on Jul 3, 2014

mayodreams
Jul 4, 2003


Hello darkness,
my old friend
I think the VM storage pool is 10k drives in RAID 10, but I know the mass storage is 7200 RPM drives in RAID 6. We are screaming for more hosts, but yeah, storage is probably a problem for us. We have about 275 Windows (2003-2012 R2) and about 100 Linux (RHEL 4/5/6) VMs. We try to balance loads across the hosts manually, but haven't done it with any VMware tools.

Sickening
Jul 16, 2007

Black summer was the best summer.

mayodreams posted:

I think the VM storage pool is 10k drives in RAID 10, but I know the mass storage is 7200 RPM drives in RAID 6. We are screaming for more hosts, but yeah, storage is probably a problem for us. We have about 275 Windows (2003-2012 R2) and about 100 Linux (RHEL 4/5/6) VMs. We try to balance loads across the hosts manually, but haven't done it with any VMware tools.

That is a poo poo ton of VMs.

So when you said your storage was good you meant....?

Sickening fucked around with this message at 19:30 on Jul 3, 2014

skipdogg
Nov 29, 2004
Resident SRT-4 Expert

Holy poo poo. We're nowhere near capacity, but in our California location I'm running 151 VMs across 12 hosts, separated into 2 clusters, against a VNX5500 with a poo poo load of disk. A full shelf of SSD, 15K shelves for VMs, and NL-SAS for file storage. Somewhere in the neighborhood of 55TB total.

You poor bastard

mAlfunkti0n
May 19, 2004
Fallen Rib

mayodreams posted:

I think the VM storage pool is 10k drives in RAID 10, but I know the mass storage is 7200 RPM drives in RAID 6. We are screaming for more hosts, but yeah, storage is probably a problem for us. We have about 275 Windows (2003-2012 R2) and about 100 Linux (RHEL 4/5/6) VMs. We try to balance loads across the hosts manually, but haven't done it with any VMware tools.

:suicide:

Figuring even just a single CPU per VM... it still hurts. I'd say turn DRS on, but honestly I don't know that it would actually do anything unless you turned it to the most aggressive setting, and even then the overhead from vMotions probably kills any benefit that would be there.
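If anyone does want to try DRS at its most aggressive, here is a rough pyVmomi sketch of flipping one cluster to fully automated with the migration threshold maxed out. The vCenter address, credentials, and 'Prod-Cluster' name are placeholders, and whether the vMotion overhead is worth it is exactly the open question above.

code:
# Rough sketch: enable DRS, fully automated, most aggressive migration threshold.
# 'vcenter.example.com' and 'Prod-Cluster' are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="secret", sslContext=ssl._create_unverified_context())
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(content.rootFolder,
                                                   [vim.ClusterComputeResource], True)
    cluster = next(c for c in view.view if c.name == "Prod-Cluster")

    drs = vim.cluster.DrsConfigInfo()
    drs.enabled = True
    drs.defaultVmBehavior = vim.cluster.DrsConfigInfo.DrsBehavior.fullyAutomated
    drs.vmotionRate = 1   # 1 = most aggressive, 5 = most conservative

    spec = vim.cluster.ConfigSpecEx(drsConfig=drs)
    WaitForTask(cluster.ReconfigureComputeResource_Task(spec, modify=True))
finally:
    Disconnect(si)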

Nitr0
Aug 17, 2005

IT'S FREE REAL ESTATE
"but I know the mass storage is 7200k drives in a raid6."

Oh man oh man oh man. No wonder VM blames storage whenever someone phones in with vcenter problems.

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

RAID 6 is fine for bulk storage so long as you know what you're doing. A good storage controller handles the parity calcs, and you get more capacity for a bunch of data that will get written and probably never accessed again, because nobody cleans up their poo poo ever.

I'd much rather throw some SSD cache into the array to speed things up if I need it than run the risk of corrupt data on rebuild and having to restore who knows what out of the backup system.

Dilbert As FUCK
Sep 8, 2007

by Cowcaster
Pillbug
SSD caching is good for papering over low backend IOPS, but cache exhaustion and cold-data access problems are real and can cause some WTF moments if the cache isn't sized properly.

Don't get me wrong, I love SSD caching, but I hate it when people go with something like a VNXe 3300 with 200GB of FAST Cache and 8x3TB drives in RAID 6 and then wonder why things hit a wall.

It's also a bit of an annoyance when people don't optimize their 2008 R2 templates for a virtual environment.

Dilbert As FUCK fucked around with this message at 23:25 on Jul 3, 2014

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

Well yeah, but if your load is running sustained instead of bursty and not easily satisfied by SSD, then why would you be putting it on SATA instead of SAS in the first place? I'm all for a cost-effective solution, but I've also done the math on the average rate of an unrecoverable read error, and even with SAS drives, once you're on the high-capacity platters you're talking 1-in-10 odds of data corruption on rebuild with a RAID 1 or 10 array. I cannot for the life of me understand why so many people are recommending RAID 10 over RAID 6 when it puts your balls so close to the bandsaw. I would much rather take the double parity with the performance hit, because I can accelerate with SSD or more spindles rather than run the risk of data corruption.
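For the curious, the back-of-envelope math behind that "odds of corruption on rebuild" figure looks roughly like the sketch below. The drive size and the URE rates (10^-15 per bit for enterprise SAS, 10^-14 for consumer SATA) are generic datasheet-style assumptions, not numbers from this thread, so treat the output as illustrative.

code:
# Rough sketch: probability of hitting at least one unrecoverable read error (URE)
# while re-reading a full surviving mirror during a RAID 1/10 rebuild.
# Drive size and URE rates are illustrative assumptions, not measured values.

def rebuild_ure_probability(drive_tb, ure_per_bit):
    bits_read = drive_tb * 1e12 * 8          # the whole surviving copy must be read
    p_ok_per_bit = 1.0 - ure_per_bit
    return 1.0 - p_ok_per_bit ** bits_read   # P(at least one URE)

for label, rate in [("enterprise SAS (1e-15)", 1e-15),
                    ("consumer SATA (1e-14)", 1e-14)]:
    p = rebuild_ure_probability(drive_tb=4, ure_per_bit=rate)
    print(f"4 TB mirror rebuild, {label}: ~{p:.0%} chance of a URE")
# Typical output: ~3% at the SAS rate, ~27% at the SATA rate -- bigger platters
# push the SAS number toward the "1 in 10" ballpark mentioned above.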

BangersInMyKnickers
Nov 3, 2004

I have a thing for courageous dongles

Give NetApp a big sack of money and watch your storage problems disappear: A Life Lesson

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

BangersInMyKnickers posted:

RAID 6 is fine for bulk storage so long as you know what you're doing.
RAID 6 is fine in general. We use only RAID 6 (well, RAID-Z2 and RAID-DP) in our environment and generally everything runs quite well. We do follow the "shitload of cache" philosophy, though.

Kachunkachunk
Jun 6, 2011
I'm not sure what to expect with bulk-storage RAID-6 these days. At the moment I'm dealing with a slow vCAC provisioning issue where some NAS has a RAID-6 made up of 'X' number of 7200-RPM drives, and they're complaining that it's taking an hour to do 8 simultaneous clones. Read latency is averaging 100ms, and I'm starting to think this is probably to be expected at this stage.

Networking is 10Gbit end-to-end, but their NFS activity (i.e. from the NAS) is not over jumbo frames. We've tried a single host and 8 separate hosts doing simultaneous clones, and the results are similar enough to say it's probably not a networking bottleneck. During these clones, pings remain timely, but transfers still seem slower than they expect or hope. We never once reached anything close to 10Gbit speeds, and doing line/speed testing is out of the question for now (great). I think I've seen maybe 2.5Gbit/s at best, aggregated over several servers.

The destination storage for the writes (clone/provisioning destinations) is over 8Gbit FC, I believe... but even if it were 4Gbit, it's not hitting line rates, and those are proper 10K+ SAS disks for the destination, with proper multipathing. And there's a ton of cache, so writes usually end up being quicker than reads (unless you fill the cache up completely with clones, but write latency was typically quite decent).

Any thoughts?
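One sanity check worth doing before blaming anything else: a back-of-envelope estimate of what a 7200-RPM RAID-6 source can actually deliver to 8 concurrent clone streams. The sketch below uses generic rule-of-thumb numbers (roughly 75 random read IOPS per 7200-RPM spindle, guessed spindle counts, a guessed I/O size), none of which come from this thread, so it only shows the shape of the math.

code:
# Rough sanity check: can a small 7200-RPM RAID-6 group feed 8 clone streams?
# Spindle count, per-disk IOPS, and I/O size are illustrative guesses, not
# numbers from this environment.

def raid6_read_estimate(spindles, iops_per_disk=75, io_size_kb=64):
    """Aggregate random-read IOPS and MB/s for the whole group (reads can use
    all spindles; RAID-6 parity mainly penalizes writes, not reads)."""
    total_iops = spindles * iops_per_disk
    mb_per_s = total_iops * io_size_kb / 1024
    return total_iops, mb_per_s

for spindles in (6, 12, 24):
    iops, mbps = raid6_read_estimate(spindles)
    per_clone = mbps / 8  # 8 simultaneous clones share the same pool of reads
    print(f"{spindles:2d} spindles: ~{iops} IOPS, ~{mbps:.0f} MB/s total, "
          f"~{per_clone:.0f} MB/s per clone")
# Even the optimistic 24-spindle case is nowhere near 10GbE line rate, and a
# sustained ~100ms read latency suggests the queue depth is already well past
# what the spindles can service comfortably.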
