madsushi
Apr 19, 2009

Baller.
#essereFerrari
So VMware just announced they're never supporting AHCI for vSAN, which translates to "home lab users get hosed". I was really hoping they'd allow an override or something to let it work, since it's only disabled on their side (worked in beta, etc).

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

madsushi posted:

So VMware just announced they're never supporting AHCI for vSAN, which translates to "home lab users get hosed". I was really hoping they'd allow an override or something to let it work, since it's only disabled on their side (worked in beta, etc).

Buy a cheap SAS controller on eBay.

evol262
Nov 30, 2010
#!/usr/bin/perl

madsushi posted:

So VMware just announced they're never supporting AHCI for vSAN, which translates to "home lab users get hosed". I was really hoping they'd allow an override or something to let it work, since it's only disabled on their side (worked in beta, etc).

What percentage of home lab users do you think have a valid vSAN license but are limited by the <$200 cost of a SAS HBA?

some kinda jackal
Feb 25, 2003

 
 
What are the technical requirements for the SAS card anyway? I bought a Perc 6i for like $50 off eBay.

evol262
Nov 30, 2010
#!/usr/bin/perl

Martytoof posted:

What are the technical requirements for the SAS card anyway? I bought a Perc 6i for like $50 off eBay.

My guess is "knows how to interpret SCSI commands", but it's a good question.

mattisacomputer
Jul 13, 2007

Philadelphia Sports: Classy and Sophisticated.

For actual production use, vSAN is definitely demanding. I think it was in this thread, just a few weeks/months ago, that someone had a catastrophic vSAN failure where a SAS controller couldn't keep up with a basic amount of vSAN I/O, even though the controller was specifically on the vSAN HCL. Home lab though? Yeah, probably any ol' 6Gbps SAS controller will do.

Hadlock
Nov 9, 2004

Not sure where to post this, but I would like to buy/rent a server in the $15-20/month range, probably running CentOS 7, to do email and perhaps some other things. Cheapest I have seen is a $9/mo single-core machine from NFOServers, who I've purchased services from before. That was powerful enough to run a 100-slot Mumble server, but I'm curious whether there are better options a year later.

Geisladisk
Sep 15, 2007

Stupid newbie idiot question: I'm running a Windows 7 VM on VMWare workstation. This machine has one CPU core. Can I add more cores to this computer without Windows freaking the gently caress out?

Stupid newbie idiot question the second: The VMWare settings window for the processors has the options "Number of processors" and "Number of cores per processor". I assume that the first is the number of actual physical cores dedicated to the VM, and the second is the number of virtual cores each physical core will be split into for the VM?

Erwin
Feb 17, 2006

1) Yes, it'll be fine.
2) The VM will get vCPUs equal to whatever you select for processors*cores. There's no difference if you select 2*1 or 1*2. The only purpose of that setting is for software in the guest VM that cares about sockets vs. cores (e.g. for licensing), or for wide VMs on big physical hosts where NUMA comes into play.
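For anyone curious where those two dropdowns actually end up, this is roughly how they map onto the VM's .vmx file (just a sketch; the exact keys can vary by product and version):

code:
# sketch: a 4-vCPU VM presented to the guest as 2 sockets x 2 cores each
# numvcpus is the total vCPU count (processors * cores in the UI)
numvcpus = "4"
# cores per virtual socket
cpuid.coresPerSocket = "2"
Either split gives the guest the same four vCPUs to schedule on; the sockets/cores presentation only matters to socket-aware licensing or NUMA, as above.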

Geisladisk
Sep 15, 2007

Erwin posted:

2) The VM will get vCPUs equal to whatever you select for processors*cores. There's no difference if you select 2*1 or 1*2. The only purpose of that setting is for software in the guest VM that cares about sockets vs. cores (e.g. for licensing), or for wide VMs on big physical hosts where NUMA comes into play.

Are there any performance implications? Does the VM get access to the same amount of real-world CPU juice with 1*4, 4*1, and 2*2, for instance?

Erwin
Feb 17, 2006

Correct, there's no performance impact. You can read more here. Basically those settings are more important in VMs more complicated than a 2-vCPU Windows 7 VM. For that case, there's no difference.

Geisladisk
Sep 15, 2007

Sweet, thanks a bunch. :)

Geisladisk
Sep 15, 2007

So, in relation to my previous post - I'm having serious performance issues on my Windows 7 VM. I tried to fix it by adding more cores (see above). That didn't work.

This is what process manager looks like a couple of minutes after startup. I started the computer, and did nothing except open process manager:



This is a minute or so of doing absolutely nothing. Right clicking on my desktop would bring CPU on all four cores to 70% for a second or two.

There's nothing out of the ordinary running. The other VMs on the host (10 of them) are running smoothly and are responsive while this is going on. The host machine (Ubuntu 14) is responsive while this is going on.

Any suggestions? :confused:

Geisladisk fucked around with this message at 15:18 on Aug 18, 2014

Docjowles
Apr 9, 2009

Hadlock posted:

Not sure where to post this, but I would like to buy/rent a server in the $15-20/month range, probably running CentOS 7, to do email and perhaps some other things. Cheapest I have seen is a $9/mo single-core machine from NFOServers, who I've purchased services from before. That was powerful enough to run a 100-slot Mumble server, but I'm curious whether there are better options a year later.

Check out Linode and Digital Ocean.

Dr. Arbitrary
Mar 15, 2006

Bleak Gremlin

Docjowles posted:

Check out Linode and Digital Ocean.

I can vouch for Digital Ocean being convenient and simple. Make sure you look for a promotional code online; you can get a few months free.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Hadlock posted:

Not sure where to post this, but I would like to buy/rent a server in the $15-20/month range, probably running CentOS 7, to do email and perhaps some other things. Cheapest I have seen is a $9/mo single-core machine from NFOServers, who I've purchased services from before. That was powerful enough to run a 100-slot Mumble server, but I'm curious whether there are better options a year later.
Seconding the earlier recommendations for Linode and DO, but also check out the hosting megathread, which can generally give you better answers on specific VPS products than we can.

DevNull
Apr 4, 2007

And sometimes is seen a strange spot in the sky
A human being that was given to fly

parid posted:

I don't know if you can call them "fully supported". I tried to argue this very point on a case 4 months ago. We had issues with View clients not being able to connect to their pools. Support refused to continue troubleshooting because the vCenter for the environment had an e1000 NIC. The tech tried to claim that since best practice was vmxnet3, he shouldn't have to troubleshoot anything but that, and that this was clearly our problem. I literally had to go through a week-plus of change management busywork before they would even spend more time on the issue. Of course it wasn't even related to the problem and just ended up being a waste of time.

Supported in the sense that someone has committed by policy to make it work; in practice... I wouldn't waste your time with the tech support lines.

This makes me sad to read. Yes, the E1000 is fully supported. It is the default for a lot of VMs. I am pretty sure a lot of people use it internally simply because it is the most stable and pretty much works everywhere.

Does VMware have completely different tech support for View and vSphere? It seems weird that vSphere tech support would claim the E1000 is not supported.

evol262
Nov 30, 2010
#!/usr/bin/perl

bull3964 posted:

They have a few more bugs listed with the same symptoms and they aren't listed as resolved in any version.

The issue isn't whether there may be bugs with E1000 in some circumstances which haven't been fixed yet. There certainly are.

It's that "here's a workaround for this bug we know about until it gets fixed" isn't the same thing as "we don't support this and you should use e1000 as little as possible because it causes nothing but problems".

There are a lot of things which cause development teams and support problems, and which we'd rather not support (iSCSI booting, EFI from some vendors, anything to do with Dell's SMBIOS), but we still have to. If a support rep is telling you anything other than "we've opened a ticket with engineering to look at this issue (in a configuration which is 100% supported), in the meantime...", you should be escalating, not saying "well, if it causes VMware problems... :shrug:"

Nitr0
Aug 17, 2005

IT'S FREE REAL ESTATE

Geisladisk posted:

So, in relation to my previous post - I'm having serious performance issues on my Windows 7 VM. I tried to fix it by adding more cores (see above). That didn't work.

This is what process manager looks like a couple of minutes after startup. I started the computer, and did nothing except open process manager:



This is a minute or so of doing absolutely nothing. Right clicking on my desktop would bring CPU on all four cores to 70% for a second or two.

There's nothing out of the ordinary running. The other VMs on the host (10 of them) are running smoothly and are responsive while this is going on. The host machine (Ubuntu 14) is responsive while this is going on.

Any suggestions? :confused:

move to one core and see if it helps

Nitr0
Aug 17, 2005

IT'S FREE REAL ESTATE

evol262 posted:

"This is a known issue affecting..."
"This issue is resolved in:"

This is exactly how bug errata are worded. It's a bug, and it got fixed, probably because somebody reported issues.

Your point is what?

My point is that we ran into the same purple screen issues on multiple clusters on the latest 5.5u1 with all patches, and moving totally off e1000 fixed it. It's not fixed.

evol262
Nov 30, 2010
#!/usr/bin/perl

Nitr0 posted:

My point is that we ran into the same purple screen issues on multiple clusters on the latest 5.5u1 with all patches, and moving totally off e1000 fixed it. It's not fixed.

This is exactly the point of filing bugs.

Nitr0
Aug 17, 2005

IT'S FREE REAL ESTATE
Or you can just use the vNIC that doesn't crash hosts... hmm.

evol262
Nov 30, 2010
#!/usr/bin/perl

Nitr0 posted:

Or you can just use the vNIC that doesn't crash hosts... hmm.

This misses the point entirely.

elevator=deadline is a workaround for clock drift on virtualized Linux guests; a tickless kernel is the solution.
Disabling SELinux because it breaks your service is a workaround; learning how to use SELinux is the solution.
Not using e1000 because there are bugs is a workaround; reporting and fixing those bugs is the solution.

It's fine not to use e1000 because you don't like it or you think it's too unstable and you can't be hosed filing bugs. VMware support telling people that a supported vNIC shouldn't be used isn't a solution. Either drop support or fix the bugs, and escalate to engineering so they know it happens, and get some idea of the frequency with which it happens. That's the point.
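For reference, that kind of kernel-parameter workaround is applied like this; a rough sketch using elevator=deadline (paths and commands vary by distro, so double-check yours):

code:
# runtime, per block device: switch the I/O scheduler
echo deadline > /sys/block/sda/queue/scheduler

# persistent: append elevator=deadline to the kernel command line,
# e.g. GRUB_CMDLINE_LINUX in /etc/default/grub, then regenerate the
# grub config (grub2-mkconfig -o /boot/grub2/grub.cfg on EL7-style systems)
The tickless behavior, by contrast, is a kernel build/boot option (CONFIG_NO_HZ), which is why it's the fix rather than the band-aid.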

Richard Noggin
Jun 6, 2005
Redneck By Default
Did anyone have issues with Server 2008 R2 VMs running on ESXi hosts with the latest round of Windows Updates? We had two clients experience partial outages - VMs that were patched but didn't reboot correctly - after this latest round.

Number19
May 14, 2003

HOCKEY OWNS
FUCK YEAH


It could be related to this: http://www.infoworld.com/t/microsof...-2975331-248582

There were some bad Windows Updates that caused bluescreens, which Microsoft had to pull. It seems unlikely that this would affect a server, but :shrug:

Docjowles
Apr 9, 2009

^^^ :argh:

There were a whole bunch of issues with the most recent round of Windows Updates, up to and including random crashes/hangs on boot. Could be related.

quote:

Known issue 3
Microsoft is investigating behavior in which systems may crash with a 0x50 Stop error message (bugcheck) after any of the following updates are installed:
2982791 MS14-045: Description of the security update for kernel-mode drivers: August 12, 2014
2970228 Update to support the new currency symbol for the Russian ruble in Windows
2975719 August 2014 update rollup for Windows RT 8.1, Windows 8.1, and Windows Server 2012 R2
2975331 August 2014 update rollup for Windows RT, Windows 8, and Windows Server 2012

http://support.microsoft.com/kb/2982791

Richard Noggin
Jun 6, 2005
Redneck By Default
They weren't bluescreened. One was hung, and all showed VMware Tools as stopped, and did not respond to pings. All were remedied with a reboot.

Number19
May 14, 2003

HOCKEY OWNS
FUCK YEAH


In that case, no, I haven't seen it, but I also haven't started my server patches yet.

Hadlock
Nov 9, 2004

Docjowles posted:

Check out Linode and Digital Ocean.

Fantastic, thanks

Misogynist posted:

Seconding the earlier recommendations for Linode and DO, but also check out the hosting megathread, which can generally give you better answers on specific VPS products than we can.

Great, thanks sir, not sure how I missed this

madsushi
Apr 19, 2009

Baller.
#essereFerrari

evol262 posted:

My guess is "knows how to interpret SCSI commands", but it's a good question.

The new requirement is a pretty steep queue depth (256?), whereas AHCI maxes out at 32. So it actually takes a reasonably good controller now. They had issues with people using cheap ones in prod.
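If you want to see what a given controller actually reports, a quick sanity check from an SSH session on the host looks roughly like this (field names move around a little between ESXi versions, so treat it as a sketch):

code:
# esxtop, then press 'd' for the disk adapter view; the AQLEN column
# is the queue depth the driver reports for each vmhba
esxtop

# per-device maximum queue depths
esxcli storage core device list | grep -i "queue depth"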

Syano
Jul 13, 2005

madsushi posted:

The new requirement is a pretty steep queue depth (256?), whereas AHCI maxes out at 32. So it actually takes a reasonably good controller now. They had issues with people using cheap ones in prod.

One of the more famous cases was a reddit user reporting his prod environment being down for 12-plus hours after trying to sync to a new storage node.

Bitch Stewie
Dec 17, 2011

mattisacomputer posted:

For actual production use, vSAN is definitely demanding. I think it was in this thread, just a few weeks/months ago, that someone had a catastrophic vSAN failure where a SAS controller couldn't keep up with a basic amount of vSAN I/O, even though the controller was specifically on the vSAN HCL. Home lab though? Yeah, probably any ol' 6Gbps SAS controller will do.

They had a Dell Perc 310 IIRC, and afterwards VMware removed it from the HCL and basically said "We screwed up and put some stuff on there that's not good enough".

Moey
Oct 22, 2010

I LIKE TO MOVE IT
Anyone know any common bottlenecks for a P2V using VMware vCenter Converter? I seem to be stuck at like 8MB/s and I really don't want this conversion to take 16 hours.

Source machine is an HP DL380 G5 with 8 cores and 32 GB of memory. All unnecessary services were stopped. This server is directly connected (1GbE) to the switch stack that the destination host/storage are connected to.

SSL has been disabled within vCenter Converter.
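(For anyone following along, that's the converter-worker.xml tweak; from memory it's the useSsl flag under the nfc section, roughly like the below, followed by a restart of the Converter worker service - double-check VMware's KB for the exact file location.)

code:
<!-- converter-worker.xml, nfc section (install path varies) -->
<nfc>
  <useSsl>false</useSsl>
</nfc>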

Hosts have 2x10GbE connections and storage has 6x1GbE connections.

Running vCenter Converter on a remote machine, as well as on a machine local to that cluster, made no speed difference.

Dropped a host from vCenter and used the Converter pointed directly at the host, no speed difference.

Currently have a ticket open with VMware and gave them the logs, but I am stumped on where this bottleneck is.

And trust me, I would prefer to rebuild machines rather than P2V, but I do not administer this server, and I am guessing it will take them well over a year to actually migrate the services to a new virtual machine. I am in a crunch to get rid of as many of these physical servers as possible.

Richard Noggin
Jun 6, 2005
Redneck By Default
Did you leave the destination disk layout alone? If you change it in any way (including thin provisioning), I've found it takes a lot longer.

Moey
Oct 22, 2010

I LIKE TO MOVE IT

Richard Noggin posted:

Did you leave the destination disk layout alone? If you change it in any way (including thin provisioning), I've found it takes a lot longer.

Destination disk layout was left alone.

I was thinking maybe that physical server just had some poo poo slow disks or something, but I did a few smaller machines at that site with the same setup and they all ran at that same speed.

Richard Noggin
Jun 6, 2005
Redneck By Default
The other thing that's worked for me in the past is to increase the number of simultaneously cloned volumes, but that may not help depending on your disk layout.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2071014

Kachunkachunk
Jun 6, 2011
Marty, is there any chance that the SSDs are not garbage collecting adequately? You mention that this was all fine until recently.
If you were to secure erase the RAID set, you would probably see performance come back just fine - not that it's easy to pull off, I know.
ESXi doesn't support TRIM, so this is perhaps where the problems stem from. A suggestion I would have at that point is to set up a NAS, use another OS that supports TRIM, and re-export the device to the ESXi servers. Also easier said than done, I know.
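(If you do end up fronting the storage with a Linux box, the trimming itself is simple enough; whether the commands actually reach the SSDs depends on the controller passing them through. A rough sketch - the mount point is just a placeholder:)

code:
# one-off TRIM of a mounted filesystem, verbose output
fstrim -v /export/vmstore

# or trim everything mounted that supports discard, e.g. from a weekly cron job
fstrim --all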

Edit: Whoa, is Stupid Newbie gone? Shame, that baby was cool.

Edit 2:

quote:

Machine is a DL380 G7 with a P410i directly connected to 8 SSDs on the chassis backplane. No NAS/SAN in place.
I think it's fairly safe to go by what has been certified on the VMware vSAN HCL if you want to avoid asking "should this work?"

To start, you'll unfortunately see that the P410i is not on the HCL, but the P420i is.
The two controllers seem pretty different, spanning that big generational change from PCI-E 2.0 SAS-6 adapters to the hot new PCI-E 3.0 ones; the P410i seems to optionally come with battery-backed cache, while the P420i comes with some to start (expandable/swappable). Further, it's got SmartPath (auto-tiering) support, but that's not really too relevant here.
You do have battery-backed write cache, right? Regardless, I did see you wrote it worked fine before, but make sure it's still working and the battery isn't dead, if you have it.

Something else of note is that the HCL indicates the P420i and some of its ilk need to be in RAID-0 mode, not IT/passthrough mode, for vSAN. Some other vSAN-specific things come to mind with this, though, so I'm unsure about the relevance... such as: http://www.vmware.com/files/pdf/products/vsan/VMware-TMD-Virtual-SAN-Hardware-Guidance.pdf under page 7.
Also, do you do all that I/O through an external shelf? That's considered a no-no for vSAN. Again, I get that it was working before, but perhaps it now acts as your bottleneck.

Kachunkachunk fucked around with this message at 22:04 on Aug 19, 2014

Moey
Oct 22, 2010

I LIKE TO MOVE IT

Richard Noggin posted:

The other thing that's worked for me in the past is to increase the number of simultaneously cloned volumes, but that may not help depending on your disk layout.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2071014

Thanks, didn't find that KB when doing research. I'll muck around with that tonight and give it a shot.

Would it make sense to defrag the physical disks beforehand if I am sticking with block-based cloning? I would assume it would only help the process?

Richard Noggin
Jun 6, 2005
Redneck By Default
I'm not sure on the defrag thing. I can't see that it would hurt anything, but can't say that it would help either.

Moey
Oct 22, 2010

I LIKE TO MOVE IT

Richard Noggin posted:

I'm not sure on the defrag thing. I can't see that it would hurt anything, but can't say that it would help either.

Well, I'll give it a shot. I am thinking that since this is an older server, that disk is probably fragmented as poo poo. Block-based cloning "should" benefit from a defrag.

Edit: Strangely, it's only showing as slightly fragmented. Can't hurt to let it defragment, though.

Moey fucked around with this message at 21:02 on Aug 19, 2014
