|
BangersInMyKnickers posted:It's a little annoying because there are multiple steps (grow volume, grow LUN, grow VMFS), but the biggest issue is SCSI lock contention. Thin-provisioned VMDK files grow in 1-8MB chunks depending on how you set up the VMFS.
|
# ? Oct 19, 2012 17:06 |
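The three steps above can mostly be driven from PowerCLI once the array side is done. A sketch only, with a made-up vCenter hostname and cluster name (the VMFS expansion itself was still a client-side wizard in 4.x/5.0):

```powershell
# After growing the volume and LUN on the array, make every host in the
# cluster rescan so the larger LUN is visible before expanding the VMFS.
# "vcenter01" and "Prod-Cluster" are placeholders.
Connect-VIServer vcenter01
Get-Cluster "Prod-Cluster" | Get-VMHost |
    Get-VMHostStorage -RescanAllHba -RescanVmfs
```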
|
Ah ok I never run into that because I limit the number of hosts in my datastore by default.
|
# ? Oct 19, 2012 17:10 |
|
Corvettefisher posted:Isn't that VMFS3 only? Or am I thinking of something different? I'm still on 4.2 with VMFS3 datastores, so I guess in theory the problem could be minimized in the newer versions, but SCSI locking on new block allocation is very fundamental to how iSCSI LUNs operate and I don't see how it could be eliminated.
|
# ? Oct 19, 2012 17:29 |
|
Pretty sure that's one of the things VAAI has already addressed and eliminated. It does the locking at the block level rather than reserving the whole LUN, and far quicker, using a new SCSI command (ATS).
|
# ? Oct 19, 2012 17:31 |
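If you want to verify whether a given array/LUN is actually getting the VAAI locking offload, ESXi 5.x can report it per device (the naa ID below is a placeholder):

```shell
# ATS (hardware-assisted locking) shows up in the VAAI status output;
# "supported" here means block locking no longer needs SCSI reservations.
esxcli storage core device vaai status get -d naa.60050768019101234500000000000001
```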
|
BangersInMyKnickers posted:I'm still on 4.2 with VMFS3 datastores so I guess in theory the problem could be minimized in the newer versions, but SCSI locking on new block allocation is very fundamental to how iSCSI LUNs operate and I don't see how it could be eliminated. I meant the block growth rate; I forgot to snip that part. Here is a VMFS5 locking reference in case anyone needs it: http://pubs.vmware.com/vsphere-50/index.jsp?topic=%2Fcom.vmware.vsphere.storage.doc_50%2FGUID-DE30AAE3-72ED-43BF-95B3-A2B885A713DB.html Dilbert As FUCK fucked around with this message at 17:36 on Oct 19, 2012 |
# ? Oct 19, 2012 17:33 |
|
Ha, my NetApp is 8 years old and couldn't run VAAI if its life depended on it. Oh well.
|
# ? Oct 19, 2012 17:57 |
|
BangersInMyKnickers posted:Ha, my NetApp is 8 years old and couldn't run VAAI if its life depended on it. Oh well. Sup 32-bit netapp controller buddy
|
# ? Oct 19, 2012 18:21 |
|
Rhymenoserous posted:This sounds weird to me, I've grown iSCSI virtual filesystems fairly often without a hitch. Hell I just bumped one of my datastores a few hours ago. What problems do you guys have? BangersInMyKnickers covered it well. We were all FC and 3Par at the time, and we never grew anything; we just set up 2TB LUNs. Everything was below the config maximums for hosts per datastore in 4.0 and we still had tons of locking issues due to the dynamic nature of our environment and the fact we were running about 32 hosts per cluster. NFS is just far easier to deal with overall. If we weren't multi-tenant and our workloads were predictable, I think we would have probably stayed FC.
|
# ? Oct 19, 2012 22:12 |
|
janitorx posted:32 hosts per cluster
|
# ? Oct 19, 2012 22:20 |
|
Oh hey, some dipshit installed a 250 VDI + 30 servers, 10-host cluster on SQL Express 2008 R2, and guess what's happening? Who is working for 275/hr, minimum 4 hours, this weekend to fix it? This guy is.
|
# ? Oct 19, 2012 22:28 |
|
You'll be getting 12 dollars an hour to fix it, right? Sorry CF, I couldn't resist. (USER WAS PUT ON PROBATION FOR THIS POST)
|
# ? Oct 19, 2012 22:34 |
|
eh nvm time to logoff for the night Okay, I know I said I was logging out because I just cracked my first beer, but I don't think that was trolling; I think that was a light-hearted joke. Logging out NOW Dilbert As FUCK fucked around with this message at 23:01 on Oct 19, 2012 |
# ? Oct 19, 2012 22:39 |
|
Misogynist posted:I'm actually surprised that locking issues were the worst of your problems. At the end of the day it honestly was, well, that and agent failures, but that has gotten better with each release. We run that pretty easily with NFS now.
|
# ? Oct 19, 2012 22:49 |
|
.
|
# ? Oct 19, 2012 22:54 |
|
A question on (over)provisioning CPUs and RAM here. Currently using a dedicated server with a Xeon E3-1230 and 8GB ECC RAM. I'm looking to run Server 2012 or 2008 R2 on a steady basis, with the hope of experimenting with other OSes on the side, such as Ubuntu Server, FreeBSD and whatnot. Perhaps a Windows 7 or XP guest as well, but I could do those on Workstation locally if need be. Any recommendations or literature on this? The ESXi manuals are somewhat unhelpful unless I've been looking at the wrong ones. Edit: The Memory Management Guide seemed like a pretty good start. EconOutlines fucked around with this message at 16:11 on Oct 20, 2012 |
# ? Oct 20, 2012 14:38 |
|
Give the VMs only what they need. For example, my lab Domain Controller (2008 R2 with DNS/DHCP/AD/CA) only has 1GB of RAM and one vCPU. Normally I will run the VMs at around 70-80% RAM utilization; if I see one hitting 85%+ I give it more. RAM is a bit more flexible to overprovision though; transparent page sharing, ballooning, and page compression really work wonders for it. You can overprovision CPUs a good amount too, however you really should only give the VM what it needs, otherwise you can cause performance problems. In my test environment the only things that have multiple vCPUs are the vCenter servers, connection servers, and SQL, and they only have 2 vCPUs. My current lab looks like this FYI:

DC - 1 vCPU, 1GB RAM
vCenter A - 2 vCPU, 2.5GB RAM (2GB isn't enough for 5.1 if you want to do the web client)
vCenter B - 2 vCPU, 2.5GB RAM
Connection server A - 2 vCPU, 2GB RAM
Connection server B - 2 vCPU, 2GB RAM
Security server - 1 vCPU, 1GB RAM
SQL - 2 vCPU, 3GB RAM
ESXA - 4 vCPU, 6GB RAM
ESXB - 4 vCPU, 6GB RAM
ESXC - 4 vCPU, 6GB RAM

Most of the VMs running on those nested ESXi hosts are XP with 1 vCPU and 512MB RAM, with some Fedora clients at 1 vCPU, 512MB RAM, and maybe 1 DC at 1 vCPU, 768MB RAM. Dilbert As FUCK fucked around with this message at 19:53 on Oct 20, 2012 |
# ? Oct 20, 2012 19:51 |
|
Roving Reporter posted:A question on (over) provisioning CPUs and RAM here. Currently using a dedicated server with Xeon E3-1230, and 8GB ECC.

Linux VMs: 1 core, 512 MB RAM, 20 GB disk - that's a basic Linode and will be sufficient for 99% of your lab use-cases.
Windows VMs: 1 core, 1 GB RAM, 40 GB disk - this is about as low as you want to go.

Try to stack as many roles together as you can. The more vCPUs you provision, the worse your performance is going to be. Today I am running:

Server 2012 DC - 1 vCPU, 1 GB RAM
Server 2012 Exchange 2013 - 2 vCPU, 2 GB RAM (Exchange 2013 just sucks with 1 vCPU)
Windows 8 VM - 1 vCPU, 1 GB RAM (just poking around with settings/Metro)
Windows 2003 VM - 2 vCPU, 4 GB RAM (Minecraft server)
Ubuntu - 1 vCPU, 512 MB RAM
OnTAP-E - 1 vCPU, 512 MB RAM
CentOS - 1 vCPU, 512 MB RAM

I'm running all of that on a 4-core AMD proc (Phenom X-2 Black?) and 16 GB of RAM. Performance is pretty good.
|
# ? Oct 21, 2012 09:26 |
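The sizing above implies fairly heavy CPU overcommit on a 4-core box while staying under physical RAM. A quick back-of-the-envelope check (VM list transcribed from the post; host specs as stated):

```python
# Tally provisioned vCPUs/RAM against physical resources for the host
# described above (4 cores, 16 GB RAM). Sizes are taken from the post;
# VM names are shorthand, not real hostnames.

HOST_CORES = 4
HOST_RAM_GB = 16

# (name, vCPUs, RAM in GB)
vms = [
    ("2012-dc",       1, 1.0),
    ("exchange-2013", 2, 2.0),
    ("win8",          1, 1.0),
    ("win2003-mc",    2, 4.0),
    ("ubuntu",        1, 0.5),
    ("ontap-e",       1, 0.5),
    ("centos",        1, 0.5),
]

total_vcpus = sum(cpu for _, cpu, _ in vms)
total_ram_gb = sum(ram for _, _, ram in vms)

cpu_overcommit = total_vcpus / HOST_CORES   # 9 vCPUs on 4 cores = 2.25:1
ram_commit = total_ram_gb / HOST_RAM_GB     # 9.5 GB of 16 GB = ~59%

print(f"vCPU overcommit: {cpu_overcommit:.2f}:1")
print(f"RAM provisioned: {ram_commit:.0%} of physical")
```

A 2.25:1 vCPU overcommit with provisioned RAM still under physical is consistent with the "performance is pretty good" observation; in a lab, RAM is usually what runs out before CPU does.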
|
Corvettefisher posted:My current lab looks like this FYI Are you artificially inducing load on these servers to utilize the resources you are giving them, or do you actually have processing going on where they need those resources?
|
# ? Oct 21, 2012 17:59 |
|
No, not really; I still manage to run them at about 60-75% load. The vCenters take up 2.5GB or else the Web Client just starts loving up; the connection servers might be a tad higher than they need to be, however I get login issues with 1 vCPU and 1.5GB RAM. Those ESX servers run View VMs and some other things. I do have a few CPU/mem stress tests I run sometimes. A desktop with 32GB of RAM helps too, of course.
|
# ? Oct 21, 2012 22:36 |
|
So I've just remembered an annoying task that I had to occasionally do at my previous employer, and I'm wondering if there is a quicker way to do it. We were running an ESXi 5.0 cluster with storage provided by an IBM V7000 SAN. The whitepapers and documentation from IBM advised that the recommended pathing policy for ESXi hosts accessing LUNs on the V7000 is Round Robin. However, whenever a host discovered a new LUN the pathing policy would default to MRU. This meant that whenever I created new LUNs I had to manually change the pathing policy for that LUN on each host which could see it. Also, if I added a new host to the cluster I had to go through each LUN the new host could see and change the pathing policy. Doing this via the vSphere Client quickly became a time-consuming process, so I'm wondering: is there an easier way I could have done it?
|
# ? Oct 22, 2012 04:11 |
|
Yep, powercli. I modified another script that rescanned datastores on all hosts to change all LUNs to round robin.

usage: lunrr.ps1 -vc {virtualcenter server hostname} -container {cluster name in vc}

#set the input options here as -vc and -cluster
param([string]$vc = "vc", [string]$container = "container", [string[]]$vmhosts = $null)

#check to make sure we have both
function usage()
{
    Write-Host -ForegroundColor green `n`t"This script is used to change the disk load balancing policy to Round Robin for all hosts provided."
    Write-Host -ForegroundColor green `n`t"You can either specify -vmhosts as an array:"
    Write-Host -ForegroundColor yellow `n`t`t"Rescan-Storage -vmhosts (`"host1`",`"host2`",`"host3`")"
    Write-Host -ForegroundColor green `n`t"or specify -vc and -container, where container is a host name, cluster, folder, datacenter, etc:"
    Write-Host -ForegroundColor yellow `n`t`t"Rescan-Storage -vc vcenterserver -container cluster1" `n
    Write-Host -ForegroundColor green `t"You can use either -vmhosts by itself, or -vc and -container together, not a combination of them." `n
}

function RescanHBA()
{
    foreach ($vmhost in $vmhosts)
    {
        if ($esx -eq 1) #do this only if connecting directly to ESX hosts
        {
            Connect-VIServer $vmhost -Credential $vmhost_creds > $NULL 2>&1
        }
        Write-Host `n
        Write-Host -ForegroundColor green "Server: " $vmhost
        Write-Host "Change Load Balancing Policies to Round Robin on "$vmhost
        Get-VMHost -Name $vmhost | Get-ScsiLun -LunType "disk" | where {$_.MultipathPolicy -ne "RoundRobin"} | Set-ScsiLun -MultipathPolicy "RoundRobin"
        if ($esx -eq 1) #disconnect from the current ESX host before going to the next one
        {
            Disconnect-VIServer -Confirm:$false
        }
    }
    Write-Host `n
}

#check to make sure we have all the args we need
if (($vmhosts -eq $null) -and (($vc -eq "vc") -or ($container -eq "container"))) #if vmhosts, vc, or container is blank
{
    usage
    break
}
elseif (($vmhosts -ne $null) -and (($vc -ne "vc") -or ($container -ne "container"))) #if vmhosts and vc or container is used
{
    usage
    break
}
elseif (($vmhosts -ne $null) -and (($vc -eq "vc") -or ($container -eq "container"))) #if only vmhosts is used, set our esx variable to 1 and get credentials
{
    $esx = 1
    $vmhost_creds = $host.ui.PromptForCredential("ESX/ESXi Credentials Required", "Please enter credentials to log into the ESX/ESXi host.", "", "")
    RescanHBA
}
elseif (($vmhosts -eq $null) -and (($vc -ne "vc") -and ($container -ne "container"))) #if vc and container are used, set our vcenter variable to 1, get credentials, and populate vmhosts
{
    $vcenter = 1
    $vc_creds = $host.ui.PromptForCredential("vCenter Credentials Required", "Please enter credentials to log into vCenter.", "", "")
    Connect-VIServer $vc -Credential $vc_creds > $NULL 2>&1
    $vmhosts = Get-VMHost -Location $container | sort name
    RescanHBA
}

#garbage collection
$vmhost_creds = $null
$vc_creds = $null
$vmhosts = $null
$vc = $null
$container = $null
$esx = $null
$vcenter = $null
|
# ? Oct 22, 2012 04:27 |
|
Change the PSP for the VMW_SATP_SVC SATP from VMW_PSP_FIXED to VMW_PSP_RR. See http://kb.vmware.com/kb/1017760 This will fix it for all future datastores.
|
# ? Oct 22, 2012 04:42 |
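For reference, the change described above is a one-liner per host in ESXi 5.x. Syntax below is from memory, so check it against the KB before running it:

```shell
# Make Round Robin the default path selection policy for any device
# claimed by the SVC/V7000 SATP; applies to LUNs discovered afterwards.
esxcli storage nmp satp set --satp VMW_SATP_SVC --default-psp VMW_PSP_RR
```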
|
GrandMaster posted:Yep, powercli. complex posted:Change the PSP for the VMW_SATP_SVC SATP from VMW_PSP_FIXED to VMW_PSP_RR. See http://kb.vmware.com/kb/1017760 This will fix it for all future datastores. Those are both excellent suggestions! GrandMaster: am I correct in assuming that the PowerCLI script you posted would change the pathing policy for all LUNs on a host? The cluster we were running also utilised storage on a DS3300 which was connected via iSCSI (the V7000 was connected via FC). The DS3300 only supported the MRU pathing policy, so if line 27 of your script was replaced with the following, it would only target LUNs on FC HBAs (I think...it's been a while since I've worked with PowerShell/PowerCLI): code:
|
# ? Oct 22, 2012 06:21 |
|
It's actually all LUNs on a cluster; I'm not sure if you can point it to a single node. I'm sure it's not too hard to limit the range to a certain array only with a bit of PowerShell trickery.
|
# ? Oct 22, 2012 06:39 |
|
Pathing policies are set at the host level, not the cluster level. It looks like the script can detect whether it is being run against a single host or a cluster (container) based on the parameters passed to it, and it processes the LUNs one host at a time (that's what the foreach loop in the RescanHBA function is for). It wouldn't be hard to change what it targets; you'd only have to modify line 27. complex's recommendation of changing the default PSP for a specific SATP is probably the best way to go when deploying hosts, but that script is still useful if you ever encounter a misconfigured environment and have to perform bulk changes.
|
# ? Oct 22, 2012 08:05 |
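One plausible shape for that line-27 replacement (a sketch only, since the original code block didn't survive the scrape; assumes PowerCLI 5.x and the script's $vmhost variable):

```powershell
# Restrict the Round Robin change to disks behind FibreChannel HBAs,
# leaving the iSCSI-attached DS3300 (MRU-only) LUNs untouched.
Get-VMHost -Name $vmhost | Get-VMHostHba -Type FibreChannel |
    Get-ScsiLun -LunType "disk" |
    where {$_.MultipathPolicy -ne "RoundRobin"} |
    Set-ScsiLun -MultipathPolicy "RoundRobin"
```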
|
Walked in this morning to a 3-node Hyper-V cluster with 1 node down and all VMs offline/missing. This should not happen! It's looking so far like whatever crashed the first host subsequently crashed the clustering service on the remaining 2 hosts. Won't know until I can get the first host, which is now sitting at 'configuring memory', to boot. Thank heavens a reboot of the remaining two hosts resulted in all the VMs auto-starting like they are supposed to. What a pisser.
|
# ? Oct 22, 2012 15:37 |
|
How much time skew do you guys see between VMs? I didn't really think much of it, but our programmers have reported seeing up to 10 seconds between different VMs, causing small issues like builds not triggering on continuous integration servers if you quickly check in code in rapid succession. This is a mostly Windows environment, and everything syncs to a DC which syncs itself with an external NTP server. The DC is virtual.
|
# ? Oct 22, 2012 18:51 |
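To put a number on the skew before chasing the cause, w32tm on any of the Windows guests can chart the offset against the DC (hostname is a placeholder):

```batch
:: Sample the offset between this guest and the DC five times
w32tm /stripchart /computer:dc01.example.local /samples:5 /dataonly

:: Confirm what this machine is actually syncing from
w32tm /query /source
w32tm /query /status
```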
|
Erwin posted:How much time skew do you guys see between VMs? I didn't really think much of it, but our programmers have reported seeing up to 10 seconds between different VMs, causing small issues like builds not triggering on continuous integration servers if you quickly check in code in rapid succession. This is a mostly Windows environment, and everything syncs to a DC which syncs itself with an external NTP server. The DC is virtual. code:
|
# ? Oct 22, 2012 21:19 |
|
Syano posted:Walked in this morning to a 3 node hyper-v cluster with 1 node down and all VMs offline/missing. This should not happen! Its looking like so far that whatever crashed the first hosts subsequently crashed the clustering service on the remaining 2 hosts. Won't know til I can get the first host, which is now sitting at 'configuring memory', to boot. Thank heavens a reboot of the remaining two hosts resulted in all the VMs auto starting like they are supposed to. What a pisser. This wouldn't have happened if you
|
# ? Oct 22, 2012 21:23 |
|
Rhymenoserous posted:This wouldn't have happened if you Eh... maybe. The culprit ended up being a DIMM going bad in one of the nodes. When that happened, the node entered a weird state where the mscluster service deadlocked on the shared app resources and wouldn't release them. This caused the other 2 nodes to freak out and crash their cluster services. You should still reasonably expect a node majority to take over when a single node has failed and keep the app resources online.
|
# ? Oct 22, 2012 21:27 |
|
Rhymenoserous posted:This wouldn't have happened if you I'm not saying Hyper-V doesn't suck, just that VMware has some show stoppers as well.
|
# ? Oct 22, 2012 21:28 |
|
It was a joke.
|
# ? Oct 22, 2012 21:28 |
|
Rhymenoserous posted:It was a joke. Sorry, my sense of humor flatlined this morning about 6am when I had already gotten my 15th call with someone saying 'HEY DID YOU KNOW MAIL IS NOT WORKING!!'
|
# ? Oct 22, 2012 21:30 |
|
Syano posted:Eh... maybe. The culprit ended up being a DIMM going bad in one of the nodes. When that happened the node entered a weird state where the mscluster service deadlocked on the shared app resources and wouldn't release. This caused the other 2 nodes to freak out and crash their cluster services. Still should reasonably expect though a node majority to take over when a single node has failed and keep the app resources online.
|
# ? Oct 22, 2012 21:36 |
|
evil_bunnY posted:Bolded the funny. I am glad I wasn't the only one to notice this.
|
# ? Oct 22, 2012 21:37 |
|
I'm sure glad someone can laugh at it. I guess I am just thankful there is a hotfix.
|
# ? Oct 22, 2012 21:41 |
|
adorai posted:We use group policy to specify windows server time sync should use an NTP server rather than DC.
|
# ? Oct 23, 2012 00:32 |
|
I brought this up a while ago. Thinking I should make a separate thread on this, but anyways: does anyone have a PoC environment or test lab for customers? I'm looking to purchase and deploy one, but I was trying to get some input from other people first. My setup is to be like so:

Supermicro FatTwin
>4 hosts
>2x8 E5 2603's
>64GB RAM per host
>6x 1GbE net adapters

This does all the PoC of DPM, DRS, HA, FT, View, and so on and so forth. For VMware storage features I'm thinking of just popping NAS4Free/Openfiler on 2 of the FatTwin servers to host the iSCSI/NFS targets, thus making it an all-inclusive PoC which isn't too hard to take to a customer site and rack and stack.

>>SSD: 2x 100GB datastores
>>10K HDD: 4x 100GB datastores
>>7.2K HDD: 4x 200GB datastores

Anyone have a PoC / customer playground environment?
|
# ? Oct 23, 2012 16:28 |
|
So is anyone using or thinking of using the 3D acceleration feature released with ESX5.1? I would be interested in hearing about people's experience with it so far.
|
# ? Oct 23, 2012 22:06 |
|
DevNull posted:So is anyone using or thinking of using the 3D acceleration feature released with ESX5.1? I would be interested in hearing about people's experience with it so far. Testing it out soonish, probably installing ESXi on my desktop and hopefully getting it working. https://www.youtube.com/watch?v=ME3xaLUTZgU that was running at VMworld. I have a bunch of people complain "
|
# ? Oct 23, 2012 22:36 |