Kachunkachunk
Jun 6, 2011
RE: Directpath I/O for video cards:

I haven't followed development in this area much, but it gets attention (at least the workstation/server-class GPU hardware does). Last I checked, which was ages ago, you could get Fermi-based nVidia cards working fine, and pretty much any Radeon HD.
I haven't bothered sticking an expensive Fermi card in my own system for testing yet, but I can tell you that a cheap Radeon HD card has worked fine. The problems seem to come up after allocating too much RAM to the VM (somewhere over 2.8GB of memory). You get BSODs during boot, that sort of nastiness in the Guest/VM.
It seems to me that it has something to do with how the video device BIOS sets up a range of DMA addresses, which then conflict with what the virtual BIOS and the VMware VMX process arrange (or rearrange after power-on). Again, I'd have to look into it more...

Anyway, it works, but it's not without quirks. Every time I rebooted the Guest OS with a Radeon, it would just stop working and require me to fiddle with removing the device and re-adding it (in Device Manager), hoping it would work on the next boot. You had to disable the built-in virtual video device for it to work properly, and on the next boot it would be using the built-in virtual video device again. My use case was a DLNA server and transcoding media in real time. I've since built a media PC and stopped streaming media (fast-forward didn't work in all codecs/containers, so it was spotty).

I also imagined it would be kind of neat if I could create a gaming VM that had its own allocation of discrete hardware: a pass-through display adapter and sound card, direct USB input for a gaming mouse/keyboard, etc. Then I could consolidate quite a bit of my home's hardware into one beefy workstation. HDMI (over Cat-5 or even IP) already makes remote computing between multiple floors quite doable. Multiple gamers on multiple VMs? Yep, you can.

Ultimately, the way VMware would naturally need to do this is to expose discrete video hardware as VM shares/resources, so you can check off that 3D flag and have multiple VMs sip from real 3D hardware. It's a little ways out, though.

Kachunkachunk
Jun 6, 2011
I think you still want the physical switches' MTU set a bit higher than the vSwitches and portgroups to account for frame overhead. And I don't think vmkping accounts for that overhead either, so it might fail by 14 bytes.

You can do, for example:
vmk0: 9000
vSwitch0: 9000
Your physical switch: 9100.

There should be a KB article along those lines; going 100 extra is just a safe ballpark figure.
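For what it's worth, here's a rough sketch of how you'd set and test that from the ESXi 5.x command line. vSwitch0/vmk0 and the 10.0.0.20 address are just placeholders for your own setup:

# bump the vSwitch and the vmkernel port to 9000 (physical switch set higher, e.g. 9100)
esxcli network vswitch standard set --mtu=9000 --vswitch-name=vSwitch0
esxcli network ip interface set --mtu=9000 --interface-name=vmk0

# test with don't-fragment; 8972 = 9000 minus 28 bytes of IP/ICMP header
vmkping -d -s 8972 10.0.0.20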

And if anyone can call bullshit on it, feel free. I'm not terribly strong with my networking.

Kachunkachunk
Jun 6, 2011
Having a ton of cores is good for fighting thread contention, but you still have shared cache. You'll kind of have to know how the apps behave and use that to decide whether cache will be your next point of contention. And unfortunately that's pretty hard to figure out without real testing (as opposed to reading documentation).

Also I'm really not a fan of AMD's errata and would go with Intel wherever possible.

Kachunkachunk
Jun 6, 2011
My guess is that you would want to see what the chipset's specifications say about this stuff.

Kachunkachunk
Jun 6, 2011
Yup, but was that since Nehalem, specifically?
But I don't truly know, honestly. When I was speccing out a couple of Sandy Bridge-based workstations some time ago, the maximum supported memory varied with the few chipsets offered. The options may have offered the same number of sockets and the same kinds of processors and RAM configurations, but they handled different amounts of it. I don't know if it was as simple as how many slots each board had... I doubt it, based on earlier experience with older chipsets, but it really could just be that simple. Possibly not, though.
Really, I wouldn't find it entirely surprising if the chipset also had a significant impact on these kinds of memory bank quirks.

Or poo poo, I could really just be completely wrong. It's either going to be in the CPU or chipset spec, I'm pretty sure? Intel docs are pretty good, if long.

Kachunkachunk
Jun 6, 2011
Holy cow! I had no idea that's what people go through when working with Veeam.

Kachunkachunk
Jun 6, 2011

Misogynist posted:

Everyone's had their bad support experiences, and I don't want to color the company unfairly with my impressions -- lots of other people have had fairly decent experiences with Veeam, and I don't want to sway anybody's opinions away from what might be a good solution for them. But the amount of time I'm likely to invest in making it work correctly in our environment far outweighs the cost of switching to another vendor right now.
It's alright, I actually have to remain very much vendor-neutral and avoid coloring things. I wouldn't be able to tell someone about your experience (as entertaining/shocking as it may be), even if I wanted to.

With that said, we once had someone flip out on a demanding, but very rude, customer. The tech literally told him to gently caress off and that he was ungrateful (maybe some other nice things too). He was obviously let go very soon after, but I can't help but wonder if his colleagues still secretly agreed with his sentiments/response.

Kachunkachunk
Jun 6, 2011
Is it an actual RDM or just the raw LUN itself being presented to a physical SCCM box?

`vmkfstools -i <rdm>.vmdk output.vmdk` is something I'd use if you had an actual RDM on VMware storage. Then go ahead and delete the RDM mappings from the VM configuration and add the VMDKs you just created.
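To be a bit more concrete, something like the below. The paths are made up, and `-d thin` is optional if you want the copy thin-provisioned:

# clone the RDM's data into a regular VMDK on VMFS
vmkfstools -i /vmfs/volumes/datastore1/myvm/myvm_rdm.vmdk /vmfs/volumes/datastore1/myvm/myvm_migrated.vmdk -d thin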

Kachunkachunk
Jun 6, 2011
Oh boy, you probably want to talk to a consultant, but it sounds like vCenter Heartbeat could do what you're hoping. Though it shouldn't be necessary if you're using SRM to manage two sites, I think.

SRM would handle the LUN remounting/resignaturing and VM registration for you.

I also don't think both sites need the same subnet. Someone will have to confirm (but that kind of thing would be in SRM documentation anyway).

But you probably can't realistically expect FT to be used for everything. There are limitations to it, like how many FT VMs you can run at any one time, and some applications simply won't work properly in FT pairs (FT is limited to a single vCPU, certain hardware won't work, etc.). Not to mention that stuff like snapshots won't work.

Check into FT's caveats before you invest too heavily into it, IMO.

Kachunkachunk
Jun 6, 2011
It pretty much has no need to swap anything in the hypervisor. Everything is about the VMs, which would swap to their configured .vswp locations.

Kachunkachunk
Jun 6, 2011
Slight correction; I had to go read up on it since my memory was fuzzy. ESXi host swapping happens on the configured scratch partition, if needed. I think it should pick one by default now, but you can configure it. This is also the only way you will have persistent logging across reboots. It'll be to a datastore, basically.

VMs swap to .vswp files (the default is their configuration file's location, but you can pick an SSD or specific datastores). Finally, the Guest swaps to its own swap file/partition contained within one or more of its virtual disks. The .vswp files are used pretty much last. I don't believe host-side scratch swap is used at all for VMs, aside *maybe* from VM overhead memory (due to vCPUs, 32 vs. 64-bit, etc.), but even then, probably not. I'm thinking it's just for heap/worlds/processes.
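If you want to see or move the scratch location, I believe the advanced option looks something like this (datastore name/path is a placeholder, and double-check the KB rather than trusting my memory):

# view the currently configured scratch location
esxcli system settings advanced list -o /ScratchConfig/ConfiguredScratchLocation

# point it at a directory on persistent storage (takes effect after a reboot)
esxcli system settings advanced set -o /ScratchConfig/ConfiguredScratchLocation -s /vmfs/volumes/datastore1/.locker-esxi01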

Kachunkachunk
Jun 6, 2011
IOMMU support always seems hard to track down. From some searching, it looks like AMD's 900-series chipsets have it, but you may want to stick with the 970X and 990X variants out there. It could also depend on whether the feature was left enabled from the reference board design, which varies between brands.
Someone who uses AMD can hopefully confirm.

Kachunkachunk
Jun 6, 2011
Bridged? Well, in that case each VM gets an IP from your router/gateway, which presumably hands out IP addresses via DHCP.
Or you assigned static IPs. Confirm that you can ping from one box to another first (even from your physical host running VirtualBox).
Port forwarding and the like is for routers/gateways; you won't need it in your network unless you're crossing networks.

Here's an example:
Desktop running Player, Workstation, or VirtualBox: 192.168.0.2
Router: 192.168.0.1
Random desktop in your house: 192.168.0.3
VM 1: 192.168.0.200 (HTTP)
VM 2: 192.168.0.201 (Ventrilo, Samba)
VM 3: 192.168.0.202 (SQL)
Your friend, across the Internet: 24.x.x.x

Everything in 192.168.0.x can talk to each other if you're really using Bridged networking for the hypervisor/app you're using. The Guests/VMs don't need any special arrangements (just keep in mind that firewalls and the like might stop stuff from working).
The only place you need port forwarding is at the router level (so 192.168.0.1), if you need 24.x.x.x to be able to reach a service in your network.

Anyway, I've heard VirtualBox has a more responsive desktop experience, but perhaps VMware Player could be an alternative for you.


Edit: More specific examples added. If you wanted to reach the HTTP server, you literally open up 192.168.0.200 in your browser. Your friend needs to enter the public IP address your modem has, and your gateway/router needs to be forwarding that port (80 by default) to 192.168.0.200. That's a NAT function. If there's a separate firewall to think about, you have to permit the port there as well.
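If it helps picture the NAT part: on a Linux-based router the forward for the HTTP VM would look roughly like this (the eth0 WAN interface name is a guess; consumer routers just call this "port forwarding" in their web UI):

# forward inbound TCP 80 on the WAN interface to the HTTP VM
iptables -t nat -A PREROUTING -i eth0 -p tcp --dport 80 -j DNAT --to-destination 192.168.0.200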

Kachunkachunk fucked around with this message at 13:34 on Mar 30, 2012

Kachunkachunk
Jun 6, 2011

My Rhythmic Crotch posted:

Hi Kachunkachunk, the problem doesn't seem to be so simple. I can verify that (for example) apache is running from within the guest, and I can ping the guest from the host, but I can't load a page hosted by the guest from the host. Virtualbox does indeed have the concept of "port forwarding" but only in NAT mode and not bridged mode. So it would seem that I need to define some combination of NAT and bridged network adapters, but I just haven't found the working combination when others have reported that it works.
It sounds like networking is fine, then. It could be a firewall in the Guest or perhaps Apache is not listening on all addresses. What Guest OS is it?
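Assuming it's a Linux guest, a quick way to check both of those would be something like this (commands and config paths can differ by distro):

# is Apache bound to all addresses, or just 127.0.0.1?
sudo netstat -tlnp | grep ':80'
# if it only shows 127.0.0.1:80, broaden the Listen directive in ports.conf/httpd.conf, e.g. Listen 0.0.0.0:80
# and check for a guest firewall eating the traffic
sudo iptables -L -n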

FISHMANPET posted:

From what I can find (mainly the VMware Setup for Failover Clustering and Microsoft Cluster Service guide) iSCSI is not supported as a protocol for the clustered disks. It says that in the 5.0, 4.1, and 4.0 guide. However I've gotten it working in 4.0 following the instructions in the 4.0 guide, except with an iSCSI disk instead of FC.

Specifically where it says it doesn't support iSCSI:


So, any insight on what "not supported" means? Does that mean VMware tech support won't support the configuration, even though it works perfectly fine?
Basically VMware and Microsoft have not fully validated/tested this particular use case, from my understanding. Just Fibre Channel. But I quite honestly thought that hardware iSCSI was in fact supported by now; I haven't really checked in a long time. Anyway, VMware Support will give best-effort help at the least. They just can't file bugs or guarantee anything for your MSCS VMs if anything goes awry, and if the configuration seems to be related to the issue, they will ask you to revert/change it.

adorai posted:

would I see a significant performance difference on bulldozer processors running 5.0 rather than 4.1u2?
I haven't heard anything about Bulldozers gaining or losing performance between versions, really. Was there anything in particular that made you wonder (such as existing performance hits on 4.x for any reason at all)?

Kachunkachunk
Jun 6, 2011
Is it heavily dependent on L3 cache performance? The more cores a processor has, the more threads (vCPUs) will be sharing that cache.
I remember some desktop/productivity apps performing terribly as a result in some environments, prompting the need to tweak DRS rules.

Kachunkachunk
Jun 6, 2011

My Rhythmic Crotch posted:

Hi Kachunkachunk, the problem doesn't seem to be so simple. I can verify that (for example) apache is running from within the guest, and I can ping the guest from the host, but I can't load a page hosted by the guest from the host. Virtualbox does indeed have the concept of "port forwarding" but only in NAT mode and not bridged mode. So it would seem that I need to define some combination of NAT and bridged network adapters, but I just haven't found the working combination when others have reported that it works.
Hey again, I was looking at a problem VM in my environment (I can SSH to everything else with port redirection over WAN, but not to this specific VM, despite good rules).
Turns out it's because the SSH server in this Guest is replying over the wrong WAN interface (the VM has a VPN as the default gateway). Thus I can't SSH to it. Maybe you have a different default gateway? But then again if you have a flat network and aren't routing/NATing anything, it's probably not your issue.
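If anyone hits the same thing, the usual fix is source-based policy routing so replies go back out the interface the request came in on, rather than the VPN's default route. A rough sketch with made-up addresses and interface names:

# replies sourced from the LAN-facing address use the LAN gateway, not the VPN default route
ip route add default via 192.168.0.1 dev eth0 table 100
ip rule add from 192.168.0.200 table 100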

Kachunkachunk
Jun 6, 2011

Martytoof posted:

Crossposting from the IT Certification thread, but someone here works for VMware, right? This is pretty much the biggest pie in the sky request, but if you could, please put a bug in someone's ear that a poorly organized hard to decipher Google Docs (inaccurate) spreadsheet linked from vmware.com is a terrible drat way to get information about schools that teach a VCP curriculum out to prospective students :(
I have no idea what this is about a spreadsheet, but try this:
http://mylearn.vmware.com/portals/w...yID,hostID,tzID

You get specifics like this:
http://mylearn.vmware.com/mgrreg/courses.cfm?ui=www_edu&a=det&id_course=121821

To earn a VCP, it depends on what you already know or have (you could possibly do a fast-track/what's-new course), or you can do the whole Install, Configure, Manage dealio.

Edit: Unfortunately it looks like most, or all, of the training is in downtown Toronto, not local to the Hamilton/Burlington/Oakville areas.

Kachunkachunk
Jun 6, 2011

Daylen Drazzi posted:

Got an interesting issue with my roommate's box and hope someone can help resolve it.

He set up ESXi on a brand new computer with 4 hyper-threaded cores and 32 GB of RAM and a couple TB of storage. Everything seems to work fine except for one little thing - when I decided to try to install Windows Server 2008 R2 it causes the whole thing to lock up and he has to power-cycle the box. I've locked his machine up three times yesterday and twice today, and each time all I did was try to install Windows Server 2008 R2 on a brand new VM.

He's got a couple Linux VM's running that are completely stable, and I did manage to install a single Server 2008 R2 VM (after killing the ESXi box a couple more times), but my roommate has forbidden me from trying to install a second VM until we figure out what's going on.

I've tried looking for some possible reasons, but nothing seems to be showing up on Google. One possible suggestion is that the SCSI controller might be an issue, which I have set to LSI Logic SAS on the working VM. I set the second VM to that for the initial install (and killed the ESXi box) before changing it to Paravirtual (and killing the ESXi box) on my second attempt - that's when my roommate forbade me from going any further.

Sort of at a standstill, which sucks because I had a nice stable environment going using Oracle's VirtualBox, but I wanted to use a VM environment with far more real-world exposure like VMWare.
Box lockups are a toss-up between hardware and software issues, really. If there's one thing I learned, it's that ESX/ESXi is excellent at exposing hardware problems (architecture/design or physical issues), firmware issues, and errata.
Generally speaking, ESX/ESXi does not hang; it should time out somewhere and eventually purple-screen.
Complete hard lockups are usually treated like hardware issues, but I find it curious that you can reproduce it. My guess at this point would be a possible erratum (hence recommending BIOS/firmware upgrades).

What you can try is finding out whether you can NMI the system, and doing it while the host is hung. Firing an NMI on ESXi 5 (as long as that interrupt vector is enabled) will cause it to purple-screen, indicating it was not truly hung but deadlocked somewhere in the hypervisor. Earlier products had to be configured to respond this way.

Putting myself in your buddy's shoes, it's not a nice outlook either way. I'm hopeful the BIOS/Firmware upgrades will do it, but if not, you might need to consider turning off virtualization options like vt-d.
If he has other performance related options in the BIOS, they may need to be tweaked.

CPU C-states (sleep states) should be disabled when troubleshooting anomalous behavior like that, as well as setting the performance profile to maximum. How much RAM is that VM booting with, anyway? And how many CPUs? At what stage exactly was the Windows install locking up the ESXi box?

Edit:
Kind of glossed over the fact that you had at least one working W2K8 R2 VM. Is there anything different between the two VMs' configurations?
And just to be sure, is the system actually locked up? If your buddy is willing to reproduce it one more time, try hitting Alt+F12 on the console, then spin up another VM to hang it, and see if the activity on the screen stops, as well as Num Lock, etc.
See if that last activity on the screen shows anything useful (take a picture and post it if you want). It's near real-time logging from the host's syslog daemon. Might just be vmkernel components, I can't remember - but that's all that matters here anyway.

Kachunkachunk fucked around with this message at 23:10 on May 8, 2012

Kachunkachunk
Jun 6, 2011
The RAM speeds probably shouldn't cause problems there, but at least on that topic, I also had 16GB of the stuff not show up with the right speed on my recent build. Reconfiguring the BIOS to run it at its recommended speed and timings has shown no problems so far (more than a couple of weeks of full VM load already).

Also, regarding my post about turning off options - that's all just during troubleshooting. Basically you're trying to find which feature or component stresses the box to the point of failure when it's used.

In order of what you can sacrifice, it's power saving first, then virtualization extensions/capabilities. You *need* virtualization technology enabled, but VT-d is a [very] nice-to-have and not critical. I do, however, make it a necessity for my builds since I like to use DirectPath I/O.

If you're really just looking to learn, you can still run a few ESXi hosts in VMware Workstation, if you like. To learn/repro some issues with Auto-Deploy and Host Profiles, I built a cluster of ESXi VMs on my laptop (16GB of RAM), and it was no problem at all. vCenter's the real hog, if anything. This plan of course requires you to have Workstation (or to have a trial/filez copy).

Also sounds like you're both roomies. He doesn't actually work at VMware, does he? It's a strangely common living situation between a handful of employees from Support there, heh.

Kachunkachunk
Jun 6, 2011

MC Fruit Stripe posted:

My media server finally poo poo the bed, and I am not willing to put any work into fixing it. I've decided to virtualize it, but the server runs a bunch of hard drives. I want to build a VM to replace it, and I want only that VM to recognize the extra hard drives. I want every other VM on the physical host to remain blissfully unaware of the extra drives. I won't even ask if it's possible - I only need to know how!

VMware Workstation 8.0, naturally. Any help, much appreciated.
Add a Disk -> Use a physical disk (not partition).
If you're not running as a root/admin user, you won't be able to do this.

For Windows, identifying which disk is which can be done via diskpart or by checking the order in the Disk Mangler (Disk Management).

Edit: Auto-mounting is something you probably need to disable on Windows or Linux, whichever you're running. Diskpart has an automount option that will assign drive letters to anything recognized by Windows. You can consider removing the drive letter on your physical Windows install if you want. For Linux, just stop it from automounting however your distro recommends.
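For the Windows side, the diskpart bits look roughly like this (the volume number and letter here are just examples; check the `list volume` output for your own):

diskpart
DISKPART> automount disable   (stop Windows from auto-assigning letters to newly seen volumes)
DISKPART> list volume         (find the data volume)
DISKPART> select volume 3
DISKPART> remove letter=E     (drop the letter so the host OS stops poking at it)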

Whatever you do, don't access a non-clustered filesystem from multiple "hosts" at the same time.

Kachunkachunk fucked around with this message at 13:09 on May 14, 2012

Kachunkachunk
Jun 6, 2011
I can certainly vouch for the Dell Precision T5400 workstations, but it still gets costly for a decent amount of ECC RAM and a pair of processors. You generally want lots of threads, but there are some edge cases where cache performance is the bottleneck (at which point 'Bulldozer'-style core cramming is pointless).

Kachunkachunk
Jun 6, 2011

Mierdaan posted:

Good to know this is still a bug in vCenter 5 + ESXi 5. Yay 4-year-old bugs!
I am pretty well versed in how locking works on VMFS3, and to some extent VMFS5. Mind elaborating on what you ran into recently? Are you still hitting the issue?
Also, were you using VMFS3 or VMFS5? And with VAAI atomic test-and-set locking, or SCSI reservations? Or was it NFS?

The underlying causes for "device or resource busy" actually can vary. Sometimes it's a COS process (for ESX, not ESXi), found via `lsof | grep <filename>`. Sometimes it's a corrupt VMFS heartbeat lock record. Sometimes it's a stale/runaway cartel/child world/process (that would be found in VMkernel land, which can be tricky for ESX users; [well-versed] ESXi users can use the vsi shell). And finally, sometimes it comes from the host agents or some third-party agents - that happens, but the causes there vary even more.

It all comes down to any of the following for block/VMFS storage:
1) A legitimate process opened the file with read or read-write locking.
2) A legitimate process (or cartel/child) had the file open with read or read-write locking but its parent is gone (VM outage, something else weird).
3) There is corruption of the Heartbeat region on VMFS. Happens from storage issues and storage array bugs, but there have been a handful of rare ESX/ESXi bugs.

With NFS, there are NFS-side filesystem issues, permissions, and lock files involved. A bit simpler. ESX doesn't control locking on NFS.
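For anyone chasing one of these on VMFS, the usual first step is dumping the lock info on the file in question with something like the line below (path is an example). It should show the lock mode and the MAC address of the host holding the lock, either on screen or in the vmkernel log depending on version:

vmkfstools -D /vmfs/volumes/datastore1/myvm/myvm-flat.vmdk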

Kachunkachunk fucked around with this message at 01:24 on May 23, 2012

Kachunkachunk
Jun 6, 2011
Ah okay! Thanks for explaining - pretty clear to me.
So it's not quite a locking issue in itself, but template customization looks completely broken there.
Well, that's just disappointing. Especially if it hasn't been resolved since as early as June 2008. Do you know how many releases of VirtualCenter and vCenter Server have come out since then? PM me if you have a case number, please. That poo poo is bananas.

Kachunkachunk
Jun 6, 2011

Mierdaan posted:

The Communities thread I linked says he experienced it on 3.5, and I can reproduce on 5.0 so I'm guessing this is the 4th revision (3.5, 4.0, 4.1, 5.0?) with this particular bug. I don't have a case number because I assumed a 4-year-old bug would already have a million cases open for it - I mean, I actually hit this bug in production on 4.1 and then tried replicating for fun on vmeduc's 5.0 since my systems aren't there yet, someone else must have filed a bug, right!?

Shame on me, I am part of the problem :saddowns: I will be a good user and open a well-explained bug report tomorrow and PM you the case number.
It was a rhetorical question. There have also been all those roll-up releases (Update 1 through 6 for VirtualCenter 2.5 alone), then the update releases for ESX/ESXi 3.5 through 5.0 (indeed the four major versions you mentioned), but the bug probably sits in some ancillary functions of VC or VC-related agents (vpxa/hostd).

Chances are pretty good that a bug was filed quite some time ago, but if the KBs don't acknowledge it publicly, I have a feeling it could still be slipping by each day. Thanks in advance.

Kachunkachunk
Jun 6, 2011
Apathy in regular Support as well, it seems.
Like I mentioned in the PM to you, I will look into it tomorrow. I'm more interested in seeing if there is actually a bug filed, because I sure as poo poo know that in your case, the TSE didn't bother sourcing or filing one.
Most of them really are too busy and have to prioritize other issues, but I don't really agree with the way it was handled. Then again, I also work on a team that's a lot less break-fixy and a bit more solutions-oriented.

If it means anything, VMware's feature request folks are quite receptive and I have seen first-hand a product manager working directly with customers and inviting them for beta products, etc.

Otherwise, as with any vendor, it depends on how loud you want to scream, really.

Kachunkachunk
Jun 6, 2011
Yeah, those are good points. And sorry for the thread-jacking, everyone.

Kachunkachunk
Jun 6, 2011
By Epping? Yeah, better get it.

Kachunkachunk
Jun 6, 2011
Dude, gently caress Joffrey.
(I hope that's not an important VM or anything. HATE)

Kachunkachunk
Jun 6, 2011
DirectPath I/O passthrough of a video card is the only way to get native PCI-E GPU performance in a VM, but it's not a painless affair to get working. You'll need a monitor attached to the video card, VT-d support (if you're on Intel), and a very modern GPU that will work with this configuration. Some of them don't take kindly to the memory holes and remapping that occur for VM operation (at least on VMware).

This is definitely an area that will be developed more in ESXi. Even vMotion might be possible later if each server has the same GPU. It's just more processor and memory to allocate/reserve/share/balloon, etc.

Kachunkachunk fucked around with this message at 00:35 on Jun 2, 2012

Kachunkachunk
Jun 6, 2011
If you virtualize all your DCs, you could consider ensuring that they're pinned to one or two boxes (via DRS rules). That helps you quickly locate them and start them up in the event of a datacenter outage. You can do the same with VC, really.

Now, for this handful of select ESX/ESXi boxes, you probably want to turn off AD authentication and rely on local accounts, too.

Kachunkachunk
Jun 6, 2011
I think domain authentication of ESXi boxes is purely for tracking and ease of credentials management, really. If you fire a user, you don't have to change all the server passwords, for example.

Not saying I like AD authentication of ESXi, though.

Kachunkachunk
Jun 6, 2011
I don't think so. It's pretty much all up to whatever the VC box is installed on and/or the domain that system was joined to. Then for individual ESXi boxes and their logins, it's either local or AD/LDAP.

Edit: To be more specific, VC doesn't appear to have a way of managing local users on an ESXi box. The vSphere Client, directly connected to the host, can. I suppose this was just never part of VC's design (or intent).

Kachunkachunk
Jun 6, 2011
If you're rooting around in esxtop and suspect disk-related performance issues, check 'd' mode and add the latency/cmd columns ("f" for column editing, and "j" for latency/cmd).

Follow a similar process in the 'u' and 'v' modes. Press "h" at any time for help.

Also remember the sampling period is 5 seconds.
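And if interactive mode gets awkward, esxtop's batch mode can dump everything to a CSV for digging through later. The numbers below are just an example run:

# 5-second samples, 60 iterations (~5 minutes), all counters, to CSV
esxtop -b -a -d 5 -n 60 > esxtop-capture.csv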

Kachunkachunk
Jun 6, 2011
Yellow Bricks has some good guides on interpreting and using esxtop metrics, as well - be sure to print off a copy of that too: http://www.yellow-bricks.com/esxtop/.

Also... FFFFFFFFFFFUUUU-
Okay, so the Linux kernel bug detailed here will likely affect anyone running ESX 4.0 U1 and earlier later this week. ESXi users, or at least anyone running patch 6 for ESX 4.0 or later, should be fine, however.

In case some of you do not already know, ESX "Classic" runs with a Console OS, which is basically a repurposed derivative of RHEL 3 or 4... along with all of the applicable bugs and quirks that may accompany such a release (resolved with patches).

If Red Hat bug 479765 triggers on an ESX box's Console OS, it will probably result in a Lost Heartbeat purple screen eventually (vmkernel/hypervisor continues to run but notices the Console OS has stopped responding, then panics).

Patch up by Saturday if you're running such an old release.

Kachunkachunk
Jun 6, 2011
Specifically known as "P06," which is only pissing me off, since I also have trouble figuring out exactly what release level that is.
Rest assured, it's probably still something that predates Update-2, even. You'll surpass that with a regular update rollout.

Update-x is a roll-up package, like a Service Pack. It may also include other specific fixes and stuff. P0x would be a specific minor rollup that looks like an individual patch. Installing Update-4 will net you all the benefits and fixes from prior updates/releases cumulatively.

Edit: I think this is the relevant kernel update for ESX 4 that one would need: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1013127, otherwise known as ESX400-201005001.
Also see: http://scientificlinuxforum.org/index.php?s=2c11935c7c49be320ff30d9a09376a6e&showtopic=1695&st=0&#entry11777 and/or http://blog.toracat.org/2012/06/leap-seconds-who-cares/ for some information on what kernel revision needs to be met or exceeded.

Edit2: I am confident that the above patch resolves the issue. You don't have to get to 4.1 right now.

Kachunkachunk fucked around with this message at 22:45 on Jun 26, 2012

Kachunkachunk
Jun 6, 2011
No vCenter? Okay, so you'll probably be working on this in a maintenance window or something.

Get Update-4 for 4.0 and interactively install it on each box using the command-line. Won't take you very long for each host, but it will require rebooting and downtime (especially since you don't have vMotion without vCenter?).
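For ESX 4.0 classic, I believe the interactive install is just esxupdate against the offline bundle, roughly as below (the bundle path/filename is a placeholder; ESXi would use vihostupdate from the vCLI instead):

# put the host in maintenance mode first, then:
esxupdate --bundle=/vmfs/volumes/datastore1/ESX400-Update04.zip update
# reboot when it finishes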

If you do go to 4.1 or later, you generally do not need to update your existing install before upgrading. Otherwise, the worst case is to install over it and retain your local VMFS partition (in case you have VMs there).

Kachunkachunk
Jun 6, 2011
Do you storage and system types see any information out there recommending (one way or another) settings for prefetching behavior at all?

From my understanding, system-side prefetching (memory, etc.) is pretty much something to avoid on an ESXi box, and I would imagine it's the same for storage once I/O patterns begin to look almost completely random (i.e. from a bunch of ESXi hosts).

Kachunkachunk
Jun 6, 2011

stubblyhead posted:

No, this was about two months ago so not a leap second thing. It wasn't an out of memory error. The actual error code was EXCEPTION_ACCESS_VIOLATION. If this is a bad memory error (I am almost certain it isn't), wouldn't it take a memory test on the esx host itself to detect?
Testing memory within a VM does weird stuff.
You know how DMA works? Basically a memory tester will read/write to all addresses. Real hardware will usually just ignore it, but virtual hardware will programmatically care about what the hell you're doing to various addresses - e.g. writing/reading data at the SCSI controller's or NIC's mapped addresses.
The VM Monitor will sometimes catch it and crash the VM. It depends on what kind of test and data is being written, and where, for that VM. Sane operation of hardware in an OS with drivers won't let weird things be written to hardware addresses, really.

Anyway as far as finding bad memory goes, it's indeed somewhat viable to run it in a VM, in concept. There's still a real memory space given to the VM, and it still uses one or more real processors. But you're really still best off doing it on the metal and giving that testing software everything to play with.

Anyway, I think they're making a bad call on it being memory. If you ran out of memory, then you had a memory leak somewhere in the Guest or it's just starved due to an unprecedented workload. Since it's a Windows box, you'll have to check task manager and performance monitor occasionally to see if the application (or something else) is consuming way too much memory.

Kachunkachunk fucked around with this message at 22:28 on Jul 9, 2012

Kachunkachunk
Jun 6, 2011
I passed through my LSI controller to a VM and use that as a storage server, personally.

Note that this is a home lab thing. If you need to learn on anything that looks like a larger setup, a decent modern desktop can have upwards of 16GB to 32GB of memory and at least 8 independent threads. You can virtualize some ESXi hosts with vCenter in a hosted solution like Fusion, Workstation, etc.

Plus I tend to have this fallback thing where I turn old gaming desktops into servers, so every desktop gets maxed out with RAM in case it has to turn into another ESXi node, before RAM prices climb due to rarity.

With that said, however, I still want to consolidate everything into one larger box at some point and donate/sell the old hardware.

Kachunkachunk
Jun 6, 2011

Corvettefisher posted:

Used it in a lab, 5.1 looks more promising IMO but it is nice to see them working away from the windows depend platform.

:argh: Still no view agent for unix though!
vmware-open-view? :shrug:

Fake Edit: Or whaaa? http://blogs.vmware.com/euc/2012/05/new-view-clients-optimized-for-vmware-view-51-now-available-on-windows-linux-mac-ipad-and-android.html
