Pile Of Garbage
May 28, 2007



At the last place I was at we had a PowerCLI script that would run daily and send out a report of all snapshots. I'm pretty sure whoever set it up just found the script somewhere online but it was good to have alongside alarms.
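
For reference, the guts of it were something like this (not the actual script, just a rough sketch; the server names and addresses are placeholders):

code:
Connect-VIServer -Server vcenter.example.com

# Gather every snapshot in the environment with a few useful columns
$report = Get-VM | Get-Snapshot |
    Select-Object VM, Name, Created, @{N = 'SizeGB'; E = { [math]::Round($_.SizeGB, 2) } } |
    Sort-Object Created

# Mail it out as a plain-text table
Send-MailMessage -From 'vcenter@example.com' -To 'ops@example.com' `
    -Subject "Snapshot report $(Get-Date -Format yyyy-MM-dd)" `
    -Body ($report | Format-Table -AutoSize | Out-String) `
    -SmtpServer 'smtp.example.com'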


Pile Of Garbage
May 28, 2007



Pretty sure your hypervisor boot volume shouldn't be seeing significant amounts of writes. You probably just got lovely flash.

Edit: I saw 1-in-5 failure rates on cards for Cisco UCS blades the year before last, so don't exclude hardware failure.

Pile Of Garbage
May 28, 2007



I miss working with FC, zoning was neat and fun :(

Pile Of Garbage
May 28, 2007



CommieGIR posted:

Just FYI: If you notice sluggishness while running HyperV on your host, you may need to disable hyperthreading on your machine.

Is that still an issue on the latest Win10/Server 2019 builds?
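
In the meantime, a quick way to check whether HT/SMT is actually in play on a host (just the standard CIM class, nothing fancy):

code:
# If NumberOfLogicalProcessors is double NumberOfCores, SMT/Hyper-Threading is enabled
Get-CimInstance -ClassName Win32_Processor |
    Select-Object Name, NumberOfCores, NumberOfLogicalProcessors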

Pile Of Garbage
May 28, 2007



Actuarial Fables posted:

I'm trying to get iscsi multipath working the way I would like to on my Linux host (proxmox), one host to one storage device. In Windows you can configure Round Robin w/ Subset to create a primary group and a standby group should the primary fail entirely - how would one create a similar setup under Linux? I've been able to create a primary group w/ 4 paths and that works fine, but I can't figure out how one would add in a "don't use this unless everything else has failed" path.

I'm not familiar with proxmox but is there a specific reason why you need iSCSI Multipath and can't just rely on link aggregation?

Pile Of Garbage
May 28, 2007



Actuarial Fables posted:

I'm mostly just trying to do stupid things in my lab so that I can understand things better.

Nice, I can get on board with that (and it explains why I've got such expensive poo poo in my home network). What SAN are you using and does it present multiple target IPs?

Pile Of Garbage
May 28, 2007



IMO it'd be far easier to attach the laptop to your network on a separate VLAN and restrict connectivity on that side instead of mucking around on the host.

Pile Of Garbage
May 28, 2007



Potato Salad posted:

everyone's lacp implementation is awful, from "why do i have 1/n packet loss for n links" to literally unusable

Care to elaborate? I ask because I've never once had issues configuring LACP between devices of any vendor from 2x1Gb up to 4x10Gb.

Pile Of Garbage
May 28, 2007



Combat Pretzel posted:

I think I found it. In the Group Policy editor in the VM, RDS -> Remote Desktop Session Host -> Remote Session Environment -> Configure RemoteFX Adaptive Graphics to Lossless.

I mean, I'm using RDP over VMBus, either via VMConnect or a carefully crafted .rdp file, so I couldn't care less about compression.

--edit:
Heh, it even stutters less in lossless mode.

Make sure you're allowing both tcp/3389 and udp/3389. RDP will use both and prefers the latter for better performance.
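
If Windows Firewall is in the way, the built-in rule group covers both (the 'Remote Desktop' display group name assumes an English install):

code:
# Enable the built-in Remote Desktop rule group, which includes both the tcp/3389 and udp/3389 rules
Enable-NetFirewallRule -DisplayGroup 'Remote Desktop'

# Sanity-check what's actually enabled
Get-NetFirewallRule -DisplayGroup 'Remote Desktop' |
    Select-Object DisplayName, Enabled, Profile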

Pile Of Garbage
May 28, 2007



Just a guess, but I'd say it would come down to the PCI bus itself and how it presents devices attached to it. I'd assume there's some kinda standard for that poo poo, so it shouldn't matter how the devices behind the bus are connected; as long as the bus supports passthrough it should work in theory, unless the hardware does DMA or some poo poo, in which case who loving knows. Would love to know how you get on.

Pile Of Garbage
May 28, 2007



Internet Explorer posted:

I have a Synology NAS that I run some containers on. I just really don't need a home lab these days.

Do I need to hand in my nerd card? Am I getting old?

I just got a new big beefy QNAP NAS and am probably going to transition to just running containers on it. I've got an IBM x3550 M2 with dual Xeon X5570 CPUs, 128GB RAM and four 10k SAS HDDs in RAID5 that I've been running ESXi 6.5 on for a while now. It's pretty power-hungry and I can't upgrade to any newer version of ESXi because they dropped support for those CPUs. Also I think the RAID controller has started failing; storage is getting flaky. Time to put it out to pasture.

Pile Of Garbage
May 28, 2007



I recall maybe a decade ago now commissioning some new IBM HS22 blades running whatever was the latest ESXi at the time, and they had an issue where their DAS would just go offline, rendering the host unmanageable (VMs would keep running but you couldn't manage or migrate them). Back then the fix we settled on was to just order some of the special IBM SD cards to use as the boot device on the blades instead of the DAS.

Weird that it sounds like the reverse is now the issue lol.

Pile Of Garbage
May 28, 2007



diremonk posted:

Here are the results of an ls -a on the host. Strange that it is saying that there is no .vmx~ file. Sorry it is kind of small, remoted into my work desktop to grab the screenshot.



Maybe the issue is with the permissions on the file. Check with ls -alF; not sure what the correct permissions are meant to be though.

Pile Of Garbage
May 28, 2007



diremonk posted:

OK, moved it to another directory and it starts booting. But then starts having issues with the file system.

Maybe I should just nuke it and start over, then do a proper snapshot and backup once I have it all working.

This sounds like multiple different issues possibly related to the storage. If you're also having issues with other VMs on the same storage then you may have a problem.

Otherwise I'd recommend what Arishtat suggested:

Arishtat posted:

Worst case, if the VMX is hosed, you can create a new shell VM and attach the existing VMDK to it. For safety's sake I'd actually copy the VMDK to the new VM folder so you have the untouched original as a backup until you get everything working again.
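
If you do go down that route, a rough PowerCLI version of it would be something like this (untested sketch, all the VM names and datastore paths are made up):

code:
# Make a safe copy of the original disk first (paths are examples)
Get-HardDisk -Datastore 'datastore1' -DatastorePath '[datastore1] brokenvm/brokenvm.vmdk' |
    Copy-HardDisk -DestinationPath '[datastore1] rescue-vm/brokenvm.vmdk'

# Create a shell VM and attach the copied disk to it
New-VM -Name 'rescue-vm' -VMHost 'esxi01.example.com' `
    -DiskPath '[datastore1] rescue-vm/brokenvm.vmdk'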

Pile Of Garbage
May 28, 2007



Thanks Ants posted:

VMware are discontinuing support for running the hypervisor from only SD cards and USB sticks. You'll need good local disk as well, at which point the SD card adds no value.

https://blogs.vmware.com/vsphere/2021/09/esxi-7-boot-media-consideration-vmware-technical-guidance.html

I assume most servers these days come with M.2 NVMe SSD sockets, so it should be simple enough to use those instead whilst still avoiding the need for a SATA/SAS drive.

Pile Of Garbage
May 28, 2007



BlankSystemDaemon posted:

You'll be hard-pressed to find server boards without any SATADOM ports, and two of them is by far the norm, for the exact reason that it's the easiest way to add two SSDs in a mirror for ESXi or another type-1 hypervisor to boot from.
ESXi is designed to load its OS into memory and run it entirely from there, only writing to the disks when configuration is changed and saved, so there's hardly any point in using NVMe.
The marginal boot time gains aren't going to mean anything, because any company who's serious about nines will be doing live migration.

Not sure if it's an issue any more but I remember back on 5.5 and maybe 6.0, if the ESXi root file system broke then it also broke poo poo like vMotion.

Pile Of Garbage
May 28, 2007



With VMware is there a way to exempt a VM from DRS but at the VM level instead of the cluster? I know you can do it at the cluster level but is there a tag or something I can set on a VM that will make DRS ignore it?

I'm building VM templates with Packer and as part of the build it mounts a floppy to the VM containing the autounattend.xml and some scripts. The issue is that if the VM is migrated via vMotion the floppy drive gets automatically disconnected from the VM. This either causes the build to stall or only complete partially due to missing scripts. This issue seems to arise fairly frequently because DRS appears to be hyper-aggressive on clusters I'm deploying to despite being configured with default settings.

Pile Of Garbage
May 28, 2007




That's still at the cluster level though and requires that the VM already exist. I'm deploying new VMs with Packer using the vsphere-iso builder which doesn't exactly give you options to mess with things in the environment other than the VM it is deploying.

Pikehead posted:

You could also put the floppy image on a shared datastore or library, so vmotioning isn't an issue.

(I believe this will work - I haven't had to mount a floppy image in .. well, ever).

Packer creates the .flp file on the fly and uploads it to the same datastore as the VM so it's already on shared storage. From what I observed the floppy image isn't being unmounted but rather the floppy drive is being disconnected from the VM when it vMotions. As part of the build Packer also mounts two ISOs to the VM on separate CD-ROM drives and those are unaffected by the vMotion.

Cyks posted:

In vcenter you set it up as an affinity rule.

https://docs.vmware.com/en/VMware-vSphere/7.0/com.vmware.vsphere.resmgmt.doc/GUID-FF28F29C-8B67-4EFF-A2EF-63B3537E6934.html

Specifically VM to Host group affinity rule.
Actually just doing manual for the individual VM as the other poster said is easier; I'm overthinking it too early in the morning.

Again that would require configuration at the cluster level and can't be done prior to the creation of the VM. I'd really like to avoid having to orchestrate something alongside Packer to do this.

Maybe I should just figure out what's going on with DRS. I've seen it vMotion a new VM three times in the space of two minutes basically immediately after the VM was created. Surely that shouldn't happen as Packer is just pointing at the cluster and relying on DRS initial placement to choose a host. Perhaps that initial placement recommendation from DRS is cooked?
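
When I'm back at it I might just pull the DRS migration events for one of the affected VMs and see what reason it gives, something like this (rough sketch, VM name is made up):

code:
# Pull recent events for the VM and pick out the DRS-initiated migrations
$vm = Get-VM -Name 'packer-build-vm'    # hypothetical VM name
Get-VIEvent -Entity $vm -MaxSamples 500 |
    Where-Object { $_ -is [VMware.Vim.DrsVmMigratedEvent] } |
    Select-Object CreatedTime, FullFormattedMessage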

Pile Of Garbage
May 28, 2007



Pikehead posted:

If the floppy is on the same datastore as the vm then I wouldn't expect issues, but obviously there are. I can't think why it would disconnect - would there be anything in the esxi logs as to what's going on?

I'll have a look at the logs if any when I'm back at work next week. It is quite strange as normally I'd expect it to cause vMotion to fail outright.

Pikehead posted:

With regard to a very aggressive DRS - that's again a bit unexpected - I thought DRS by default runs every 5 or 15 minutes and not at the frequency you're seeing. Are the VMs being powered on each time? That would possibly make sense, as when DRS is fully automatic it's involved each time a VM is powered on.

For the Packer build the VM is only powered on the one time. In the guest OS it does an unattended install of Windows Server 2019, so there are a couple of soft reboots and then at the end it powers off so that Packer can remove the CD-ROM and floppy drives and convert it into a template.

Pile Of Garbage
May 28, 2007



Pikehead posted:

I would think that vMotion would work - it's a shared datastore. There's something I'm not getting or something not right here though.

On leave at the moment so can't test.

Yeah I thought the same as well and was just ruminating on the reason why. Like perhaps when it goes to vMotion the VM it sees the mounted floppy and freaks out but instead of failing it just disconnects the floppy drive from the VM. IDK just guessing really.

Pikehead posted:

What version of vcenter/esxi are you running? 7.x is more transparent on imbalance as per https://4sysops.com/archives/vmware-vsphere-7-drs-scoring-and-configuration/

I'd say to also look at the DRS logs, but I have no idea where they would be, and knowing vmware if you could find them they'd all be an impenetrable mess of GUIDs and spam.

The clusters that are giving me the most issues are on 7.0u2. I'll have to dig up some logs next week.

Pile Of Garbage
May 28, 2007



Pile Of Garbage posted:

With VMware is there a way to exempt a VM from DRS but at the VM level instead of the cluster? I know you can do it at the cluster level but is there a tag or something I can set on a VM that will make DRS ignore it?

I'm building VM templates with Packer and as part of the build it mounts a floppy to the VM containing the autounattend.xml and some scripts. The issue is that if the VM is migrated via vMotion the floppy drive gets automatically disconnected from the VM. This either causes the build to stall or only complete partially due to missing scripts. This issue seems to arise fairly frequently because DRS appears to be hyper-aggressive on clusters I'm deploying to despite being configured with default settings.

It's been a while but just to follow up on this: looking at the logs, what appears to be happening is that when the VM is being migrated the target host attempts to mount the .flp file before it has been unmounted on the source host. I assume this happens because the .flp file is mounted read/write. In comparison, mounted .iso files are unaffected, again presumably because they are mounted read-only.

In the end I pretty much just said "screw it" and switched from using floppy_files to cd_files in the Packer template. I didn't use this initially as I am also using iso_paths and was unsure how it would work in combination with cd_files. Turns out it's pretty straightforward: each ISO gets its own CD-ROM drive, and the drives are attached to the VM in the following order: the iso_paths ISOs in the order listed, then the cd_files ISO.

So yeah, if you're using Packer with VMware and still futzing around with floppies I'd recommend switching to just ISOs.
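
For anyone else doing the same, the relevant bit of the vsphere-iso source block ends up looking roughly like this (trimmed-down HCL sketch, all the paths and names are placeholders):

code:
source "vsphere-iso" "windows2019" {
  # Each ISO in iso_paths gets its own CD-ROM drive, in the order listed
  iso_paths = [
    "[datastore1] iso/SERVER_2019.iso",
    "[datastore1] iso/VMware-tools-windows.iso"
  ]

  # Replaces floppy_files: Packer builds one extra ISO from these and attaches it last
  cd_files = [
    "./answer_files/autounattend.xml",
    "./scripts/"
  ]
  cd_label = "provision"
}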

Pile Of Garbage
May 28, 2007



wibble posted:

Has anyone got experience of Oracle Linux Virtualization Manager? :(
Is it any good? Any issues? :tinfoil:

Run.

Pile Of Garbage
May 28, 2007



TheShazbot posted:

So, if I was on the path to specialize as a virtualization/storage engineer with VMWare, I should move away from that since the Broadcom purchase?

VMware has >75% market share for on-premises virtualisation and I doubt that even Broadcom can gently caress that up. Technology and feature-wise VMware remains the gold standard and is the most unobjectionable choice. They've also got VMC (VMware Cloud, basically VMware-as-a-Service running on AWS) which seems somewhat popular as it makes setting up a "DR site" easier.

So yeah, there's no reason why you shouldn't still specialise in VMware. However, I would recommend that you also specialise in AWS and Azure virtualisation as "lift-and-shift" VMware engagements are increasingly common.

TheShazbot posted:

Also, I've got a three-node PowerEdge R320 cluster right now that I'm wondering whether it's worth it to upgrade CPU/RAM for better performance (I'm mostly doing cluster-based homelab stuff, so three identical nodes is what I'd like to keep) or, should I save and upgrade to DDR4-based systems? I haven't been following whether or not pricing for v3/v4 machines is coming down quickly or whether or not the supply chain is still an issue.

I've done some of my own research already and I'm still on the fence whether I want to spend several hundred dollars on servers that are still using DDR3.

IMO running three physical servers for a homelab is overkill. You'd be better off with one physical box and just doing nested virtualisation.
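
If you do go nested on ESXi, exposing hardware virtualisation to a guest is just a per-VM flag, something like this via PowerCLI (sketch only, VM name is made up and the VM needs to be powered off):

code:
# Expose hardware-assisted virtualisation to the guest so it can run its own hypervisor
$vm = Get-VM -Name 'nested-esxi01'      # hypothetical VM
$spec = New-Object VMware.Vim.VirtualMachineConfigSpec
$spec.NestedHVEnabled = $true
$vm.ExtensionData.ReconfigVM($spec)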

Pile Of Garbage
May 28, 2007



in a well actually posted:

There may be lots of use cases for staying on prem, but fewer that require VMware. Grudgingly maintained reliance on enterprise tech from a vendor focused on extraction-maximising subscription revenue while planning on reducing headcount in development and support is a common pattern, and not one I’d be excited about building my career around. It’s not going anywhere; I’m sure people make good money running Oracle, too.

drat this is a dogshit post. No one stays on-prem to meet use cases; they do it to meet requirements. VMware is never required for any solution, rather it is chosen because you'd be laughed out of the room for choosing anything else. Also I think you're confusing becoming familiar with a technology/product to further one's career with actively endorsing the technology/product.

Pile Of Garbage
May 28, 2007



Subjunctive posted:

A friend of mine (he worked on Space/TimeWarp and general GPU insanity at Oculus when I was there) brought his company out of stealth today: Juice, which does IP-based transparent remoting of GPU resources at high speed.

https://www.juicelabs.co/

Binaries available for Windows, Linux (Ubuntu), and Mac; works inside VMs; no client program modifications required. I’ve seen some demo videos of it before and it is pretty friggin’ nuts. Could really change the game for GPU-passthrough sorts of applications. I’m travelling right now so I haven’t installed it yet myself.

Not really sure how this would work for any non-async workloads. Might be good for transcode offload from one's Plex server?

Pile Of Garbage
May 28, 2007



Subjunctive posted:

It apparently works fine for gaming, with ~150Mbit network! I’m travelling right now but I’m going to see if I can use it to virtualize my 4090 to serve a few Minecraft clients on underpowered machines.

The engineers are actively answering questions in their discord, fwiw.

Just noticed you're selling this in other threads, so I don't think anything you say re the tech can be taken as neutral. Further still, this sounds cooked; I'd advise caution.

Pile Of Garbage
May 28, 2007



SR-IOV really feels redundant when you can get servers with multiple 40Gb/s DAC interconnects and switches that will just handle that with no issues. IMO getting it working well feels like diminishing returns, similar to switching storage adapters from LSI Logic SAS to Paravirtual SCSI.

Pile Of Garbage
May 28, 2007



BlankSystemDaemon posted:

WSL has done wonders for Microsoft retaining people on Windows, rather than switching to something alternative.

lol no it didn't. The only thing that WSL did was make it so that I don't have to run a Linux VM somewhere/locally in order to do Linux development/other Linux things. It's a convenience at most but hardly a game changer. See also: all the devs running macOS.

Pile Of Garbage
May 28, 2007



wolrah posted:

WSL2 is just a Linux VM running on Hyper-V with really tight host integration.

WSL1 was more like a reverse-WINE where the Linux apps are actually running on the Windows kernel pretending to be Linux, which it did shockingly well but IIRC there were severe performance issues for certain use cases due to the different ways Linux and Windows handle disk access that weren't considered solvable with that model. It also would have required continuous development to maintain parity with the real kernel, where the WSL2 model gets its kernel updates "for free" from the upstream distros as long as Hyper-V doesn't break anything.

I don't know how it's affected others' use, but I can say that once WSL gained X support it started to impact how often my dual boot machines were on Linux, and when it got GPU support I basically stopped dual booting. I've done it twice for WiFi injection shenanigans and twice just to run updates (one of those times the nVidia driver hosed up and I got a bonus hour of troubleshooting that with my updates...)

The comparison to Mac using devs I agree with. I've used Mac laptops on and off over the years and always enjoyed being able to have both commercial software and my favorite *nix tools side by side in a reasonably well integrated manner. WSL brought the same concept to Windows.

lmao I know what WSL is. You said "WSL has done wonders for Microsoft retaining people on Windows, rather than switching to something alternative." which is silly because no one ever said "oh I can't use Linux on Windows? guess I better abandon Windows"; they just spun up a Linux VM somewhere and SSH'd into it from their Windows machine. That's why I mentioned macOS, because it's the same deal.

Perhaps you meant to say that WSL has made Windows more appealing/convenient; that would make sense.

Pile Of Garbage
May 28, 2007



Hi all, it's the worst poster here. Quick question: is the general consensus for VM CPU topology configuration still "flat-and-wide", as in a single socket with whatever number of cores you require? As I understand it, the original reasoning was that you ideally want to keep VMs on a single physical NUMA node so as to avoid memory latency introduced from having to traverse QPI/HyperTransport and to ensure that the hypervisor CPU scheduler isn't waiting for cores to become available on more than one socket.

Is that still the general consensus or has VMware introduced some magic that makes it not matter any more? Btw this is for an environment with vSphere 7.0.3/8.0.0 hosts.

Pile Of Garbage
May 28, 2007



Zorak of Michigan posted:

It's a trade-off. You can hot add sockets, but you can't hot change cores per socket, so if you single socket, you lose a lot of flexibility. Newer versions of vCenter also automatically construct a vNUMA topology behind the scenes that makes it a non-issue for most commonly encountered sizes. The last time I asked our TAM about it, the guidance we got was to feel free to use lots of sockets and one core per socket, up until you might be using more cores than a single physical CPU could accommodate. If your VM might not fit in a single socket, you want to specify a number of sockets that fits your hosts, and a cores-per-socket figure that also fits your hosts.

Cheers, thanks for that. I did some reading and understand things better now. vNUMA seems to be the way to go, however there is a caveat in that it's only enabled by default on VMs with more than eight vCPUs. Of course you can still enable vNUMA on smaller VMs with the numa.vcpu.min setting. Something to be aware of I guess.
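
For the record, lowering that threshold on a smaller VM is just an advanced setting, e.g. (sketch, VM name is made up):

code:
# Expose vNUMA to a VM below the default vCPU threshold ('small-vm' is a hypothetical name)
$vm = Get-VM -Name 'small-vm'
New-AdvancedSetting -Entity $vm -Name 'numa.vcpu.min' -Value 4 -Confirm:$false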

Pile Of Garbage
May 28, 2007



Relevant to the Broadcom buyout of VMware, they're already moving to transition everything to a subscription-based model and as part of that they're killing the perpetual license model for vSphere: https://www.thestack.technology/broadcom-is-killing-off-vmware-perpetual-licences-sns/.

Can't say I'm surprised, Broadcom only bought VMware to capture and exploit its revenue stream. As they've already demonstrated, anything that can be considered a cost centre is on the chopping block. It's kind of similar to the corporate raiders in the 80s, only instead of immediately liquidating they just bleed 'em dry.

Pile Of Garbage
May 28, 2007



New batch of ESXi/Fusion/Workstation sandbox-escape vulnerabilities, all of them critical: https://arstechnica.com/security/2024/03/vmware-issues-patches-for-critical-sandbox-escape-vulnerabilities/. I've a feeling some poor souls are going to spend the weekend patching.

Edit: on closer inspection the vulnerabilities only affect the USB controller, so the workaround is to just remove any USB controllers from all your VMs.
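
If you've got a lot of VMs to hit, something along these lines in PowerCLI should do the bulk removal via the API (sketch only, test on one VM first; controllers may need the VM powered off to remove):

code:
# Strip USB controllers from every VM as a stopgap for the USB sandbox-escape CVEs
foreach ($vm in Get-VM) {
    $usb = $vm.ExtensionData.Config.Hardware.Device | Where-Object {
        $_ -is [VMware.Vim.VirtualUSBController] -or $_ -is [VMware.Vim.VirtualUSBXHCIController]
    }
    if (-not $usb) { continue }

    $spec = New-Object VMware.Vim.VirtualMachineConfigSpec
    $spec.DeviceChange = @(foreach ($dev in $usb) {
        $change = New-Object VMware.Vim.VirtualDeviceConfigSpec
        $change.Operation = [VMware.Vim.VirtualDeviceConfigSpecOperation]::remove
        $change.Device = $dev
        $change
    })
    $vm.ExtensionData.ReconfigVM_Task($spec) | Out-Null
}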

Pile Of Garbage fucked around with this message at 02:57 on Mar 8, 2024

Pile Of Garbage
May 28, 2007



For those who are looking to escape VMware: https://forum.proxmox.com/threads/new-import-wizard-available-for-migrating-vmware-esxi-based-virtual-machines.144023/.


Pile Of Garbage
May 28, 2007



I've always been skeptical of HCI solutions in general as all the implementations seem to be extremely fragile. They're often reliant on one or more black-box appliances which hide a lot of complicated moving parts that you can't easily diagnose or troubleshoot without vendor support.

Also, as mentioned, upgrades can take an extremely long time, which is an issue not unique to Nutanix. A couple of years ago I upgraded a pair of three-node Cisco HyperFlex (it's an HCI layer on top of VMware ESXi running on Cisco UCS) clusters and each one took about 16 hours to complete. Admittedly it was simple and fully orchestrated but still, I had to stare at progress bars for 16 hours.

Maybe there's a niche for HCI, maybe running VDI (the HyperFlex clusters I upgraded were solely for Horizon VDI), but for workloads that you care about it seems like a big gamble.
