Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Spring Heeled Jack posted:

My manager's concern with getting another SAN is the upside-down pyramid we get with all of our data sitting on this one appliance, no matter how much redundancy is built into the unit itself. It's been a while since I've looked at the current technologies, but it seems there are some solutions out there to alleviate this without breaking the bank.
It shouldn't; all that data should also be sitting on whatever you're using for backups.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Spring Heeled Jack posted:

Oh you mean our tapes that live at Iron Mountain? :negative:
At some point you should probably address your awesome RTO

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
I've done way dumber things in production than Scott Alan Miller out of necessity and I can't wait until we're not doing them anymore so I can tell you about them

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Methanar posted:

Inside of a container, do you still need to run public-facing services like haproxy or nginx as unprivileged users, or is it fine to run them as a normal Ubuntu user? Chroot directives are extraneous as well, right?
chroot is extraneous, yes. Running as root is also a lot less dangerous inside a container than outside of one, because an attacker needs a way to break out of the container context in order to do most things you would actually care about. In order to further harden the container, you have a few different options:

  • Run as an unprivileged user inside the container
  • Use your container runtime's support for capability whitelisting
  • Lock down your application using a MAC framework like AppArmor or SELinux

Depending on what you're doing, these may be totally unnecessary; unless you explicitly run with --privileged, Docker will create a capability bounding set that does not include the capabilities needed for most dangerous operations as root. I think this list of capabilities dropped by Docker is probably out of date, but it's the best I was able to quickly find.
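For a concrete picture, here's a rough sketch of what those knobs look like through the Docker SDK for Python (the image name, UID, and AppArmor profile name are placeholders I made up, not anything from the post):

code:
# Sketch only: start a hypothetical web frontend with the hardening options above.
# Assumes the Docker SDK for Python (pip install docker) and an AppArmor profile
# named "my-frontend" already loaded on the host.
import docker

client = docker.from_env()

container = client.containers.run(
    "registry.example.com/frontend:latest",  # placeholder image
    detach=True,
    user="1000:1000",                  # unprivileged UID:GID inside the container
    cap_drop=["ALL"],                  # drop the whole default capability set...
    cap_add=["NET_BIND_SERVICE"],      # ...then whitelist back only what you need
                                       # (nothing, if you bind a high port as non-root)
    security_opt=[
        "no-new-privileges:true",      # block setuid/setgid privilege escalation
        "apparmor=my-frontend",        # apply a MAC profile (AppArmor here; SELinux labels work too)
    ],
    ports={"8080/tcp": 8080},
)
print(container.short_id)

These map straight onto the docker run flags (--user, --cap-drop/--cap-add, --security-opt) if you'd rather do it from the CLI or a compose file.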

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

evol262 posted:

I don't know if Ubuntu (well, AppArmor) has support for this or not, but if you have SELinux, you should also have policies for the container itself.

CoreOS does this automatically (sVirt and SELinux), as does RHEL/Fedora, through container-selinux
Docker has a default AppArmor policy that's applied to new containers, but it's fairly minimal.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
My first thought had something to do with a mismatch on the different ends of an interface bonding configuration

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
The lines between type 1 and type 2 hypervisors have been increasingly blurred over the years, to the point where it's almost an irrelevant distinction in modern times (or, at least, not a granular enough one to be useful).

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
ESXi doesn't read or write the media once it's running, so if this is for a colo use case instead of something you're turning on and off five times a day, does it actually matter?

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
Database containers are fine, but anyone running them needs to understand the runtime requirements and maybe not run with as much dynamism as you would a stateless application instance, because there's a lot more at stake if something goes wrong. Automatic failover and self-healing are great, but a lot of us olds still carry the war wounds of a MySQL instance that flapped too fast in a cluster and trashed the whole DB volume.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

BangersInMyKnickers posted:

Enabling hot-add (which disables vNUMA exposure) only matters for VMs that will span beyond a node, right? Our infrastructure people are turning it off for 2 vCPU guests on hardware with 12 cores per socket (2 NUMA nodes with 6 cores each for cluster-on-die topology too, I guess, but that performance penalty is small), and I'm pretty sure it's a pointless and lovely thing to do on everything but the largest VMs we run.
Probably, yeah. I thought I remembered something about CPU hot-add causing a slight increase in baseline CPU utilization on Windows platforms a few years ago, but I don't know if this is still the case. I might have been making it up all along.

vNUMA isn't enabled on a VM until it has 9 or more vCPUs assigned to it at the time the virtual machine is powered on (this number is higher if you have more than 8 cores per socket; for 12-core nodes, you would need 13 vCPUs). So for 2 vCPU VMs, it literally makes no difference—there's no vNUMA anyway.
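To make the arithmetic concrete, a tiny sketch of the rule as described above (the function is just for illustration; numa.vcpu.min is the advanced setting behind the default of 9):

code:
def vnuma_vcpu_threshold(cores_per_socket, numa_vcpu_min=9):
    # Sketch of the rule above, assuming default advanced settings: vNUMA needs at
    # least numa.vcpu.min vCPUs (default 9), and the VM also has to be wider than a
    # single physical NUMA node before any topology is exposed to the guest.
    return max(numa_vcpu_min, cores_per_socket + 1)

print(vnuma_vcpu_threshold(8))   # 9  -> the "9 or more vCPUs" default
print(vnuma_vcpu_threshold(12))  # 13 -> the 12-cores-per-socket example above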

Vulture Culture fucked around with this message at 04:37 on Oct 8, 2018

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

evil_bunnY posted:

There are two ways to go about this: either you relinquish perf to the devs and never get a say in it ever again, or you work with them to test both configs back to back.
or ignore it entirely and just let memory ballooning do its job

who has time to give a poo poo about 2 GB of RAM in nearly the year 2019 anyway

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

bull3964 posted:

At least, this is the impression I have from discussions with the team involved.
the team involved is stealing every computer

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

BangersInMyKnickers posted:

it was hot garbage for me even before the Dell acquisition. thanks for the poo poo LACP implementation, guys
Ah, VMware's old Achilles heel, "any kind of standard networking function". As long as VMware has been a thing, standard vSwitches have flat-out ignored packets with QoS flags. Not ignored the flags—ignored the packets. Get Methanar drunk and ask him about Arista switches

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Agrikk posted:

Can anyone suggest some eBay-able hardware that is compatible with ESX 6.7?

I have an Opteron-based ESX 6.5 lab whose CPUs are not supported anymore, so I cannot upgrade to 6.7. I have three hosts built on Supermicro H8SGL-F motherboards with Opteron 6000-series CPUs, and they have reached EOL on VMware's HCL.

I'd like to replace the three hosts with two hosts, spending around $250 per box for CPU/RAM/motherboard. Can anyone recommend a CPU/motherboard combo that is (relatively) future-proof?

I'd prefer it to be a Supermicro board for the IPMI capability with remote KVM, but will look at anything.
Not sure if this is helpful at all for Opteron 61xx series, but some people have had luck bypassing the compatibility check on Xeon 56xx:

https://www.thehumblelab.com/vsphere-67-homelabs-unsupported-cpu/

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
In 2019, VMware still can't figure out how to set a process's configuration at runtime

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Agrikk posted:

Is there a way to migrate a VM from one cluster to another cluster that has incompatible CPU types?

I have an ESX 6.5 cluster running on Opteron 6128 CPUs and an ESX 6.7 cluster running on Xeon E5620 chips.

I'm trying to decommission the Opteron cluster but doing a vMotion won't work due to the incompatibility between Opterons on 6.5 and Xeons on 6.7.

What's the best way of moving these workloads to the Xeon cluster?
Power off, migrate, power on. There's no way to do it live

If these are Windows VMs, you'll probably need to answer a question on the VM console and reboot again afterwards
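If you'd rather script the shuffle than click through it, here's a rough pyVmomi sketch of the power-off / cold-migrate / power-on sequence (the vCenter address, credentials, VM name, and destination host are all placeholders, and it assumes both clusters can see the VM's storage):

code:
# Rough sketch, assuming pyvmomi is installed; every name below is made up.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab-grade; validate certificates for real use
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine, vim.HostSystem], True)
    vm = next(o for o in view.view
              if isinstance(o, vim.VirtualMachine) and o.name == "app01")
    dest = next(o for o in view.view
                if isinstance(o, vim.HostSystem) and o.name == "esx67-xeon-01")

    WaitForTask(vm.PowerOffVM_Task())      # a guest shutdown is friendlier; this is the blunt version

    # Cold migration sidesteps the vMotion CPU compatibility check entirely.
    spec = vim.vm.RelocateSpec(host=dest, pool=dest.parent.resourcePool)
    WaitForTask(vm.RelocateVM_Task(spec))

    WaitForTask(vm.PowerOnVM_Task())
finally:
    Disconnect(si)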

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
I once had an Oracle rep threaten to sue me because I referred to "the user" of a specific Solaris server while discussing a support extension and she started screaming and accusing me of illegally renting Oracle's technology to other people

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Martytoof posted:

Is there anything you guys can think of which can hit all of the following:

- provide PCI passthrough
- run a Windows NT guest
- type 2 hypervisor preferred, so RDP or VNC are not required to interact

I have some fairly esoteric software that only runs on NT, requires specific PCI cards to operate some lab equipment, and is interactive, so I'd prefer not to muddy things up with remote connections. Getting tired of the hardware this runs on failing, so I'm hoping to virtualize it on commodity new hardware.
You're not going to find "commodity new hardware" with PCI slots. I wouldn't expect the boutique add-in cards shipped with high-end microscopy setups to work with an IOMMU in the first place, and a PCIe-to-PCI adapter is only going to complicate things. Your best option might be to try an old Yorkfield Core 2-era setup with both VT-d support and a hardware PCI slot, but stuff might be funky with the IOMMU groups on those motherboards.

Honestly, you're probably going to have an easier time just building a new box of old poo poo.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
What's really cool about discussing NFS locking behavior is that locking isn't part of NFSv2/v3 at all, servers don't care whether or not you lock any files whatsoever, and it's implemented through a separate protocol (NLM) that is respected by some but not all NFS clients.
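As an illustration of just how advisory that is, a small Python sketch (the mount path is a placeholder): the lockf() call below is only a request, and on NFSv3 it only means anything if the client forwards it through NLM; a second client that never asks for the lock can scribble on the file regardless.

code:
# Sketch: take an advisory lock on a file that happens to live on an NFS mount.
# Nothing stops a process that skips lockf() from writing to the same file.
import fcntl

with open("/mnt/nfs-share/shared-state.dat", "a+") as f:   # placeholder path
    fcntl.lockf(f, fcntl.LOCK_EX)      # on NFSv3 this is (maybe) forwarded via NLM
    try:
        f.write("one writer at a time, if everyone cooperates\n")
        f.flush()
    finally:
        fcntl.lockf(f, fcntl.LOCK_UN)  # release so other cooperating lockers can proceed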

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

evil_bunnY posted:

Lmao I’m so glad I never have to worry about this.
Storage is the printers of the datacenter

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

YOLOsubmarine posted:

This made for some real fun if you ever tried to do multiprotocol access on a NetApp volume, since CIFS has mandatory locking and NFS has advisory locking, and getting those to play nicely on the same file is basically impossible.

Mostly it works fine, though, since any application that cares about locking and supports NFS just handles it out of band through lock files or some other mechanism; even NLM support isn't always a given.
Ask me about ingesting 1 million points of time series data per second to figure out that this dumb unannounced loving IBM change to SMB opportunistic locking on GPFS was to blame for all of my HPC NFS client performance taking a complete poo poo

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

MrMoo posted:

Might be an odd question: does VMware have an equivalent to KVM in terms of performance?

I just see a KVM host in the middle of the Internets running a lot faster than VMware clustering with dynamic CPU scaling at a big company (tm). I see that a lot of Amazon's AWS fleet has migrated from Xen to KVM, matching Linode.com from way back in 2015 or so.
KVM is a high-performance kernel interface that hypervisors use for hardware-assisted virtualization, but KVM by itself isn't a full hypervisor; you still need something like QEMU emulating the hardware. That makes this question a bit difficult to answer.

AWS runs on custom hardware that provides a lot of these functions, and they've written a lightweight hypervisor called Nitro to make optimal use of it. Brendan Gregg did a great writeup:

http://www.brendangregg.com/blog/2017-11-29/aws-ec2-virtualization-2017.html

Google also has their own hypervisor based on KVM that they use to run Google Cloud.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

YOLOsubmarine posted:

You don't need much to play with docker and k8s. They are extremely lightweight. Don't spend a bunch of money on something with a lot of cores and memory and ssds that you likely won't use unless you've got some relatively large project you want to test.

I'll also suggest just setting up an AWS or Azure account and doing it with their IaaS services. You can also test their container platforms while you're at it.
Yeah, ingress networking on bare metal is still an extremely unsolved problem and I'd deal with it as a matter of last resort

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

bull3964 posted:

Anyone see the balloon driver kill networking on a VM when it activates?

I've been chasing down an issue where Availability Group members suffer a node eviction from the cluster for a minute or two, and today I finally managed to align balloon memory load with the exact moment that it happens.

The VM doesn't hang as far as I can tell, since it's still logging events in the event log; it just loses connectivity to the other node members until the balloon driver finishes its thing, then the node has no trouble talking to the other nodes and the cluster becomes healthy again.

Most of the time this has hit our secondary AG members, so we don't suffer a failover. It makes sense, really: the secondary nodes don't have a ton of active memory since they aren't seeing app load, so they have the memory to give up when the host needs it, but I've never seen this disrupt networking like that before. This has happened on multiple VMs, both 2012 R2 and 2016.
It doesn't need to disrupt connectivity; it just needs to pause the VM for long enough that it misses heartbeats. MS clustering is pretty aggressive with its heartbeat intervals, and it used to be common for things like vMotion to knock them out of the cluster. I guess it's possible that a VM pause for balloon reclamation, on a VM with a large amount of memory, could have the same effect.

Real talk: I'm not sure what you're doing where the application is large and important enough to need clustering, but unimportant enough that you can play games with its performance by randomly evicting cache. Unless you're seriously budget-constrained, it probably makes sense to disable ballooning on these VMs.
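If you do go that route, one common per-VM knob is the sched.mem.maxmemctl advanced setting, which caps how many MB the balloon driver may reclaim (0 effectively disables it). A rough pyVmomi sketch, with the vCenter address, credentials, and VM name all made up:

code:
# Rough sketch, assuming pyvmomi is installed; hostnames, credentials, and the
# VM name are placeholders.
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab-grade; validate certificates for real use
si = SmartConnect(host="vcenter.example.com", user="administrator@vsphere.local",
                  pwd="changeme", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    vm = next(v for v in view.view if v.name == "sql-ag-secondary-01")

    # sched.mem.maxmemctl caps balloon reclamation for this VM (in MB);
    # 0 means the balloon driver can't take anything back.
    spec = vim.vm.ConfigSpec(extraConfig=[
        vim.option.OptionValue(key="sched.mem.maxmemctl", value="0"),
    ])
    WaitForTask(vm.ReconfigVM_Task(spec=spec))
finally:
    Disconnect(si)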
