|
If you're cheap like me, you can check out the non-official list of hardware supported by ESX(i) on the Whitebox HCL. It's essentially a big list of various hardware that people have shoehorned ESX onto, sometimes with the workarounds required to do so. I just bought a used HP DL160 G6 off of eBay, and I've consulted it to try to figure out a cheap RAID controller to get. I'll be damned if I'm going to not use RAID-1 in this sucker.
|
# ¿ Feb 20, 2012 15:03 |
|
Sylink posted:I know VMconverter lets you go physical to virtual, does it let you do the reverse as well? Not exactly. The process is called V2P, and VMware has a few docs on the subject. In short, the process entails taking your vmdk and using it to image a physical system, but, as you can imagine, taking a virtualized hardware configuration and unvirtualizing it can be... tricky.
|
# ¿ Apr 20, 2012 16:40 |
|
evil_bunnY posted:Actually there are (placating support engineers who blame your running virtual for poo poo that's got nothing to do with it). VMware would make a mint by adding a paid extension that completely hides all things VMware from dmidecode, lspci, and the like, just for admins who want to prevent braindead developers from blaming their performance issues on the hypervisor. "No way man, you're totally running on your own dedicated 12-core 48GB box, man. It's your code."
|
# ¿ Apr 20, 2012 21:18 |
|
adorai posted:The problem typically isn't virtualization, it's people with poo poo VMware environments that are terribly sized. By requiring bare metal, you can at least avoid that problem. We have many vendors that do not support running virtualized, and we just went ahead and did it anyway. We purchased 5x licenses of PlateSpin just in case. True. Plus, I'm not seriously saying the developers are always the ones at fault. However, I often run into the case where a new project comes along and it sounds like a good fit for VMware: low I/O and CPU requirements, just memory-hungry. But the lead developer will flee at the first sight of a VMware environment because they've had poor experiences with it in the past. My guess is that a lot of people have tried out virtualization on their own desktop, or used it back when the hypervisor ran on top of an OS and performance was not so good. Virtualization has come a long way; a properly-architected VM environment is really drat good.
|
# ¿ Apr 20, 2012 23:16 |
|
Nitr0 posted:Employer is sending me on this fast track course and I sure as gently caress wouldn't pay that ridiculous amount for a vcp but since work is paying... I'm sitting in this exact class RIGHT loving NOW. My work is paying for it, of course.
|
# ¿ May 2, 2012 22:05 |
|
It's not a blade by chance, is it? You can reset it via the chassis if so. If you're on a DL-series, then I have no idea.
|
# ¿ May 10, 2012 17:05 |
|
Good thing you're running vmware so you can just vmotion all the guests off to another vhost and put it into maintenance mode so you can reboot the machine and hit F8 during POST to reset it. Right?
|
# ¿ May 10, 2012 17:19 |
|
Moey posted:If you are in the Chicagoland area, the vendor we recently started working with has been awesome (they do licensing as well as contracting work). Who is that, out of curiosity? We have a vendor we get most of our licensing stuff (not just vmware) through but I'd like to know of someone a bit more vmware-centric.
|
# ¿ May 15, 2012 18:45 |
|
Corvettefisher posted:I go with iscsi personally, I tend to see lower latency. NFS is a bit easier to set up, and some backup software will want NFS if it can't read VMFS. Yeah, while NFS is pretty simple to get going, I wouldn't recommend it over iSCSI. We have several large farms running via NFS off a NetApp, and all the protocol overhead just destroys the NetApp's CPU. It was maxed out before we re-aligned all of our old Windows VMs, and now it hovers at around 70%.
|
# ¿ Jul 9, 2012 18:52 |
|
Moey posted:That is why it still pisses me off that vSphere 5 has the vRAM limit. Right now we are running ESXi 4.1 licensed for Standard, but each of our dual socket hosts has 96GB of memory. I will lose out on memory until we upgrade to Enterprise. Yeah. Our vhost blades have 192GB of memory with two sockets, and it's annoying to have to pay for Enterprise Plus licenses even though we don't use any of the other fancy Enterprise Plus features.
|
# ¿ Jul 27, 2012 20:22 |
|
Has anyone played with vCenter 5.1 yet? We've been fighting with the new vSphere SSO garbage that they decided to shoehorn in to "simplify" things, and yet we can't get it to play nice with the way our Windows domains and multitude of trusts are configured. At this point I'm planning to wait for the first update before I consider upgrading our prod instances to it, but I was curious if anyone else has had similar difficulties.
|
# ¿ Dec 7, 2012 20:11 |
|
Erwin posted:vCenter 5.1 is a fly-covered shitheap that absolutely should not have been released, and I am ready to drink myself into a stupor after rolling it out over the past week and I wish I could take it all back. However, I haven't had too many problems with SSO interacting with AD. It does seem to be random whether an individual account can log into vCenter, but I only had a problem with one service account, so I used a different one. One of the other guys on my team was the one banging his head against it, so I only have second-hand knowledge, but because of the way our AD trusts are set up with the Mothership (we're a separate business unit), we control our own domain and create our own domain local security groups that contain imported users from another domain. Apparently this makes SSO cranky - or at least, we haven't been able to figure out The Right Way to do things. Obviously we did not have this problem with vCenter 5.0 or prior. It's sort of amusing, since we're running into this exact same scenario doing a POC for RHEV for a smaller VM farm. Edit: s/heads/head/. Pretty sure the guy only has one of them unless he's really good at hiding it. Cidrick fucked around with this message at 22:09 on Dec 7, 2012 |
# ¿ Dec 7, 2012 21:58 |
|
whaam posted:So I've got 4 HP DL360 G8s with the HP ESXi 5.1 ISO installed and their 4-port onboard NICs seem to randomly drop from 1000 full to 10 full, this is with auto turned on for them all. I thought it might be related to the network runs being too close to the power so I moved them, anyone else see behaviour like this in ESX 5.1? This is happening randomly across the ports, and each goes to a different switch as well, so it's not that. What physical switch are you using? In my experience, autoneg tomfoolery is fixed with an update at the switch level. We have boatloads of Gen8 HPs with no issues like you're describing, although I haven't upgraded us to 5.1 yet (we're still on 5.0).
|
# ¿ Apr 2, 2013 17:46 |
|
FISHMANPET posted:Is there a single place I can go in VMware to see total demand on the disk from a single host? (No vCenter, so per host is the least granular I can get). Hosts and Clusters -> Select your host -> Performance Tab -> Advanced Button -> Select various options in the dropdown. "Disk", "Datastore", and "Storage Adapter" should all give some insight. My guess is that there's some massive read latency talking to your VM datastore(s). These graphs should show it happening in a nice shiny labelled format for you to present to whomever pays for your storage backend.
|
# ¿ Apr 2, 2013 21:05 |
|
evil_bunnY posted:It's always easier and less time consuming, yeah. Yeah, I still have a handful of HP blades that I built with vanilla ESXi 5.0 that I can't get to report a bunch of hardware information within vCenter for the life of me. The ones I built from the HP image work just fine. Guess I'll wait for the 5.1 upgrade and just rebuild them from scratch.
|
# ¿ Jun 12, 2013 21:01 |
|
Praise Buddha, one-way trusts in vSphere SSO 5.5 work out of the box! Time to migrate from 5.0u1, finally.
|
# ¿ Sep 23, 2013 20:20 |
|
Yeah, don't get G6s or G7s, get Gen8s. I moved our oldest ESXi hosts running on PowerEdge 2950s to HP BL460c Gen8 blades w/256GB of memory earlier this year and couldn't be happier with them.
|
# ¿ Sep 30, 2013 21:20 |
|
Martytoof posted:Hooray, I ran into my first "everyone has their virus scan scheduled at the exact same time" issue today While I've never run into this, I have had "all my VMs are pulling updated packages from our internal mrepo in the same one-hour window" before. On an NFS datastore.
|
# ¿ Oct 28, 2013 13:59 |
|
evil_bunnY posted:Why does this matter? Of all our farms, the VMs running on a NetApp filer (we have 800 or so) have, by far, the worst datastore latency. It really snowballs when you have a lot of VMs performing I/O at once.
|
# ¿ Oct 28, 2013 15:51 |
|
Does anyone have any experience exporting or querying vhost performance data out of vCenter to some third-party database? We're interested in getting host performance metrics - host CPU utilization, datastore read/write latency, all that jazz - out of vCenter and into something like Graphite, which is what the rest of our environment uses. I've found some documentation about hitting the web services API to pull performance data, and I also stumbled across a post where they just query the MSSQL database directly. In terms of simplicity it seems like querying the DB is probably the way to go, but I was wondering if anyone else had tackled this already.
|
# ¿ Dec 12, 2013 21:40 |
|
Misogynist posted:Host performance data is really easy to get with PowerCLI (you can run a scheduled task right on your vCenter server) and not incredibly difficult over their other APIs. Do note that, if you want to collect performance data on your VMs too, PowerCLI is really incredibly slow because it's totally synchronous, and you should look into coding against one of the async APIs instead (or seriously narrow down which information you want to grab to minimize your response sizes). I don't really care about performance data from the VMs - our monitoring agent is running on the guest OSes already, which provides nearly all of what we would care about. PowerCLI is one approach, but our environment is almost completely Linux-based, so we'd probably have to go with one of the other supported methods via the API so we don't have to run PowerCLI on a Windows box and then export the data again to our Graphite hosts via some other method. Thanks for the feedback on avoiding the DB - I sort of figured bad-touching the DB directly was not a very good idea, but I thought I'd throw it out there. Misogynist posted:For better or worse, Graphite will collapse under the stress of all that data anyway unless you're running it on half a dozen SSDs, by the way, but maybe some of the new Cassandra-based Graphite datastores will work better. We have very serious monitoring infrastructure supporting Graphite and our various other tools. A large part of our metric collection is done on servers equipped with FusionIO cards to be able to keep up with all the metrics we collect from all our different environments. Our monitoring team lead is actually the brother of the guy who wrote Graphite originally.
|
# ¿ Dec 13, 2013 15:51 |
|
evol262 posted:The perl CLI is actually pretty good. Or you can use pysphere. Or just look at pysphere's code. Or use the Web Services SDK example Java programs if you're a Java shop. The .NET applications would probably run under mono (never tested). Pysphere! I had totally forgotten about that. We have some scripts for automated VM deployment that hit the API that someone wrote and then quickly forgot about. Most of our monitoring guys are Python programmers, so that should work out well. Thanks! evol262 posted:The fact that you need to collect metrics on a FusionIO backend says everything. No arguments here.
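Since pysphere came up, here's a rough sketch of the Graphite end of that pipeline. The carbon plaintext protocol (one "path value timestamp" line per metric, TCP port 2003) is the real format; the hostnames, metric paths, and the pysphere calls in the comments are placeholders I haven't verified, so treat this as a starting point rather than a working integration.

```python
import socket
import time

def graphite_line(path, value, timestamp=None):
    """Format one metric in Graphite's plaintext protocol: 'path value timestamp\\n'."""
    if timestamp is None:
        timestamp = int(time.time())
    return "%s %s %d\n" % (path, value, timestamp)

def send_metrics(lines, host="graphite.example.com", port=2003):
    """Ship a batch of plaintext lines to a carbon-cache listener in one connection."""
    sock = socket.create_connection((host, port), timeout=5)
    try:
        sock.sendall("".join(lines).encode("ascii"))
    finally:
        sock.close()

# With pysphere, the values would come from something like (untested sketch):
#   from pysphere import VIServer
#   server = VIServer()
#   server.connect("vcenter.example.com", "user", "pass")
#   ... query host CPU / datastore latency counters and feed them in below ...
lines = [graphite_line("vmware.vhost01.cpu.usage_pct", 42.5, 1386950000)]
```

The formatting function is the only part worth unit-testing; everything past the socket is standard carbon behavior.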
|
# ¿ Dec 13, 2013 16:38 |
|
Erwin posted:What was the problem VMware SSO was supposed to solve again? Stuff working correctly too often? Three rebuilds of one of my vCenter servers later, and all I have to add to this is fffffffff
|
# ¿ Jan 8, 2014 21:20 |
|
So vCenter 5.5.0b seems to have finally fixed my authentication problems, but now I have another one - since I was troubleshooting two separate instances of vCenter, I was leaving SSO configured in standalone mode in both sites to reduce the number of variables. However, now that I finally have two functional instances, I was going to configure them in linked mode. Of course, since SSO is configured in single-site mode on both sides, linked mode won't work. Is anyone aware of any way to reconfigure SSO post-install to use a different deployment model? Or am I stuck rebuilding vCenter/SSO/inventory service/web client from scratch (again) in one of my sites? Trolling VMware's support site and forums wasn't much help.
|
# ¿ Jan 16, 2014 16:22 |
|
Dilbert As gently caress posted:Did you set the second SSO server to join an existing? Nope. I know what I did wrong - they're both set as standalone instances - but I'm wondering if there's any way to change that after it's already installed, or if I'm stuck uninstalling just so I can re-install SSO in a multi-site deployment.
|
# ¿ Jan 16, 2014 23:04 |
|
wibble posted:Can anyone help with an issue with a 5.0 host? A blade dropped out of vsphere a few days ago. That's funny - I had the exact same issue with a 5.0 host, the exact same support experience, and the exact same resolution. I never was able to get those VMs off to other hosts, so I just had to take a late night outage to reboot that host. Real goddamn annoying. If you figure out anything clever I would love to know about it.
|
# ¿ Feb 6, 2014 23:34 |
|
thebigcow posted:I thought datastores larger than 2TB was a feature of 5.5 Datastores have supported sizes well beyond 2TB for quite some time; it's the individual extents that could not be larger than 2TB prior to VMFS5.
|
# ¿ Feb 14, 2014 19:29 |
|
Virigoth posted:Is there a hardware setup/automation thread anywhere for setting up chassis and blades? I've been trying to find information on automating (through anything besides what they sell) a deployment for an HP C7000 chassis with 16 blades. I'd like to run it from nothing all the way up to having the OS installed and ready for my OS scripts to run. I do this by pulling the MACs of the individual blades on the chassis and putting them into DHCP, then once I turn them on I just hop on the console to PXE boot them into an ESXi kickstart install. You can feed ESXi a ks.cfg-esque configuration so it'll boot and configure itself on its own, then all you have to do is add the host into vCenter and apply a host profile. Repeat 15 times. Dilbert As gently caress posted:The only cool automation stuff with blades I'm familiar with is UCS. HP OneView can do this as well - we're thinking about getting it, because we hated UCS and have a massive HP footprint already.
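For what it's worth, the ks.cfg I feed the PXE-booted blades looks roughly like this - a minimal ESXi 5.x scripted install. The password and the %firstboot tweaks are placeholders; adjust the install target flags to taste.

```
# Minimal ks.cfg for a scripted ESXi 5.x install, served to the blade over PXE/HTTP
vmaccepteula
rootpw ChangeMeLater
# Wipe the first detected disk, even if it already holds a VMFS volume
install --firstdisk --overwritevmfs
network --bootproto=dhcp --device=vmnic0
reboot

%firstboot --interpreter=busybox
# Post-install tweaks before the host gets added to vCenter, e.g. enable SSH
vim-cmd hostsvc/enable_ssh
vim-cmd hostsvc/start_ssh
```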
|
# ¿ Feb 20, 2014 21:12 |
|
Virigoth posted:It's a HP/Hyper-V shop. We've just been pulled over from testing to help bail them out. I've literally spent 4 days max on this project, but scripting the OA and ILO looked like the best bet, but I don't have enough time to sit down and learn it all. I was hoping I could do it from Powershell or some sort of SSH magic without adding anything else in. With the C7000s, you can log in via SSH and run "show config" to get a 100% scriptable config that you can then modify and duplicate to another chassis. It includes everything, even non-configured garbage like EBIPA, so it's probably best to sanity-check it before copying over all that extra bloat. What do you need to configure on the iLOs on the blades themselves? I don't think we modify anything other than the login credentials on a fresh blade, so I'm just curious. Edit: I just remembered, G7 blades and newer have a feature called Intelligent Provisioning that you can use to automate a lot of this. I've rarely used it - mostly just for SmartStart-esque diagnostics - but I believe it can do the kind of thing you're talking about where you can feed it an OS configuration and a link to the media and it can set up a customized blade for you. It might be worth looking into for your purposes. Cidrick fucked around with this message at 21:51 on Feb 20, 2014 |
# ¿ Feb 20, 2014 21:48 |
|
evol262 posted:There also appears to be some kind of cmdlet support that I've never seen or used, but it exists, and may be useful. That's pretty shiny. I know in the past we've used hponcfg on the Red Hat side to do things like change admin passwords, add SSH keys, or reset network configurations for iLO while the machine was live. Of course, if you can't get to iLO in the first place to put an OS on the blade, then that's not terribly helpful. Also, I seem to recall that you can at the very least set up user credentials on all the blades in a chassis from the OA, which could suffice in the short term. Let me go test that or see if I'm making stuff up.
|
# ¿ Feb 20, 2014 21:55 |
|
Virigoth posted:Configuring array and logical drive How are you configuring your array and LDs? Is it something you can modify after the OS has been put on? Virigoth posted:Mounting OS ISO All of these can be taken care of with a proper OS deployment product. If you have a Windows environment then I would really, really urge you to look into Windows Deployment Services - it's free and built into Windows 2008 R2 Standard (I haven't played with 2012 yet, but I can't imagine they got rid of it). Assuming you can boot from the network and have your DHCP daemon point its next-server at your WDS server, you can create an auto-answer file that will do all of this mess for you, including post-build stuff like configuring NIC teaming or joining Active Directory.
|
# ¿ Feb 20, 2014 22:09 |
|
Mausi posted:Particularly on Esxi5.x, but anyway, after a vMotion I'm pretty sure that the VM is triggered to do a time update. Stupid question: why not leave your VMs to sync time with the host, and then just configure the hosts to use NTP?
|
# ¿ Apr 7, 2014 14:46 |
|
Where's the best place to start troubleshooting high DAVG latency against a datastore? We have identical hosts with identical FC connectivity to a Hitachi-based SAN, one living on a Hitachi VSP and the other on a Hitachi HUS, and while we get fantastic <1ms guest disk service times against the VSP, we get 5-10ms guest service times that spike into 100+ms latency under very little I/O load. esxtop shows the KAVG as nearly 0, and it's the DAVG that spikes into the hundred-millisecond range at random, which VMware says points at the driver level on down - potentially a misconfig with your HBAs or at the SAN layer. Is it worth trying to tune on the VMware side (LUN queue depth, etc.) first, or do I need to blackmail my storage guy into opening a case with Hitachi? I don't have visibility into the SAN side of things, so I want to have my ducks in a row before I try to blame everything on Hitachi. Cidrick fucked around with this message at 23:57 on Apr 22, 2014 |
# ¿ Apr 22, 2014 23:52 |
|
TeMpLaR posted:There is a large amount of memory overhead doing this, that is the main reason I had. How large is large? We also enable hotplug by default and this is the first time I've heard this.
|
# ¿ May 7, 2014 01:54 |
|
Mausi posted:Bahahahahahahahaha. Yup, this. I know it depends on the environment you work in, but requirements change. A lot, as it turns out, in our shop. A group of VMs may end up getting more traffic for one reason or another, and we like to be able to tell those teams that we can upgrade their VMs on the fly if such a scenario happens.
|
# ¿ May 8, 2014 12:11 |
|
NippleFloss posted:Some storage vendors support tools that run within the guest to reclaim space by looking at the block allocation table within the guest and zeroing out unused blocks, then hole punching. You can do this from the host as well, if you have the right VAAI extensions: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2014849 I've done it, once. It was pretty painless, actually. Well, the actual SCSI UNMAP was painless, anyway. The part that sucked was when all my VMs on said datastore paused, because the array ran out of space even though the datastore still had over a terabyte of free space. Whoops!
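For reference, the reclaim itself boils down to a couple of commands - this is how I remember it from that KB, and the datastore name here is a placeholder:

```
# ESXi 5.0 U1 / 5.1: run from inside the datastore, reclaiming up to N% of free space
cd /vmfs/volumes/my-datastore
vmkfstools -y 60

# ESXi 5.5 and later moved this into esxcli
esxcli storage vmfs unmap -l my-datastore
```

Mind the free-space caveat I hit above: the UNMAP pass temporarily creates a balloon file on the datastore, so don't run it on an array that's already nearly full.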
|
# ¿ Jul 21, 2014 21:53 |
|
NippleFloss posted:This will reclaim blocks that VMFS has freed on a LUN but that have not been returned to the storage pool because the array does not know the blocks are no longer in use. But it still won't free up space released by the guest, because VMFS doesn't know that those blocks are free (unless a guest-aware tool tells it). Ah, you're right, I misunderstood what the original requester was asking for.
|
# ¿ Jul 21, 2014 23:11 |
|
Do any of you guys use Nimble Storage? We're looking to move off of using Hitachi SAN and getting a dedicated storage array solely for VMs, and Nimble seems like a pretty attractive option, but I'm wondering if there's any horror stories out there since it's still a relatively young technology.
|
# ¿ Aug 29, 2014 13:42 |
|
jre posted:Anyone have any ideas how to fix this ? What's in the IML when you log into iLO?
|
# ¿ Sep 19, 2014 15:08 |
|
mattisacomputer posted:Across all datastores we're seeing spikes of up to over 500ms read/write latency on the active storage paths. So far we've checked/tried the PSP (now configured to Round Robin), the problem started with Jumbo Frames enabled, they're now disabled for troubleshooting at Dell's request, the MD3220i is running the latest firmware. We've swapped out the ProCurve 2510s for 3800s, updated the firmware on both, updated the firmware on the Dell hosts, etc. Each path has its own VLAN on the switches. Everything's been rebooted and we can't find any correlation between specific VMs / datastores / paths. Try to catch it while running esxtop (and press d for disk mode) on one of your ESX hosts, and find out which LUNs and/or devices are generating the high DAVG/KAVG during periods of high latency. If the latency shows up as DAVG (which is what it most likely is) you can prooobably rule out VMware as the culprit, and then you'd have to look at the individual pieces of your physical path from there. Since this is iSCSI, you can also try setting up a port mirror on your switches for one of the problematic hosts and set up tcpdump/Wireshark on the mirrored port to watch the conversation and find out where in a connection the delay is coming from. Wireshark supports a native iscsi display filter, so it should be pretty easy to get only the traffic you care about.
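If it helps, the capture setup I had in mind is just something like this - the interface name and IP are made up, and iSCSI rides TCP 3260 by default:

```
# On the box plugged into the mirrored switch port: capture full iSCSI frames
tcpdump -i eth1 -s 0 -w iscsi.pcap port 3260

# Then open iscsi.pcap in Wireshark and use the display filter:
#   iscsi
# or narrow it to one host/target conversation:
#   ip.addr == 10.0.0.50 && iscsi
```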
|
# ¿ Oct 12, 2014 18:00 |