adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

skipdogg posted:

Our cluster is having major issues; I've been on the phone with VMware support most of today, and another guy is taking a crack at it right now. Two hosts just show up as disconnected from vSphere. The VMs are still running and accessible, but VMware support is telling us to manually shut down all VMs and hard-reboot the hosts. It's a last-resort option right now; there are 3 SQL servers on there, among other servers.

One host isn't even responsive on the console... bleh.
My guess is they lost some storage. I've been through exactly this, and it sucked. The worst part is that in our case, the HA agent tried to restart the guests, but since they were already running it failed, and we had duplicate VMs all over the place.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Corvettefisher posted:

Can't say I have ever seen that, but it sounds plausible. It wouldn't be the strangest thing I have heard of this week....
We improperly removed some iSCSI datastores that had no VMs on them, and it caused exactly this behaviour.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Above Our Own posted:

Noob question here: if I'm running a computer inside of Virtual Workstation, does that VM expose ANY information about the host machine?
It shouldn't, but it can potentially be done via certain attack vectors. Here is a link detailing a proof of concept for stealing info from another VM.

http://arstechnica.com/security/2012/11/crypto-keys-stolen-from-virtual-machine/

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

whaam posted:

Hearing mixed things on this, but is it possible to attain 4Gbps of bandwidth between shared storage and a host using four 1GbE NICs and LACP on stackable switches? My vendor says yes, some forums say no.
It is, but not in a single stream. With round-robin MPIO and multiple iSCSI VMkernel ports you can get ~4Gbps, though obviously there will be some loss. If using NFS you will not get above 1Gbps.
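
For reference, the host side of the multiple-VMK setup is just iSCSI port binding. A rough sketch in Python, run from an ESXi shell (the vmhba/vmk names are placeholders, not anything from this thread):

code:
# Bind one VMkernel port per 1GbE uplink to the software iSCSI adapter so MPIO
# has four independent paths to round-robin across. Adapter and vmk names are
# assumptions -- check "esxcli iscsi adapter list" and your vSwitch config first.
import subprocess

ISCSI_HBA = "vmhba33"                          # software iSCSI adapter (placeholder)
VMK_PORTS = ["vmk1", "vmk2", "vmk3", "vmk4"]   # one VMkernel port per physical NIC

def esxcli(*args):
    cmd = ["esxcli"] + list(args)
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

for vmk in VMK_PORTS:
    # Port binding ties each VMkernel interface to the iSCSI initiator.
    esxcli("iscsi", "networkportal", "add", "--adapter", ISCSI_HBA, "--nic", vmk)

# Rescan so the extra paths show up before switching the LUNs to round robin.
esxcli("storage", "core", "adapter", "rescan", "--adapter", ISCSI_HBA)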

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

whaam posted:

We are using NFS, so it looks like we got some bad info from our vendor. That's a raw deal, because I doubt we will get the I/O we need on 1Gbit.
Do you need more than 1Gbps on a single datastore? You can always assign multiple IPs to your storage; that way, four different VMs could theoretically each get 1Gbps on NFS.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

whaam posted:

We have an I/O-heavy SQL application we are installing that, according to the software company, needs more than that kind of speed.
Can you use iSCSI inside the guest? That's how we get around this issue.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

ragzilla posted:

You'd still get limited by the src/dst load balancing schemes unless the storage had multiple IPs and you had multiple LUNs.
Well, yes. In the described scenario, you can put four interfaces on the storage backend and get 4Gbps to your backend storage with iSCSI and MPIO. You cannot say the same with NFS-backed storage; no matter what you do, you will still only get 1Gbps from that guest OS to its database.

edit: I suppose you could create multiple VMDKs on multiple datastores that are mapped to different IPs, and RAID them in the guest to get >1Gbps via NFS, but that seems a little over the top.
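
If anyone actually wanted to try that, the guest side would look something like this on Linux. Purely a sketch; the device names are hypothetical, and the array underneath is what protects the data, hence RAID 0 in the guest:

code:
# Stripe four VMDKs (each on an NFS datastore mounted from a different storage IP)
# so I/O fans out across several 1GbE links. Check lsblk before trying anything
# like this -- /dev/sdb through /dev/sde are assumptions.
import subprocess

DISKS = ["/dev/sdb", "/dev/sdc", "/dev/sdd", "/dev/sde"]  # one VMDK per datastore

def run(*cmd):
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

run("mdadm", "--create", "/dev/md0", "--level=0",
    "--raid-devices=" + str(len(DISKS)), *DISKS)
run("mkfs.xfs", "/dev/md0")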

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

ragzilla posted:

Moral of this story, just run 10GbE.
No doubt. Of course, in two years we'll be snickering at the poor bastards who don't have 40GbE.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

whaam posted:

The software vendor had us run the SQLIO test and claimed we needed a 7,000+ IOPS result from it, which is what they got in their controlled environment on RAID 10; they didn't want us running RAID-DP at all initially because they claimed it was inferior. We ran the SQLIO test in a development environment and ended up beating that I/O requirement, but only when we allocated more than 4Gbps of bandwidth to the NFS storage (it was tested on 10GbE). I'm 100% sure their requirements are bullshit, but if we don't play by their rules they will claim our storage is the problem the first time we run into an issue.
IOPS and link speed are not intrinsically linked. The bandwidth of 7,000 IOPS of 4KB mixed random reads and writes is a lot different from 7,000 1MB writes. The first needs a maximum of roughly 218Mbit/s (plus overhead, either way well under 1Gbps), probably less since it's a mix of reads and writes. The second needs about 55Gbps. I'm not sure you should rely solely on your lab test.
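
To make the arithmetic explicit (same numbers as above, nothing new):

code:
# IOPS only becomes link bandwidth once you multiply by the I/O size.
def iops_to_mbit(iops, io_size_kb):
    """Megabits per second for a given IOPS rate and I/O size in KB."""
    return iops * io_size_kb * 8 / 1024.0   # KB/s -> Kbit/s -> Mbit/s

print(iops_to_mbit(7000, 4))      # ~218.75 Mbit/s -- fits comfortably in 1GbE
print(iops_to_mbit(7000, 1024))   # 56,000 Mbit/s  -- the ~55Gbps case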

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

whaam posted:

I see what you mean. We used a specific write size for the test; it was just to show that, at a similar write size, RAID-DP could pull the same IOPS and MB/s as the RAID 10 they benchmarked with. The size of the test had no practical link to the size of the writes the actual program will do.

You mentioned iSCSI earlier. We and our vendor have been researching whether it's possible to bond four 1GbE links together with iSCSI and have traffic actually travel over more than one link at a time. Even with iSCSI, is this a case of doing unconventional things like VMDK software RAID and using different subnets for each link, or does iSCSI support a more straightforward method?
iSCSI supports MPIO, which allows you to more or less round-robin your disk operations. You can do it either from your guest or from the VMware host, and which way you choose is probably more a matter of personal preference than anything else. We run an iSCSI initiator in our guests and use NFS for our VMDKs, but we used to run MPIO iSCSI for our VMDKs as well.
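
If you do it from the host, the ESXi half is mostly just flipping the path selection policy to round robin. A rough sketch (the naa IDs are placeholders; list yours with "esxcli storage nmp device list"):

code:
# Set iSCSI LUNs to the round-robin PSP so I/O rotates across all bound paths.
import subprocess

DEVICES = ["naa.60a98000aaaaaaaa", "naa.60a98000bbbbbbbb"]   # placeholders

def esxcli(*args):
    cmd = ["esxcli"] + list(args)
    print("+", " ".join(cmd))
    subprocess.check_call(cmd)

for dev in DEVICES:
    esxcli("storage", "nmp", "device", "set", "--device", dev, "--psp", "VMW_PSP_RR")
    # Optional tweak: switch paths every I/O instead of the default 1,000 IOs.
    esxcli("storage", "nmp", "psp", "roundrobin", "deviceconfig", "set",
           "--device", dev, "--type", "iops", "--iops", "1")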

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Walked posted:

Right now, this is a pretty vague question, but: What SAN solutions [ideally under 50k] have you had good experiences with in Hyper-V environments? Just trying to see what has worked for others, and worked well.
I just wanted to put in my $0.02. Our virtualization environment consists of:

6x high-end single-proc Cisco UCS rack-mount servers, each with 128GB of RAM: $60k
6x procs of VMware Enterprise Plus licensing (not sure of the exact price; I would guess around $20k including vCenter)
6x procs of Windows Server 2008 Datacenter: $20k
2x Nexus 5k switches (layer 2 only): $35k
1x NetApp HA pair: $200k

I put this information here so you can see that in our case, storage is what we spent the most on, not the least. Nearly every performance issue we have had in our environment has been traced back to the storage in one way or another. It is never the network, and it is never the servers. This is probably a consequence of the great visibility we have into CPU, RAM, and network utilization versus the OK-at-best visibility we have into our storage, but either way, storage is the most important thing in any new deployment, in my opinion.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

evil_bunnY posted:

Hold me or I may die laughing.
I went back and looked, and they were actually only $52k total. Either way, I'm not sure why you are laughing. Even using Newegg prices for memory, 10GbE NICs, and E5-2690 procs, I don't think we could have done much better.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
Do it from Active Directory restore mode.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
List price might be more expensive, but they are reasonably competitive with normal discounts.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

mAlfunkti0n posted:

Currently they are seen as snapshots, since the LUN IDs no longer match the signature on disk. Ughhh.
I do this all the time and have literally never had that problem.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Corvettefisher posted:

Just a question what do you all use for Backing up?

I'm a fan of PHDvirtual myself. I know Veeam is pretty frowned upon; my new place uses it but has many gripes about it.

Also the new Symantec A/V for VDI is actually pretty loving good.
Snapshot SAN volume -> replicate to offsite.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Corvettefisher posted:

How does a failed upgrade result in "a couple of days of downtime"... I mean seriously, HOW does that happen?
Some people are dumb and just click Next without any planning (or backups).

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Corvettefisher posted:

:stare:

Yeah, don't give VMs more than they need if you can help it. You would be surprised how many servers run happily on 1-2 vCPUs.
At least 90% of our environment is single-core, even the servers where the vendors demanded 4 cores.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

FISHMANPET posted:

I'm thinking that, more than anything, it's the latency that's killing me. This stuff all lives in the same rack connected by a single cheap unmanaged switch; I should be hoping for something around 10ms, right?
End users will typically begin to notice slowness once latency exceeds 25ms in many applications. They will start screaming between 50 and 100ms.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

warning posted:

Troubleshooting an issue where hostd is constantly crashing and not allowing you to connect via management software kind of forced my hand. It's escalated to engineering with VMware support at this point, so it seems like it was indeed over my head after all, and I don't need to feel so bad.
I will admit that it's the lazy way out, but with host profiles and dvSwitches you should be able to format and reinstall the host in about an hour. Much easier and less aggravating than engaging support.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
Or just use a VLAN.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
Anyone know how to solve the issue of getting prompted to reboot every time you fire up a virtual desktop with paravirtual drivers? I am not sure if it is the disk controller driver or the network driver, but as soon as we switch to the SAS controller and E1000 NIC, users stop getting prompted to reboot immediately upon login.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Corvettefisher posted:

Also any reason you are using the Paravirtual on VD's?
Lowest CPU utilization, although in reality we are memory-bound at this point, so it doesn't really matter.

While I did not build the image myself, I am quite certain it was fully installed beforehand.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

evil_bunnY posted:

If you still have to install them, they weren't there before.
They don't have to be installed; they just prompt for a reboot. No matter how many times we reboot the parent image, the clones still prompt. I was assuming it was a hardware-identifier issue of some kind, whereby it sees the NIC of the new clone as different hardware or something.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Erwin posted:

Are you sure that's what's causing the reboot prompt? Changing EVC mode prompts for a reboot of the VM (at least it did in 4.1, I haven't changed it since upgrading to 5 and 5.1). Maybe you developed the master in a cluster with a different EVC mode than where you're deploying to?
Well, changing from paravirtual to E1000/SAS solves the issue, so I don't think it's EVC.

warning posted:

Do you have this hotfix installed?

The Win7 VMs will detect VMXNET devices as new hardware if you do not.

This is all over the VMware documentation, so most people have it, from what I've seen.

http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1020078
This looks like a winner. We are using XenDesktop on ESXi, so that might be why we missed this.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Misogynist posted:

Have you looked at Logstash? It's not quite Splunk, but it's very free.

Thanks rear end in a top hat, now that I know this exists I have to implement it.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

three posted:

I literally can't think of a single thing View can do that XenDesktop cannot.
In side-by-side comparisons, View is better at streaming video. Typically that's not something you really need to leverage to its fullest extent in a VDI deployment, but it is still one thing View does better. We compared View and XenDesktop before deploying, and XenDesktop was better. We tried it on XenServer first, but the lack of vShield pushed us back to ESXi.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

goobernoodles posted:

I'll preface this by saying I kind of lucked into my job and I feel like I probably don't know a lot of things a person in my position should know.

I've been looking into doing an infrastructure refresh for months now, and after the CFO balked at the initial $130k price tag of replacing servers, switches, SANs, and backup appliances, I'm hoping to just implement better backups initially, then later this year or next move forward with the rest. Right now I'm looking into implementing a couple of Data Domain DD620s to replace a few lovely BDRs that are backing up our (VMware) servers at the file-system level. I'm getting pressure from EMC saying the price is going to go up $10k if I don't buy in the next two days. :jerkbag: I want to implement site-to-site replication of some sort; however, I'm trying to figure out the most logical and practical approach to backing up two offices.

However, since I want to replace our SAN in Seattle and potentially implement one in Portland, what is the practical difference between replicating backups and SAN to SAN replication? Am I going about it all wrong with the Data Domains?

Assuming I go with the DD620s, does anyone have a strong opinion on which backup software to use? Veeam? vRanger? (Zerto? That looks more like SAN replication.)

Jesus Christ I need to organize my thoughts better.
Personally, I think buying a pair of Data Domain devices when your real goal seems to be new SANs with replication is a bad idea. Just get a pair of new SANs now if you can afford to.

Please give us an idea of your current infrastructure so we can more effectively advise you.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
1) Get 2x HA pairs of Oracle Sun 7320 storage.
2) Get 4x (2x for each location) Cisco 3750-X gigabit switches
3) License everything you need

For #1, I went all out with ours and got roughly 15TB (usable) on each, with tons of cache, for $100k for both pairs. You could get less storage and less cache and probably end up somewhere around $80k.
For #2, I think you can probably do all four switches for well under $10k. They stack and allow you to do cross-switch EtherChannel links, giving you good enough speed and redundancy for your organization's needs.
For #3, I can't exactly comment.

I think you could do the entire project for under $100k. You would be using only gigabit Ethernet rather than 10Gb Ethernet or FC, but you can replicate between the two pairs.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Mr Shiny Pants posted:

How are these working out for you? Is the performance good? Have you tested HA on them? I am curious because I am a huge ZFS fan and some real world experience with Oracle/ Sun gear would be nice.

They work great for our needs, which are storage for a little over 200 VDI sessions and 50 Citrix servers. I suspect we can double the number of VDI sessions we host without seeing a performance hit. We did play around with HA early on, and the takeover was quite fast. VMs hung for about 2 seconds but then picked right back up again.

Honestly, for the price, Oracle should be murdering everyone else based on what I've seen.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Mr Shiny Pants posted:

Is this your primary storage? If not why not?
No, it's not. We have hundreds of servers, and we really leverage the SnapManager products from NetApp for the things we really have to back up. But for VDI and Citrix, we wanted raw IOPS as inexpensively as possible, while still having some nice replication and HA.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
If you want to just play around with other OSes, VirtualBox is the best solution, IMO. If you want hypervisor experience, Linux with KVM, Linux with Xen, or Windows 8 with Hyper-V will all be good platforms for a regularly used PC.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Dilbert As gently caress posted:

Kaspersky <3 I have found a new love

Hahahhahahahahahah. We use Kaspersky and seriously, gently caress it.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
My home VMware lab is backed by 6x 7,200rpm drives in RAID 6. It's plenty fast for me. It's not like I am stressing multiple servers at once the way one would in a production environment.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Hadlock posted:

How responsive is virtualization to additional logical cores? If I am running Hyper-V under WS2012 R2, would I see an improvement in responsiveness in a VM lab setting if I went from a quad-core i5 to a Haswell i7 with 8 logical cores? I'm looking at running probably 4 VMs.
The answer is, as always, it depends. In general, the hypervisor is very good at scheduling, and you may even see a loss of performance with hyperthreading. The reality is that you are unlikely to see much of a difference unless you are under contention.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

FISHMANPET posted:

Holy poo poo that's terrible.

The scheduler is what's going to kick the rear end of that 32-core VM. In order for that VM to run, the hypervisor has to find 32 free cores to run the machine on. So, assuming the physical machine it runs on even has 32 cores, performance is going to be abysmal. The scheduler will either hold back other VMs on the host from running until it can clear all 32 cores for this monstrosity, or the monstrosity will be waiting forever because the scheduler is going to keep slipping in a VM with a few cores to run, so you'll be stuck with something stupid like 31 cores free but your VM can't execute anything.
Modern schedulers utilize relaxed coscheduling, and the 32 cores will not hurt performance that badly, if at all, IF you are not already seeing CPU contention. Once you see contention, this 32-core VM will amplify it considerably.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

FISHMANPET posted:

Relaxed coscheduling exists but I don't think it will be that relaxed...

There will still be contention if a VM is using up all 32 cores on the host with no other VMs, because the hypervisor itself has to run a few things. Putting a huge VM on a machine like that kind of breaks all the good things about virtualization anyway (better hardware utilization, resiliency from machine failure). If somebody thinks that scheme is a good idea, then they're probably too far off the deep end anyway to be able to transition their IaaS plan to anything sane.
That's the point of relaxed coscheduling -- while all the vCPUs must get equal time, they don't have to be scheduled at the same time.

Look, I'm not saying it's not a terrible idea; I'm just saying it's not necessarily going to kill the performance of the entire box.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

FISHMANPET posted:

Alright, I guess I'm not sure exactly how relaxed coscheduling works. I thought it could just fudge the timings a bit, but could it do more? Is it as powerful as running a few instructions on 16 cores, and then running a few on the other 16, and going back and forth like that?
I will use a 4-core example, but it should translate more or less to 32.

You have 4 cores allocated, and one thread that needs to run.

Without relaxed coscheduling: the hypervisor will wait until 4 cores are available, and they all get CPU time at once.
With relaxed coscheduling: the hypervisor will run your thread, then give equal CPU time to the other vCPUs before it lets the VM execute again. If you have no contention, it's no big deal; you just use some extra CPU time. If there is contention already, you just wasted an additional 3x the CPU for that one thread to execute.
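
A toy way to see how often that busy thread even gets to run under contention (purely illustrative, nothing like the real scheduler's accounting, and it ignores the catch-up time the idle vCPUs still get):

code:
# Four physical cores, other VMs randomly occupying 0-3 of them each scheduling
# slot, and one 4-vCPU VM with a single busy thread.
import random

random.seed(1)

def fraction_of_slots_runnable(strict, slots=100000):
    runnable = 0
    for _ in range(slots):
        free_cores = 4 - random.randint(0, 3)   # 1 to 4 cores free this slot
        needed = 4 if strict else 1             # strict: every vCPU at once
        if free_cores >= needed:
            runnable += 1
    return runnable / float(slots)

print("strict coscheduling :", fraction_of_slots_runnable(strict=True))    # ~0.25
print("relaxed coscheduling:", fraction_of_slots_runnable(strict=False))   # 1.0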

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer
With the new read cache, it looks like you assign it per VM. How will that work with shared snapshots in a VDI environment? Can I designate that all of the machines that use this one snapshot share the cache?

Mierdaan posted:

vCPUs, not physical cores. For people who love to over-provision.
Who doesn't over-provision?

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

evol262 posted:

Not to put too fine a point on it, but are you a Windows sysadmin? Creating a VLAN-tagged bridge in raw KVM or Xen is exactly like doing it in RHEL. RHEV and oVirt provide wizards for this, as does Hyper-V.

You're a Linux shop and you hire Linux admins. Any admin with experience in storage, networking, and Linux should be able to sleepwalk through daily Xen and KVM work, and you need the first two to be a good VMware admin anyway (probably with experience in the third). It's harder to performance tune because there aren't reams of books on best practices (you need to look for white papers), but it's not difficult.

No, but the Linux on the Desktop argument is beyond this. My argument is that you don't lose the cost savings because it's basic Linux skills that literally any RHCE can do by rote (it's part of the exam)

Stop comparing vSphere to KVM in this way, basically. Compare to RHEV. Or oVirt. Or XCP. Saying "I can point and click for VLANs in VMware but not Xen" says nothing, in the same way that "I can drag and drop NICs onto VLANs in RHEV but not in vmkernel" says nothing. You sort of have to compare products to products, not products to technologies.
These are the same kind of disparaging comments that historically have hurt Linux adoption rates. You are effectively calling him a newb and telling him that poor documentation isn't a bad thing since his team should just know how to do it anyway. In fact, you've implied that he is also a lovely VMware admin for lacking these skills.

I am an experienced VMware admin, an experienced network admin, an experienced storage admin, and an experienced Linux admin. Setting up OpenStack is a pain in the rear end, and the documentation is, in my opinion, piss poor. It is very technically detailed and probably great for someone who only works with OpenStack or admins Linux servers all day. For someone who is accustomed to simply reading the documentation for the product they want to deploy and then following those instructions, it is not easy to set up OpenStack. The documentation presupposes a great deal of knowledge, and I had to do a significant amount of reading on other related projects before any of the OpenStack stack made sense to me.

The point of jre's post is that for many shops, there is no loving way they could jump to OpenStack. I know my environment could not do it, because although I am sure I could set it up and migrate all of our infrastructure to it if I wanted to, none of the other admins on my team would be able to use any of it, and the level of Linux knowledge required makes it impossible for many of them to get up to speed on it. If OpenStack is really aiming for the VMware stack, including ESXi, vCenter, SRM, and the new storage features, they are going to have to do one of two things: 1) make it more point-and-click GUI driven like VMware's stack is, or 2) improve the documentation to the point that it is easy to follow, even for a "Windows admin". The reality is that the bulk of businesses don't have dedicated Linux teams; they have a team of system administrators who have to support a fuckton of random poo poo, and they frankly don't have the time or across-the-board skill level to support OpenStack as it is today.
