Virtualization Megathread V2: VMs inside VMs


Mierdaan: Sep 14, 2004; Pillbug

madsushi posted:

When people talk about NFS and dedupe vs iSCSI and dedupe, the key thing is what VMWare sees.

If you have a volume and a LUN via iSCSI and you get great dedupe, that extra space can only be utilized by the SAN for things like snapshots. The space isn't actually made available to the LUN and the underlying OS (VMWare).

If you have an NFS volume and you get great dedupe, VMWare sees that free space, so it can utilize that space to oversubscribe your storage. Now you're actually using all that free space!

Plus NFS was the only way to exceed 2 TB datastores prior to the recent ESXi 5.

This seems like a myopic and virtualization-admin-centric way of viewing things. If the SAN is deduping and aware of that ratio, that allows you to create more LUNs and more datastores; there's no reason to look at this from a one-LUN standpoint, is there?
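To put rough numbers on the two views, here's a minimal sketch. Everything in it is hypothetical: the function names, the sizes, and the assumption that the LUN is thin-provisioned out of a shared pool. It only shows where the dedupe savings turn up, not how any particular array accounts for them.

```python
def nfs_view(export_gb, logical_gb, dedupe_ratio):
    """Free space the hypervisor sees on a deduped NFS export."""
    physical_gb = logical_gb / dedupe_ratio
    return export_gb - physical_gb


def lun_view(pool_gb, lun_gb, logical_gb, dedupe_ratio):
    """Return (free space inside the LUN, free space left in the SAN pool).

    Assumes a thin-provisioned LUN, so the pool is only charged for the
    physical (post-dedupe) blocks.
    """
    physical_gb = logical_gb / dedupe_ratio
    free_in_lun = lun_gb - logical_gb      # all the hypervisor can see
    free_in_pool = pool_gb - physical_gb   # visible on the storage side only
    return free_in_lun, free_in_pool
```

With 800 GB written at 2:1 into 1000 GB, the NFS datastore shows 600 GB free, while the LUN shows 200 GB free and the other 400 GB sits in the pool waiting to back another LUN. Same savings, different place to spend them.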

# ¿ Jul 10, 2012 06:18

Mierdaan: Sep 14, 2004; Pillbug

For sure - my perspective is always the small shop admin view on things, where I'm doing both storage and virtualization. I'm being moved from NetApp to Compellent storage now, so I've had to deal with the actuality of deduping on the file level with NFS and having ESXi be aware of that ratio, or deduping at the block level and leveraging the ratio in your volume layout. There's no huge difference (for me) but if you're in a position where your storage is hands-off and you can just request gently caress-off huge NFS exports, then I definitely see the appeal.

Mierdaan fucked around with this message at 06:37 on Jul 10, 2012

# ¿ Jul 10, 2012 06:31

Mierdaan: Sep 14, 2004; Pillbug

I'm not sure who's actually buying VMware's VSA. It's a product, for sure, but that doesn't mean they're selling licenses.

# ¿ Jul 17, 2012 18:03

Mierdaan: Sep 14, 2004; Pillbug

Moey posted:

Don't mean to derail your question, but what advantage do you gain by spanning a VMFS datastore across multiple LUNs (my storage experience is pretty limited to a few devices)?

That was the only way to have large (>2TB) VMFS volumes prior to ESXi 5.0.

# ¿ Jul 23, 2012 22:21

Mierdaan: Sep 14, 2004; Pillbug

Moey posted:

Interesting. Since VMDK cannot do > 2TB disks, I've never really been concerned about having a datastore larger than that.

As long as you're okay with a 1:1 VMDK:VMFS ratio, that's fine. Some people want/need more VMDKs crammed into a VMFS datastore though, if for no other reason than ease of management. Filling up your 2TB VMFS with a 2TB VMDK isn't a good idea anyways (go ahead and snapshot that VM and report back how that goes for you).

Additionally, you can get larger volumes in-guest with dynamic disks combining multiple VMDKs, so it'd make sense to have them in their own VMFS volume.
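A quick way to see the snapshot problem, as a rough sketch: snapshot deltas live on the datastore next to the base disk, so whatever the VMDKs don't occupy is all the room a snapshot has to grow. The function name and sizes below are made up for illustration.

```python
def snapshot_headroom_gb(datastore_gb, vmdk_sizes_gb, overhead_gb=0.0):
    """Space left on the datastore for snapshot delta files.

    overhead_gb is a placeholder for swap files, logs, and so on; it is
    an illustrative parameter, not a measured value.
    """
    return datastore_gb - sum(vmdk_sizes_gb) - overhead_gb
```

`snapshot_headroom_gb(2048, [2048])` comes out to 0, which is the "report back how that goes" case. A 4096 GB datastore holding that same disk plus a 500 GB one leaves 1548 GB.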

# ¿ Jul 24, 2012 00:02

Mierdaan: Sep 14, 2004; Pillbug

Anyone have anything to share on the performance impact of changing storage data counter levels in 5.0 U1? I'm experiencing the "no data available" bug on historical datastore performance graphs after updating to 5.0 U1, and apparently some hacky PowerCLI nonsense is required to get them working again.

# ¿ Aug 11, 2012 22:44

Mierdaan: Sep 14, 2004; Pillbug

Passed the VCP510 on the first try... I will say having never touched FC/FCoE or any Enterprise Plus features before puts you at a bit of a disadvantage, though.

Also, VMware likes to ask really stupid, pointless questions.

# ¿ Aug 21, 2012 21:51

Mierdaan: Sep 14, 2004; Pillbug

Check out the "How do snapshots work?" and "Child disks and disk usage" sections of this page.

# ¿ Aug 22, 2012 02:22

Mierdaan: Sep 14, 2004; Pillbug

Moey posted:

I can finally upgrade to 5!

I kinda liked vRAM entitlements in that they gave me an easy way to explain to my boss why we needed Enterprise licensing. "We'll waste 1/3rd of our RAM!" was an easier sell than "I really want svMotion so I don't have to work on weekends as much!"

# ¿ Aug 23, 2012 17:21

Mierdaan: Sep 14, 2004; Pillbug

DevNull posted:

Is it ok for me to now say that a lot of engineers at VMware were really pissed off by the vRAM licensing? It wasn't just the customers.

I think we all read between the lines a year ago when you said the change was made by the salespeople

BnT posted:

I'm a little mad about that. I personally bought WS8 only 6 months ago and it's not eligible for a free upgrade.

It's eligible if you bought it after August 1st 2012. So no, you're SOL... but my new license is eligible for an upgrade :smug:

Mierdaan fucked around with this message at 19:36 on Aug 23, 2012

# ¿ Aug 23, 2012 19:33

Mierdaan: Sep 14, 2004; Pillbug

Have to wonder if anyone actually bought the ~21 Enterprise Plus CPU licenses for their 2TB hosts and is kicking themselves now...
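For anyone wondering where ~21 comes from, here's the arithmetic as a sketch. It assumes 96 GB of vRAM per Enterprise Plus CPU license, which is my recollection of the revised 5.0 entitlement and not something stated in this thread, and it assumes you actually allocate the whole 2TB to powered-on VMs.

```python
import math

# Assumption: 96 GB of vRAM per Enterprise Plus CPU license (revised
# vSphere 5.0 entitlement as remembered; check the licensing docs).
ENT_PLUS_VRAM_GB = 96


def licenses_for_vram(vram_gb, entitlement_gb=ENT_PLUS_VRAM_GB):
    """Whole licenses needed to cover a given amount of allocated vRAM."""
    return math.ceil(vram_gb / entitlement_gb)
```

2048 / 96 is a bit over 21, so `licenses_for_vram(2048)` rounds up to 22 whole licenses.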

# ¿ Aug 23, 2012 23:19

Mierdaan: Sep 14, 2004; Pillbug

VMware vSphere 5.1 to include shared-nothing live migration (scroll down for the actual article)
Shared-nothing migrations? Sounds good to me.

Has anyone used Avamar to know if it's a step up from VDR? I'm going to assume just about anything's a step up from VDR.

# ¿ Aug 24, 2012 15:34

Mierdaan: Sep 14, 2004; Pillbug

Anyone have the vSphere client working well in Windows 8 yet? The install fails saying it requires XP SP3 or higher. There are some forum posts saying you can get it somewhat working with XP compatibility mode, but some features like connecting to consoles and mounting local DVD drives are broken.

# ¿ Aug 27, 2012 16:59

Mierdaan: Sep 14, 2004; Pillbug

Document dump, via Duncan Epping:

What's New in VMware vSphere 5.1
What's new in VMware vCenter 5.1
What's New in VMware vSphere 5.1 - Networking
What's New in VMware vSphere 5.1 - Platform
What's New in VMware vSphere 5.1 - Storage
What's New in VMware vSphere 5.1 - Performance
Introduction to VMware vSphere Replication
Introduction to VMware vSphere Data Protection
What's new in VMware vSphere Storage Appliance
What's new in vCloud Director 5.1

# ¿ Aug 27, 2012 18:19

Mierdaan: Sep 14, 2004; Pillbug

Misogynist posted:

Any word yet on license levels associated with the new features?

All I've read so far is they're rolling a lot of the paid features into one SKU, vCloud Suite 5.1, and Enterprise Plus customers get a free upgrade to the Standard version or a discount on an upgrade to the Enterprise version.

edit: source

# ¿ Aug 27, 2012 19:02

Mierdaan: Sep 14, 2004; Pillbug

It's an Adobe Flex app, so

# ¿ Aug 27, 2012 20:01

Mierdaan: Sep 14, 2004; Pillbug

Server 2008 Core servers are the main reason I use consoles

# ¿ Aug 27, 2012 20:59

Mierdaan: Sep 14, 2004; Pillbug

Mully Clown posted:

I just ran the "troubleshoot compatibility" and installed it with the default XP SP3 mode. Haven't had an issue at all, haven't tried to map a DVD though. Console works fine. vSphere 4.1 this is.

Yeah I get a "The VMRC console has disconnected...attempting to reconnect." error when I try to use a console. vCenter 5.0 U1.

# ¿ Aug 28, 2012 14:34

Mierdaan: Sep 14, 2004; Pillbug

Yes, with some caveats.

# ¿ Aug 28, 2012 22:45

Mierdaan: Sep 14, 2004; Pillbug

FISHMANPET posted:

What exactly are CPU requirements for FT? I'm looking at running an FT VM on a host with a 55xx chip and another host with an E5-24xx chip. Does that use EVC to properly mask the correct CPU bits so it all runs at the 55xx level? I can't find anything definitive in the recommendations, other than that the CPUs need to be compatible, but there's no explanation of what that means.

I swear, VMware's documentation is actually really good!

# ¿ Aug 30, 2012 14:34

Mierdaan: Sep 14, 2004; Pillbug

Same bold heading. Here's an updated list that includes Sandy Bridge procs, but still not the E5-24xx specifically. Download the Site Survey tool if you want to be sure.

# ¿ Aug 30, 2012 15:00

Mierdaan: Sep 14, 2004; Pillbug

That's about EVC because FT depends on HA (can't power on an FT VM without HA enabled) and on EVC for DRS (otherwise DRS is disabled for FT VMs).

# ¿ Aug 30, 2012 17:13

Mierdaan: Sep 14, 2004; Pillbug

I'd check out your logs and see if you can figure out what's going on. If the host is talking to the HA master and it's writing to heartbeat datastores (if you're on 5) HA's not going to know any better. I don't think even VM HA is going to do anything if the VMs wouldn't respond to a restart request anyways.

# ¿ Sep 2, 2012 12:26

Mierdaan: Sep 14, 2004; Pillbug

As long as we're on iSCSI storage problems, here's one I'm trying to run down.

We have 3 cluster hosts on 5.0 U1, and a Compellent SAN hosting 7 volumes presented to all three hosts. I configured the hosts for round-robin MPIO, though we do only have one switch between hosts and storage. Currently each Compellent volume shows up as having 6 paths even though there's two VM kernel ports and two NICs on each Compellent controller, so I assume this is because of iSCSI virtual port redirection.

Each vmknic is set to have one active, one unused uplink:

99% of the time, this hums along happily. However, sometimes in the middle of the night (normally 1-3am so firmly in our backup window) we get some of these messages in the log

code:

2012-08-30T05:59:02.822Z cpu7:2713)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x410029000010 network tracker id 1 tracker.iSCSI.192.168.1.100 associated
2012-08-30T05:59:03.075Z cpu5:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:2 T:3 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2012-08-30T05:59:03.075Z cpu5:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d000003 TARGET: iqn.2002-03.com.compellent:5000d310003fd530 TPGT: 0 TSIH: 0]
2012-08-30T05:59:03.075Z cpu5:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 192.168.1.200:52875 R: 192.168.1.100:3260]
2012-08-30T05:59:03.076Z cpu4:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:1 T:3 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)

When this happens, for one of our hosts, three of its paths to only some of its volumes will be marked as dead, and the Compellent itself will alert that the host isn't visible on all expected paths. Somewhere around 30 minutes later, we'll get

code:

2012-08-30T05:59:03.329Z cpu4:2713)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x410029000010 network tracker id 1 tracker.iSCSI.192.168.1.112 associated
2012-08-30T05:59:03.582Z cpu4:2713)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba34:CH:2 T:3 CN:0: iSCSI connection is being marked "ONLINE"

I've gotten Compellent on the phone when this has happened before and they claim it's a VMware problem. There's a few KB articles that show iSCSI session drops like this when there's some jumbo frame MTU mismatches, but we're not using jumbos and our MTUs are set at 1500 everywhere. Further confusing this is that one of the iSCSI cards will actually show as "down" in Compellent SC, but it only alerts for one of the three hosts and doesn't alert about the card itself, just host visibility. Copilot claims that the GUI will show a card as down if it can't see all expected hosts on all paths, but I'm not sure I buy that.
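To line these drops up against the backup window, something like the sketch below could pull the OFFLINE/ONLINE events out of the vmkernel log and pair them per path. The regex is written against the exact lines pasted above and nothing else, so treat the pattern and the function names as assumptions; other builds may format these messages differently.

```python
import re
from datetime import datetime

# Matches lines like:
#   <timestamp>Z ... vmhba34:CH:2 T:3 CN:0: iSCSI connection is being marked "OFFLINE"
LINE_RE = re.compile(
    r'^(?P<ts>\S+Z) .*?(?P<path>vmhba\d+:CH:\d+ T:\d+ CN:\d+): '
    r'iSCSI connection is being marked "(?P<state>OFFLINE|ONLINE)"'
)


def parse_events(lines):
    """Return (timestamp, path, state) for every connection state change."""
    events = []
    for line in lines:
        m = LINE_RE.search(line)
        if m:
            ts = datetime.strptime(m.group("ts"), "%Y-%m-%dT%H:%M:%S.%fZ")
            events.append((ts, m.group("path"), m.group("state")))
    return events


def outage_durations(events):
    """Pair each OFFLINE with the next ONLINE seen for the same path."""
    down = {}
    outages = []
    for ts, path, state in sorted(events):
        if state == "OFFLINE":
            down.setdefault(path, ts)
        elif path in down:
            outages.append((path, ts - down.pop(path)))
    return outages
```

Feed it the lines of the log file and it returns one `(path, timedelta)` per completed outage; paths that never came back stay unpaired and are simply not reported.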

Mierdaan fucked around with this message at 03:20 on Sep 5, 2012

# ¿ Sep 3, 2012 15:24

Mierdaan: Sep 14, 2004; Pillbug

@kachunkachunk: First off I gotta nitpick: RR is truly MPIO in that it defines multiple paths between a target and initiator. I agree with you in that it doesn't let you aggregate bandwidth between paths, but saying it's not truly MPIO is kinda misleading.

Compellent does recommend RR as the MP algorithm of choice for their arrays, but I haven't found any guidance as to how many ops before a RR change they recommend; I've left it at the default right now. The main VM that should be impacted in our backup window is our file server VM, and at least last night I verified that that VM wasn't running on the host the Compellent lost paths to. The problem has moved around a few times though, so it's possible that DRS could be affecting it; it's too bad DRS doesn't have some sort of setting for "Aggressive but notify me". I'm pondering setting up some alerts for that so I can tell when DRS is taking actions; I think there's a "hot migrating" event I can alert on, right?
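For reference, this is roughly what the "ops before a RR change" knob does, as a toy model. The default as I understand it is 1000 I/Os per path before moving on; the function and path names are made up, and the real path selection plugin obviously does more than this.

```python
from itertools import cycle


def round_robin(paths, io_count, iops_limit=1000):
    """Yield the path used for each I/O, switching after iops_limit I/Os."""
    path_iter = cycle(paths)
    current = next(path_iter)
    sent = 0
    for _ in range(io_count):
        if sent == iops_limit:
            # Limit reached on this path: move to the next one.
            current = next(path_iter)
            sent = 0
        sent += 1
        yield current
```

With two paths and `iops_limit=2` you get A, A, B, B, A, A. It also shows why RR doesn't aggregate bandwidth for a single stream: only one path carries I/O at any given moment.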

FWIW, it does seem to be one target that is going offline during these events. Other hosts that are mapped to the same volume, presumably (though I can't back this up) with VMs on the same LUN aren't dropping off though, so I'm not really sure it's a performance issue or it'd affect more than one host as latency for all hosts climbed.

Is this the issue that you were talking about that involved changing the OperationLimit to a single-digit number? Looks like that was cleared up by 4.0 U5, so I'd hope whatever bug was cleared up in that update didn't persist into 5.0 U1.

# ¿ Sep 3, 2012 21:42

Mierdaan: Sep 14, 2004; Pillbug

We're in virtual port mode, but best practices say you can use separate vSwitches or one - it shouldn't make a difference. Will post more tomorrow from work, on phone now...

# ¿ Sep 4, 2012 01:10

Mierdaan: Sep 14, 2004; Pillbug

KS posted:

~~Not sure on that, you may be right.~~ I know this is the setup where if you did it in the GUI it is automatically incorrect. You have to bind each VMK to the corresponding hardware adapter (or to the software adapter if you prefer), so you may want to verify that it's set up correctly.

edit: actually thinking this through, the above means that the two-NIC to one vswitch setup you posted is definitely incorrect. They can't be set up to fail over between physical NICs if they're correctly bound to one physical NIC. I'd guess this is the source of your problem.

Link

You're not thinking of this correctly. The vmknics are both bound to the software iscsi initiator, and then each vmknic uses a single uplink as its active uplink - the other is marked inactive, so there's no failover happening at the physical uplink level ever. The software iscsi initiator sees those two vmknics as potential paths to storage, and then can round-robin between the two of them. Whether this happens within one vswitch or across multiple vswitches shouldn't be relevant - see the Fourth Topic in VirtualGeek's MultiVendor iSCSI blog entry.
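The rule I'm describing boils down to a check like the one below: every bound vmknic gets exactly one active uplink and nothing on standby. This is a sketch over a hand-written dict, not anything pulled from the vSphere API; the dict layout and the vmk names are hypothetical.

```python
def binding_problems(vmknics):
    """Return a list of problems; an empty list means the teaming looks sane.

    vmknics maps a vmknic name to {"active": [...], "standby": [...]}.
    """
    problems = []
    for name, teaming in vmknics.items():
        active = teaming.get("active", [])
        standby = teaming.get("standby", [])
        if len(active) != 1:
            problems.append(
                f"{name}: expected exactly 1 active uplink, found {len(active)}"
            )
        if standby:
            problems.append(
                f"{name}: standby uplinks {standby} should be marked unused"
            )
    return problems
```

For example, `{"vmk1": {"active": ["vmnic2"]}, "vmk2": {"active": ["vmnic3"]}}` comes back clean whether those port groups sit on one vSwitch or two, which is the point.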

Also, what you're remembering about not being able to bind multiple vmknics to the software iscsi initiator via the GUI isn't true in 5.0 anymore, see this post.

# ¿ Sep 4, 2012 02:32

Mierdaan: Sep 14, 2004; Pillbug

KS posted:

It is odd that you have 6 paths. How are the ports on the Compellent set up? Are they in a single fault domain given that they're plugged into the same switch? Are you in virtual ports mode?

I would think you should have two paths if in legacy mode or four if in virtual ports mode on the Compellent as long as you have dual controllers. Six seems like there's a misconfiguration somewhere.

edit: I am pretty sure the vmknics should be on a separate vswitch. Example of what a correct config looks like:

This is from a branch office we have with a single-controller Compellent plugged into a single switch with broadcom hardware-assisted iscsi. Looks similar to your setup.

Missed this reply somehow so I did a more detailed post of the Compellent side of things over in the Enterprise Storage thread.

# ¿ Sep 4, 2012 03:25

Mierdaan: Sep 14, 2004; Pillbug

Okay, the 6 path thing was a total non-issue. VMware support pinned it on the fact that I had volumes mapped to the hosts at the time I bound the vmknics to the iSCSI initiator, and those two paths that already existed (from vmhba34 to the two SAN ports) don't disappear until you reboot the host. I popped a host into maintenance, rebooted, and voila - 4 paths. Also, they say there's absolutely no issue using one vSwitch as I've done, as long as each vmknic has one active uplink only.
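The path math, for anyone following along: each bound vmknic gets a path to each SAN port, so two by two is four, and the two stale pre-binding paths made it six. A tiny sketch, with placeholder names for the vmknics and ports:

```python
from itertools import product


def enumerate_paths(vmknics, target_ports):
    """One path per (bound vmknic, target port) pair."""
    return [f"{vmk} -> {port}" for vmk, port in product(vmknics, target_ports)]
```

`enumerate_paths(["vmk1", "vmk2"], ["san-port-1", "san-port-2"])` gives the four paths you'd expect after the reboot.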

# ¿ Sep 4, 2012 19:20

Mierdaan: Sep 14, 2004; Pillbug

1 month, 11 hours and 42 minutes. I'm proud of your restraint, son.

# ¿ Sep 5, 2012 19:00

Mierdaan: Sep 14, 2004; Pillbug

I sometimes wish FT was an Ent+ feature only or something, because the fact that it's available in Standard and Ent just makes people think it's a good idea to actually use it.

# ¿ Sep 6, 2012 03:19

Mierdaan: Sep 14, 2004; Pillbug

Rhymenoserous posted:

Well not vmotion but you can migrate without shared.

If they're just now upgrading to 5 and haven't sprung for shared storage yet, what are the chances they've got Enterprise licensing

# ¿ Sep 6, 2012 17:22

Mierdaan: Sep 14, 2004; Pillbug

Anyone have any info as to whether or not new VCPs will get their free Workstation 8 codes upgraded to 9?

# ¿ Sep 12, 2012 13:57

Mierdaan: Sep 14, 2004; Pillbug

Corvettefisher posted:

"Congratulations on passing the VCP! I will contact the VCP team to find out if they have anything planned."

As far as I ever get, one VCP5 I know said he got an ETA of 6-8 weeks for a key....

I got my key two days after VMware acknowledged I passed the test. Also, coincidentally, like two weeks after Workstation 9 was announced.

# ¿ Sep 12, 2012 14:12

Mierdaan: Sep 14, 2004; Pillbug

Anyone updated to vCenter or ESXi 5.1 yet? Any trip reports?

# ¿ Sep 13, 2012 20:24

Mierdaan: Sep 14, 2004; Pillbug

How long should I spend troubleshooting terrible datastore latency when using Dell's R610 integrated broadcom NICs before I just replace them? Do these things work OK for anyone?

# ¿ Sep 13, 2012 22:19

Mierdaan: Sep 14, 2004; Pillbug

Moey posted:

I have never had any problem with ours. For almost two years they have been used in multiple roles without issue (management, VMNetwork, iSCSI and vMotion).

Is it only with storage you are having issues, or is there bad latency when used for other traffic?

My issue is seeing the normal 5-15ms latency on my Compellent disks, but crap like this at the datastore level.

This is with 2x onboard Broadcom interfaces configured as I was describing earlier in the thread (look at post history for screenshots).

code:

# ethtool -i vmnic2
driver: bnx2
version: 2.0.15g.v50.11-5vmw
firmware-version: 5.2.7 bc 5.2.2 NCSI 2.0.8
bus-info: 0000:02:00.0
# ethtool -i vmnic3
driver: bnx2
version: 2.0.15g.v50.11-5vmw
firmware-version: 5.2.7 bc 5.2.2 NCSI 2.0.8
bus-info: 0000:02:00.1

code:

Switch Ports Model              SW Version            SW Image
------ ----- -----              ----------            ----------
*    1 30    WS-C3560X-24       12.2(53)SE2           C3560E-UNIVERSALK9-M

...

!
interface GigabitEthernet0/15
 description Uplink to VMware host
 flowcontrol receive desired
 spanning-tree portfast
!

# ¿ Sep 13, 2012 23:52

Mierdaan: Sep 14, 2004; Pillbug

Yeah updating the drivers was already on my agenda for tomorrow.

esxtop shows DAVG/rd and DAVG/wr spiking to >100 pretty regularly, so that matches the performance graphs. And yeah it's all 3 cluster hosts.

# ¿ Sep 14, 2012 03:02

Mierdaan: Sep 14, 2004; Pillbug

How do you normally update the firmware on Broadcom NICs in an ESXi host, given that everyone seems to package them for Windows and Linux only? CentOS live CD?

A VMware support rep literally just suggested to me, twice, that I create a Windows guest and run the firmware update utility in there.

# ¿ Sep 14, 2012 19:28

Mierdaan: Sep 14, 2004; Pillbug

Syano posted:

Dell server? Should have a boot cd you can use to update your firmware. Stick your service tag in the support site and you can download the latest copy if you do not have it

BTW, this is one of the myriad reasons you don't use Supermicro unless you really know what you are getting into

Yeah they're Dell. Thanks for reminding me that Dell puts those out.

late-edit: actually the Dell Repo Manager tool doesn't even get the latest version of the Broadcom firmware. They actually recommend the liveCD method, specifically using the OMSA LiveCD and using the RedHat .bin driver packages off the Dell site.

Mierdaan fucked around with this message at 21:26 on Sep 14, 2012

# ¿ Sep 14, 2012 19:34
