Virtualization Megathread V2: VMs inside VMs

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > Virtualization Megathread V2: VMs inside VMs

«‹›312 »

evil_bunnY: Apr 2, 2003

And then not getting updates? Sounds good mate.

# ? Aug 30, 2012 22:34

Adbot: ADBOT LOVES YOU

# ? May 15, 2024 03:56

Docjowles: Apr 9, 2009

evil_bunnY posted:

And then not getting updates? Sounds good mate.

Yeah this is the real gotcha. I've never had to actually call VMware for help. I want to get us onto 5.x but my current company let their support lapse years ago and the cost to renew with back payments is hilarious. A sales rep did the numbers both ways and it's literally cheaper to buy new licenses and support contracts than to renew the ones we have.

That probably changes once you have a good enough relationship with VMware, but we don't, so welp.

# ? Aug 30, 2012 22:52

Moey: Oct 22, 2010; I LIKE TO MOVE IT

evil_bunnY posted:

And then not getting updates? Sounds good mate.

This.

Unless you want to run ESXi 5.1 in 2016.

# ? Aug 30, 2012 22:53

FISHMANPET: Mar 3, 2007; Sweet 'N Sour
Can't
Melt
Steel Beams

evil_bunnY posted:

And then not getting updates? Sounds good mate.

That's kind of what I thought about going the no support route, but any thoughts on Production vs Basic?

# ? Aug 30, 2012 23:18

Erwin: Feb 17, 2006

FISHMANPET posted:

And if it ends up being that we need two identical servers then we can buy another server, I just need to know if that's necessary or not.

I just now finished putting in service a new Sandy Bridge host in my all-Nehalem cluster, so I figured I'd try this for shits and giggles. When I tried to migrate the secondary to the new host, I got

quote:

Host CPU is incompatible with the virtual machine's requirements
Mismatch detected for these features:
CPU model does not match

So looks like no? Caveat is that this is the first time I tried setting up FT (edit: in this environment), and there's no operating system (and thus no VMware Tools) on the VM. EVC mode is set to Nehalem.

Erwin fucked around with this message at 23:37 on Aug 30, 2012

# ? Aug 30, 2012 23:24

Docjowles: Apr 9, 2009

FISHMANPET posted:

That's kind of what I thought about going the no support route, but any thoughts on Production vs Basic?

How OK is management with having to wait 3 days for VMware to even start looking at an issue that crops up on Friday afternoon? The marginal price increase is so negligible that I wouldn't bother downgrading.

# ? Aug 30, 2012 23:36

three: Aug 9, 2007; i fantasize about ndamukong suh licking my doodoo hole

Misogynist posted:

It's still marked Technical Preview as of VMworld 2012 though, so we'll probably be waiting for a 6.0 release before it's prime time unless you know something I don't.

You're right, probably 6.0. Gelsinger spoke of the tick-tock methodology Intel uses, so 6.0 could be next year some time. That's not really that long. V :shobon:

# ? Aug 30, 2012 23:42

CrazyLittle: Sep 11, 2001; Clapping Larry

Anything worth seeing at the VMWorld expo?

# ? Aug 31, 2012 01:04

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Docjowles posted:

How OK is management with having to wait 3 days for VMware to even start looking at an issue that crops up on Friday afternoon? The marginal price increase is so negligible that I wouldn't bother downgrading.

On the flip side, even if support does look at it within the guaranteed turnaround window, you'll have figured out the issue and solved it much less disruptively than they would have by the time they get back to you with an L1 form letter response.

I once had them tell me that my issue could only be resolved by rebooting my entire vSphere cluster in the middle of the day. Thanks for nothing, fuckwits.

# ? Aug 31, 2012 02:27

three: Aug 9, 2007; i fantasize about ndamukong suh licking my doodoo hole

Misogynist posted:

On the flip side, even if support does look at it within the guaranteed turnaround window, you'll have figured out the issue and solved it much less disruptively than they would have by the time they get back to you with an L1 form letter response.

I once had them tell me that my issue could only be resolved by rebooting my entire vSphere cluster in the middle of the day. Thanks for nothing, fuckwits.

Quality dramatically drops based on where you're routed. I have had good experiences with the Canadians reps, but the Indian reps are usually very weak and extremely slow.

# ? Aug 31, 2012 02:32

Sparkyhodgo: Jun 21, 2003; Long potato

PARALLELS 8 RUNS WINDOWS 8 AND DIRECTX 10 :siren:

As a long-time Parallels user, I just got the preview of Parallels 8. I'm blown away :aaa:

Let goons rejoice: Parallels 8 now supports Windows 8 from Bootcamp (which, btw, works flawlessly despite Bootcamp "supporting" only Windows 7) and DirectX 10 (again, unofficially, but it appears to work great). :dance:

On a Retina MacBook Pro with Parallels 8, I'm getting ~40fps in Bioshock 2 running on a Windows 8 Consumer Preview VM from Bootcamp. Batman Arkham City is working, too (albeit in DirectX10, not in DirectX 11). I should add that the screenshot below is on an external display. This freaking amazing Geforce 650M is also running my not-pictured retina display and a second external screen. :zoid:

ninja edit: City, not Asylum, which I haven't tried yet

Sparkyhodgo fucked around with this message at 18:07 on Sep 1, 2012

# ? Sep 1, 2012 17:51

Noghri_ViR: Oct 19, 2001; Your party has died.
Please press [ENTER] to continue to the
Las Vegas Bowl

CrazyLittle posted:

Anything worth seeing at the VMWorld expo?

actually yes there was. Overall it was a good conference and I'll go back next yea. My biggest complaint was that compared to teched there were a lot more power point presentations and not demos which is where teched excels at.

Also I got caught in a straight shill for Samsung session. It was about enabling Byod for your workforce and what it really was was a Samsung guy sitting on stage telling us everything they've been doing for byod. I sat through it for 35 minutes and bailed

# ? Sep 1, 2012 18:26

CrazyLittle: Sep 11, 2001; Clapping Larry

Yeah, I'm on the ISP / provider side, and we're not really big enough to get into the whole "hosted cloud" market when every bigger player's already stuck in that race to the bottom. I didn't see a whole lot of stuff there that would help me directly other than some of the virtualization focused hardware.

# ? Sep 1, 2012 22:34

luminalflux: May 27, 2005

I just love it when ESXi becomes catatonic, but still appears up to HA. I can ping the host, but can't do poo poo: VMs aren't responding and management requests from vCenter does nothing. Is there a way to tell vCenter to drop a host that is misbehaving like this, without having to do a 2am cab ride to reboot it?

# ? Sep 2, 2012 11:13

Mierdaan: Sep 14, 2004; Pillbug

I'd check out your logs and see if you can figure out what's going on. If the host is talking to the HA master and it's writing to heartbeat datastores (if you're on 5) HA's not going to know any better. I don't think even VM HA is going to do anything if the VMs wouldn't respond to a restart request anyways.

# ? Sep 2, 2012 12:26

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Mierdaan posted:

I'd check out your logs and see if you can figure out what's going on. If the host is talking to the HA master and it's writing to heartbeat datastores (if you're on 5) HA's not going to know any better. I don't think even VM HA is going to do anything if the VMs wouldn't respond to a restart request anyways.

Which is unfortunate, actually -- the correct response of a proper HA clustering solution (OCW, RHCS, Pacemaker, etc.) would be to fence the problematic node to prevent it from accessing shared resources, either by doing an IPMI poweroff (STONITH) or by disabling its ports on the Ethernet/FC switches. vSphere has omitted fencing entirely in order to simplify configuration; HA nodes are responsible for their own poweroff in a split-brain scenario, which is a start, but doesn't do much for instances where the whole host goes unresponsive but still keeps a bunch of shared resources open.

Vulture Culture fucked around with this message at 18:12 on Sep 2, 2012

# ? Sep 2, 2012 18:10

luminalflux: May 27, 2005

Mierdaan posted:

I'd check out your logs and see if you can figure out what's going on. If the host is talking to the HA master and it's writing to heartbeat datastores (if you're on 5) HA's not going to know any better. I don't think even VM HA is going to do anything if the VMs wouldn't respond to a restart request anyways.

Nope, it's 4.1. As soon as I rebooted via iLO the VMs started up on other hosts. I don't have logs on the ESXi since I neglected to configure a persistent scratch storage and/or set up a syslog target. Spent part of my sunday doing exactly that.

Is it worth even talking to support if I don't have logs from the hypervisor, or should I just upgrade to U3 that just came out and see if it happens again? This is the second time in a couple weeks this has happened (huge load spike, this specific machine goes catatonic and needs some iLO love).

# ? Sep 2, 2012 19:29

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

luminalflux posted:

Is it worth even talking to support if I don't have logs from the hypervisor, or should I just upgrade to U3 that just came out and see if it happens again? This is the second time in a couple weeks this has happened (huge load spike, this specific machine goes catatonic and needs some iLO love).

I'm noticing you're not following through with "set up a syslog server" as part of the action plan you're describing. If you're not a Linux-savvy fellow, there's some instructions floating around the Internet about how to set up the vMA as your syslog host.

# ? Sep 2, 2012 19:43

luminalflux: May 27, 2005

Misogynist posted:

I'm noticing you're not following through with "set up a syslog server" as part of the action plan you're describing.

Sorry if I was unclear - I set up both persistent scratch storage AND a syslog server, so I have logs in both places (verified that they're being written in both places as well). And I more or less only deal with Unix servers.

# ? Sep 2, 2012 19:53

Kachunkachunk: Jun 6, 2011

Not much point unless you have log bundles. And they really need to be captured within minutes after the problem occurs - from the host and vCenter Server.
You might have to do this stuff from the direct console or SSH, if enabled.

Here are some questions you would be asked / things you could have tried:
1) Were the VMs online and responsive (yet stuck on that box - try ping, ssh, rdp, vnc, etc - reach out to known services in the VMs)?
2) Did the box ping?
3) Can vCenter server still ping and resolve that host by FQDN and hostname?
4) What does the console say?
5) Are you able to log into the console/iLO and access the Direct Console User Interface?
6) Are you able to hit alt+F12 and see the vmkernel logs scrolling anything?
7) Are you able to log into technical support mode / local shell?
(Alt+F1, but it also has to be enabled in the DCUI).

Basically if any of the above is true, the host isn't hung, crashed, or down. But the host management agents are not responding to vCenter Server. If a host ever appears as Not Responding, it's because VC thinks it is, but it might not actually be down.

With that said, there's a lot that could be going wrong, so it warrants checking into. But before restarting management agents or hosts, try to collect log bundles (from the console, if you must), or you'll lose too much information about root-cause. Even with persistent log storage and syslog. The bundles contain stuff about connection counts, memory use, etc. that won't be in actual logging.

I'm not really sure what it could be, though. A Not Responding state has too many possible causes.

# ? Sep 2, 2012 19:55

luminalflux: May 27, 2005

quote:

1) Were the VMs online and responsive (yet stuck on that box - try ping, ssh, rdp, vnc, etc - reach out to known services in the VMs)?
Nope (found this out by nagios going apeshit)
2) Did the box ping?
Yep.
3) Can vCenter server still ping and resolve that host by FQDN and hostname?
Yep.

As for console access, I don't have the relevant licenses for text-console (only serial port access).

# ? Sep 2, 2012 20:05

Kachunkachunk: Jun 6, 2011

What kind of storage are the VMs on - does the host go over FC or Ethernet (iSCSI, NFS, FCoE)?
It sounds like it might have lost storage access at the time. I suppose it can happen during heavy load if there are driver/firmware/overheating (earlier 10Gb cards) problems, for instance.
In the case of Ethernet, despite having teaming set up, your link state might not actually go down, so failover does not take place. Beacon probing would get around that (also a way to overcome driver/firmware-related outages, when using different physical NICs in the team - it kind of compliments that best practice).
You can try to compare the time that Nagios reported outages with the time that the host stopped responding to VC - probably very close.

As you likely read earlier, unfortunately HA (4.x and earlier) doesn't do shared datastore monitoring, so server ping responsiveness is all that matters to the other monitoring cluster members. Fault Domain Manager (FDM, or HA for 5.x and up) takes storage access into consideration as well and can let a server isolate itself if it must... and is configured to.

Anyway storage loss, if it really occurred, will be one of those things that can (most likely will) make the server misbehave and not respond to any management requests, including direct agent login. Those worlds/processes are waiting on blocking I/O to complete against whatever resources that are "missing" or unreachable.

I suppose if you set up the boxes to panic on NMI, then do just that, support can use that information and probably have a better chance of figuring out root-cause than with logging. Might need to give the server some time to dump before you reboot it again (an hour is safe), since you can't see the console say "Disk dump successful."
Once it starts PSODing, then the VMs should HA over to another box as normal, too.

# ? Sep 3, 2012 04:47

luminalflux: May 27, 2005

Storage is on iSCSI, and the other 3 ESXis could reach the datastores. Everything's on GbE with NIC teaming into 2 switches. So it could very well be the NIC - both iSCSI links are on the on-board NIC, whereas the management and production networks are split over the on-board Broadcom and an Intel in the PCI slot.

# ? Sep 3, 2012 09:41

Mierdaan: Sep 14, 2004; Pillbug

As long as we're on iSCSI storage problems, here's one I'm trying to run down.

We have 3 cluster hosts on 5.0 U1, and a Compellent SAN hosting 7 volumes presented to all three hosts. I configured the hosts for round-robin MPIO, though we do only have one switch between hosts and storage. Currently each Compellent volume shows up as having 6 paths even though there's two VM kernel ports and two NICs on each Compellent controller, so I assume this is because of iSCSI virtual port redirection.

Each vmknic is set to have one active, one unused uplink:

99% of the time, this hums along happily. However, sometimes in the middle of the night (normally 1-3am so firmly in our backup window) we get some of these messages in the log

code:

2012-08-30T05:59:02.822Z cpu7:2713)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x410029000010 network tracker id 1 tracker.iSCSI.192.168.1.100 associated
2012-08-30T05:59:03.075Z cpu5:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:2 T:3 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2012-08-30T05:59:03.075Z cpu5:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d000003 TARGET: iqn.2002-03.com.compellent:5000d310003fd530 TPGT: 0 TSIH: 0]
2012-08-30T05:59:03.075Z cpu5:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 192.168.1.200:52875 R: 192.168.1.100:3260]
2012-08-30T05:59:03.076Z cpu4:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:1 T:3 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)

When this happens, for one one of our hosts, three of its paths to only some of its volumes will be marked as dead, and the Compellent itself will alert that the host isn't visible on all expected paths. Somewhere around 30 minutes later, we'll get

code:

2012-08-30T05:59:03.329Z cpu4:2713)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x410029000010 network tracker id 1 tracker.iSCSI.192.168.1.112 associated
2012-08-30T05:59:03.582Z cpu4:2713)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba34:CH:2 T:3 CN:0: iSCSI connection is being marked "ONLINE"

I've gotten Compellent on the phone when this has happened before and they claim it's a VMware problem. There's a few KB articles that show iSCSI session drops like this when there's some jumbo frame MTU mismatches, but we're not using jumbos and our MTUs are set at 1500 everywhere. Further confusing this is that one of the iSCSI cards will actually show as "down" in Compellent SC, but it only alerts for one of the three hosts and doesn't alert about the card itself, just host visibility. Copilot claims that the GUI will show a card as down if it can't see all expected hosts on all paths, but I'm not sure I buy that.

Mierdaan fucked around with this message at 03:20 on Sep 5, 2012

# ? Sep 3, 2012 15:24

Diet Butcher: May 2, 2005; Worst Song, Played on Ugliest Guitar.

Mostly work with Hyper-V but am currently working on upgrading some old VMWare Hosts to new Hardware.

If we Own Essentials Plus 5.0 and are under current maintenance will we be able to upgrade to 5.1 without any additional costs?

Currently running 2 Hosts in Development with plans to migrate VMs off of 2 Hosts (1 ESXi 3.5, 1 ESX 3.5). Migrated a few VMs already succsfully using the Converter, not sure if the aforementioned hardware version increase in 5.1 would affect anything migration wise? Or the VMs will run on the older hardware version just fine.

Thanks.

# ? Sep 3, 2012 18:08

Noghri_ViR: Oct 19, 2001; Your party has died.
Please press [ENTER] to continue to the
Las Vegas Bowl

Mierdaan posted:

As long as we're on iSCSI storage problems, here's one I'm trying to run down.

We have 3 cluster hosts on 5.0 U1, and a Compellent SAN hosting 7 volumes presented to all three hosts. I configured the hosts for round-robin MPIO, though we do only have one switch between hosts and storage. Currently each Compellent volume shows up as having 6 paths even though there's two VM kernel ports and two NICs on each Compellent controller, so I assume this is because of iSCSI virtual port redirection.

Each vmk is active on one portgroup, unused on the other:

99% of the time, this hums along happily. However, sometimes in the middle of the night (normally 1-3am so firmly in our backup window) we get some of these messages in the log
code:
2012-08-30T05:59:02.822Z cpu7:2713)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x410029000010 network tracker id 1 tracker.iSCSI.192.168.1.100 associated
2012-08-30T05:59:03.075Z cpu5:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:2 T:3 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
2012-08-30T05:59:03.075Z cpu5:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: Sess [ISID: 00023d000003 TARGET: iqn.2002-03.com.compellent:5000d310003fd530 TPGT: 0 TSIH: 0]
2012-08-30T05:59:03.075Z cpu5:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: Conn [CID: 0 L: 192.168.1.200:52875 R: 192.168.1.100:3260]
2012-08-30T05:59:03.076Z cpu4:2713)WARNING: iscsi_vmk: iscsivmk_StopConnection: vmhba34:CH:1 T:3 CN:0: iSCSI connection is being marked "OFFLINE" (Event:4)
When this happens, for one one of our hosts, three of its paths to only some of its volumes will be marked as dead, and the Compellent itself will alert that the host isn't visible on all expected paths. Somewhere around 30 minutes later, we'll get
code:
2012-08-30T05:59:03.329Z cpu4:2713)iscsi_vmk: iscsivmk_ConnNetRegister: socket 0x410029000010 network tracker id 1 tracker.iSCSI.192.168.1.112 associated
2012-08-30T05:59:03.582Z cpu4:2713)WARNING: iscsi_vmk: iscsivmk_StartConnection: vmhba34:CH:2 T:3 CN:0: iSCSI connection is being marked "ONLINE"
I've gotten Compellent on the phone when this has happened before and they claim it's a VMware problem. There's a few KB articles that show iSCSI session drops like this when there's some jumbo frame MTU mismatches, but we're not using jumbos and our MTUs are set at 1500 everywhere. Further confusing this is that one of the iSCSI cards will actually show as "down" in Compellent SC, but it only alerts for one of the three hosts and doesn't alert about the card itself, just host visibility. Copilot claims that the GUI will show a card as down if it can't see all expected hosts on all paths, but I'm not sure I buy that.

I'm getting a very similar problem to this around the same 1-3 timeframe. Except in my case I'm using NFS and Netapp.

Let's talk offline and fine if we're using any similar hardware for the hosts, switching ect ect

Also I found it very interesting at the VMWorld storage best practices session that both the Netapp guy and the EMC guy said to turn off jumbo frames cause they just cause more problems then they are worth

# ? Sep 3, 2012 19:24

Kachunkachunk: Jun 6, 2011

luminalflux posted:

Storage is on iSCSI, and the other 3 ESXis could reach the datastores. Everything's on GbE with NIC teaming into 2 switches. So it could very well be the NIC - both iSCSI links are on the on-board NIC, whereas the management and production networks are split over the on-board Broadcom and an Intel in the PCI slot.

Alrighty, so perhaps a remote syslog server would successfully receive logs during a recurrence. Are you able to set up iSCSI so it uses different physical adapters?
Ideally you would have some redundancy which utilizes separate adapters (multi-port cards don't quite account for all NIC-related issues like firmware barfing). And if you can get two different drivers to be involved, that covers the driver side of failures as well.
Off the top of my head, you could consider turning off interrupt remapping (introduced in 4.1), in case the server isn't handling it properly. There is a pretty specific problem signature to match with that, but you'd need intact server logs from quite some time before, then at the time of failure. I've seen cases where it took a week or so *after* the sign of interrupt remapping causing a problem, before the actual visible symptoms came up once. However it's usually an immediate effect (especially for storage traffic).

Mierdaan posted:

As long as we're on iSCSI storage problems, here's one I'm trying to run down.
[...]
I've gotten Compellent on the phone when this has happened before and they claim it's a VMware problem. There's a few KB articles that show iSCSI session drops like this when there's some jumbo frame MTU mismatches, but we're not using jumbos and our MTUs are set at 1500 everywhere. Further confusing this is that one of the iSCSI cards will actually show as "down" in Compellent SC, but it only alerts for one of the three hosts and doesn't alert about the card itself, just host visibility. Copilot claims that the GUI will show a card as down if it can't see all expected hosts on all paths, but I'm not sure I buy that.

Not sure about that either, but it really sounds important enough to finally straighten out.

I can't say for sure, but it looks like from your log snippets, it was Target 3's paths (T3) being marked offline. Perhaps the load is bringing the port down (to ESX's perspective, at least). I guess you'd have to take it forward with performance troubleshooting and see if device latency is climbing, you're seeing lots of pause frames (well, is flow control on?), that kind of stuff.

Only other thing is that Round Robin isn't truly MPIO - it puts chunks of I/O down each path sequentially - but it's still a good pathing policy to go with, regardless of what kind of array (active-active, or active-passive; ESX will only use the active paths and understands what is standby).

Does Compellent recommend using round-robin? Try to see if any particular target ID is going down consistently during each occurrence. If it's different each time, maybe too many hosts are loading the same path (coincidentally/by-chance) on a given night - this depends on VM loads, I imagine, so DRS could shift the loads around. Perhaps Compellent might also recommend a smaller round robin I/O count before swapping paths. EMC, for instance, recommended a value of "1" for specific EMC storage configurations and it had a marked performance improvement for one troubled environment.

Kachunkachunk fucked around with this message at 20:29 on Sep 3, 2012

# ? Sep 3, 2012 20:26

luminalflux: May 27, 2005

Kachunkachunk posted:

Alrighty, so perhaps a remote syslog server would successfully receive logs during a recurrence. Are you able to set up iSCSI so it uses different physical adapters?

Yep, that's the plan. Someone else set up these NICs up, I have no idea why iSCSI isn't split over the different NICs like the other nets (each ESXi has an on-board 4-port Broadcom and a 4-port intel in the PCI slot)

# ? Sep 3, 2012 21:36

Mierdaan: Sep 14, 2004; Pillbug

@kachunkachunk: First off I gotta nitpick: RR is truly MPIO in that it defines multiple paths between a target and initiator. I agree with you in that it doesn't let you aggregate bandwidth between paths, but saying it's not truly MPIO is kinda misleading.

Compellent does recommend RR as the MP algorithm of choice for their arrays, but I haven't found any guidance as to how many ops before a RR change they recommend; I've left it at the default right now. The main VM that should be impacted in our backup window is our file server VM, and at least last night I verified that that VM wasn't running on the host the Compellent lost paths to. The problem has moved around a few times though, so it's possible that DRS could be affecting it; it's too bad DRS doesn't have some sort of setting for "Aggressive but notify me". I'm pondering setting up some alerts for that so I can tell when DRS is taking actions; I think there's a "hot migrating" event I can alert on, right?

FWIW, it does seem to be one target that is going offline during these events. Other hosts that are mapped to the same volume, presumably (though I can't back this up) with VMs on the same LUN aren't dropping off though, so I'm not really sure it's a performance issue or it'd affect more than one host as latency for all hosts climbed.

Is this the issue that you were talking about that involved changing the OperationLimit to a single-digit number? Looks like that was cleared up by 4.0 U5, so I'd hope whatever bug was cleared up in that update didn't persist into 5.0 U1.

# ? Sep 3, 2012 21:42

KS: Jun 10, 2003; Outrageous Lumpwad

It is odd that you have 6 paths. How are the ports on the Compellent set up? Are they in a single fault domain given that they're plugged into the same switch? Are you in virtual ports mode?

I would think you should have two paths if in legacy mode or four if in virtual ports mode on the Compellent as long as you have dual controllers. Six seems like there's a misconfiguration somewhere.

edit: I am pretty sure the vmknics should be on a seperate vswitch. Example of what a correct config looks like:

This is from a branch office we have with a single-controller Compellent plugged into a single switch with broadcom hardware-assisted iscsi. Looks similar to your setup.

KS fucked around with this message at 01:11 on Sep 4, 2012

# ? Sep 4, 2012 00:55

Mierdaan: Sep 14, 2004; Pillbug

We're in virtual port mode, but best practices say you can use separate vSwitches or one - it shouldn't make a difference. Will post more tomorrow from work, on phone now...

# ? Sep 4, 2012 01:10

KS: Jun 10, 2003; Outrageous Lumpwad

~~Not sure on that, you may be right.~~ I know this is the setup where if you did it in the GUI it is automatically incorrect. You have to bind each VMK to the corresponding hardware adapter (or to the software adapter if you prefer), so you may want to verify that it's set up correctly.

edit: actually thinking this through, the above means that the two-NIC to one vswitch setup you posted is definitely incorrect. They can't be set up to fail over between physical NICs if they're correctly bound to one physical NIC. I'd guess this is the source of your problem.

Link

KS fucked around with this message at 01:37 on Sep 4, 2012

# ? Sep 4, 2012 01:15

three: Aug 9, 2007; i fantasize about ndamukong suh licking my doodoo hole

Mierdaan posted:

We're in virtual port mode, but best practices say you can use separate vSwitches or one - it shouldn't make a difference. Will post more tomorrow from work, on phone now...

Very interested to hear what you figure out. Please share when you get to a resolution.

# ? Sep 4, 2012 01:19

Mierdaan: Sep 14, 2004; Pillbug

KS posted:

~~Not sure on that, you may be right.~~ I know this is the setup where if you did it in the GUI it is automatically incorrect. You have to bind each VMK to the corresponding hardware adapter (or to the software adapter if you prefer), so you may want to verify that it's set up correctly.

edit: actually thinking this through, the above means that the two-NIC to one vswitch setup you posted is definitely incorrect. They can't be set up to fail over between physical NICs if they're correctly bound to one physical NIC. I'd guess this is the source of your problem.

Link

You're not thinking of this correctly. The vmknics are both bound to the software iscsi initiator, and then each vmknic uses a single uplink as its active uplink - the other is marked inactive, so there's no failover happening at the physical uplink level ever. The software iscsi initiator sees those two vmknics as potential paths to storage, and then can round-robin between the two of them. Whether this happens within one vswitch or across multiple vswitches shouldn't be relevent - see the Fourth Topic in VirtualGeek's MultiVendor iSCSI blog entry.

Also, what you're remembering about not being able to bind multiple vmknics to the software iscsi initiator via the GUI isn't true in 5.0 anymore, see this post.

# ? Sep 4, 2012 02:32

Mierdaan: Sep 14, 2004; Pillbug

KS posted:

It is odd that you have 6 paths. How are the ports on the Compellent set up? Are they in a single fault domain given that they're plugged into the same switch? Are you in virtual ports mode?

I would think you should have two paths if in legacy mode or four if in virtual ports mode on the Compellent as long as you have dual controllers. Six seems like there's a misconfiguration somewhere.

edit: I am pretty sure the vmknics should be on a seperate vswitch. Example of what a correct config looks like:

This is from a branch office we have with a single-controller Compellent plugged into a single switch with broadcom hardware-assisted iscsi. Looks similar to your setup.

Missed this reply somehow so I did a more detailed post of the Compellent side of things over in the Enterprise Storage thread.

# ? Sep 4, 2012 03:25

Mierdaan: Sep 14, 2004; Pillbug

Okay, the 6 path thing was a total non-issue. VMware support pinned it on the fact that I had volumes mapped to the hosts at the time I bound the vmknics to the iSCSI initiator, and those two paths that already existed (from vmhba34 to the two SAN ports) don't disappear until you reboot the host. I popped a host into maintenance, rebooted, and voila - 4 paths. Also, they say there's absolutely no issue using one vSwitch as I've done, as long as each vmknic has one active uplink only.

# ? Sep 4, 2012 19:20

vty: Nov 8, 2007; oh dott, oh dott!

Anyone use Xangati? I've recently migrated my entire environment from Xenserver over to Vmware Enterprise (VSPP). I have Veeam One, haven't gotten to far with messing around with it as it's only been a few days, but I just ran across Xangati and it looks pretty damned nice.

# ? Sep 4, 2012 20:30

FISHMANPET: Mar 3, 2007; Sweet 'N Sour
Can't
Melt
Steel Beams

I'm currently installing a vCenter server for my 2 node Enterprise cluster and I'm loving terrified/in awe that I'm actually installing it on a real thing, not just reading about it.

# ? Sep 4, 2012 23:31

PUBLIC TOILET: Jun 13, 2009

So I just found out from reading this ArsTechnica article that Hyper-V Server 2012 has arrived. I've downloaded it and have it running in VMware now, but I haven't played with it extensively yet. I've had a physical Hyper-V Server 2008 R2 going for a while now so I don't have a problem navigating through it. I'm curious to know if anyone else has dabbled in this yet. I haven't looked it up yet but I'm curious to know if you can access the new features of Hyper-V Server 2012 through the Hyper-V Manager in Windows 7, or if you absolutely have to be using Windows 8. Is there an updated RSAT from Microsoft for manipulating the new features from Windows 7? Has anyone actually had a chance to play with the new Hyper-V Server and its features?

# ? Sep 4, 2012 23:38

Adbot: ADBOT LOVES YOU

# ? May 15, 2024 03:56

evil_bunnY: Apr 2, 2003

COCKMOUTH.GIF posted:

Is there an updated RSAT from Microsoft for manipulating the new features from Windows 7?

How long did it take for them to update that for 7SP1?

# ? Sep 4, 2012 23:53

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > Virtualization Megathread V2: VMs inside VMs

«‹›312 »