|
Here's a (not so) small excerpt from my vmkernel.log. Definitely seems like there's something going on. Still working on interpreting it. http://pastebin.com/5910mKX2
|
# ? Aug 14, 2014 16:55 |
|
Martytoof posted:Here's a (not so) small excerpt from my vmkernel.log. Definitely seems like there's something going on. Still working on interpreting it.
What SAN/NAS do you have, and are you using VAAI?
|
# ? Aug 14, 2014 17:50 |
|
Machine is a DL380 G7 with a P410i directly connected to 8 SSDs on the chassis backplane. No NAS/SAN in place.
|
# ? Aug 14, 2014 18:18 |
|
Martytoof posted:Here's a (not so) small excerpt from my vmkernel.log. Definitely seems like there's something going on. Still working on interpreting it.
All of the errors in there are related to the SCSI Log Sense, ATA Passthrough, and Mode Sense commands, and not to any actual data manipulation. That would indicate to me that your RAID controller does not support those SCSI commands, or some portion of them, as the sense data from the target indicates that it considers the opcode or one of the fields in the CDB to be invalid. This could indicate a driver issue with the RAID controller, or a problem with the way the device is configured to use SATP. What does the "esxcli storage nmp device list -d" command show for that device?
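For anyone decoding those lines by hand, here is a minimal Python sketch of pulling the opcode and sense data out of a vmkernel.log failure line. The sample line, the device name, and the regular expression are assumptions for illustration (they are not taken from the pastebin), and the lookup tables only cover a handful of standard SCSI codes.

```python
import re

# Assumed shape of a vmkernel.log SCSI failure line. Real lines carry more
# fields, so treat this pattern as a starting point, not a full parser.
LINE_RE = re.compile(
    r"Cmd\(0x[0-9a-f]+\)\s+0x(?P<op>[0-9a-f]+),.*?"
    r"H:0x(?P<host>[0-9a-f]+)\s+D:0x(?P<dev>[0-9a-f]+)\s+P:0x(?P<plugin>[0-9a-f]+)"
    r".*?sense data:\s+0x(?P<key>[0-9a-f]+)\s+0x(?P<asc>[0-9a-f]+)\s+0x(?P<ascq>[0-9a-f]+)",
    re.IGNORECASE,
)

# Small subsets of the standard SCSI tables, enough for the commands
# discussed in this thread.
OPCODES = {
    0x1A: "MODE SENSE(6)",
    0x5A: "MODE SENSE(10)",
    0x4D: "LOG SENSE",
    0x85: "ATA PASS-THROUGH(16)",
    0xA1: "ATA PASS-THROUGH(12)",
}
DEVICE_STATUS = {0x00: "GOOD", 0x02: "CHECK CONDITION", 0x08: "BUSY"}
SENSE_KEYS = {
    0x2: "NOT READY",
    0x3: "MEDIUM ERROR",
    0x4: "HARDWARE ERROR",
    0x5: "ILLEGAL REQUEST",
    0x6: "UNIT ATTENTION",
    0xB: "ABORTED COMMAND",
}
ASC_ASCQ = {
    (0x20, 0x00): "INVALID COMMAND OPERATION CODE",
    (0x24, 0x00): "INVALID FIELD IN CDB",
}


def decode(line):
    """Return a dict describing the failed command, or None if no match."""
    m = LINE_RE.search(line)
    if m is None:
        return None
    op = int(m.group("op"), 16)
    dev = int(m.group("dev"), 16)
    key = int(m.group("key"), 16)
    asc = int(m.group("asc"), 16)
    ascq = int(m.group("ascq"), 16)
    return {
        "opcode": OPCODES.get(op, hex(op)),
        "device_status": DEVICE_STATUS.get(dev, hex(dev)),
        "sense_key": SENSE_KEYS.get(key, hex(key)),
        "additional": ASC_ASCQ.get((asc, ascq), f"ASC {asc:#04x} / ASCQ {ascq:#04x}"),
    }


# Hypothetical sample line, written to resemble the usual log format.
SAMPLE = (
    'ScsiDeviceIO: 2337: Cmd(0x412e803c5a40) 0x1a, CmdSN 0x1f5 from world 0 '
    'to dev "naa.example" failed H:0x0 D:0x2 P:0x0 '
    'Valid sense data: 0x5 0x24 0x0.'
)

if __name__ == "__main__":
    print(decode(SAMPLE))
```

A sense key of ILLEGAL REQUEST paired with "invalid command operation code" or "invalid field in CDB" on these housekeeping opcodes is the pattern described above: the target rejecting a command it does not implement, rather than a failed read or write.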
|
# ? Aug 14, 2014 18:22 |
|
Server is unavailable to me at the moment, but I will certainly check once it's back in my hands (infrastructure vendor is looking at it right now). Thanks for taking a glancing look.
|
# ? Aug 14, 2014 18:28 |
|
This may be worth a look as well: http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=1030265
|
# ? Aug 14, 2014 18:38 |
|
Martytoof posted:Here's a (not so) small excerpt from my vmkernel.log. Definitely seems like there's something going on. Still working on interpreting it.
You have really high latency too, which usually indicates an array issue: too much IO, or an underpowered array.
|
# ? Aug 14, 2014 21:29 |
|
Hmmmm, Horizon View 6.0 was released almost two months ago and I just noticed today. Anyone running this in production yet?
|
# ? Aug 14, 2014 22:06 |
|
I'm fairly confident that the P410i is choking on the IO that you're throwing at it. Entry-level embedded RAID that would be fine with HDDs falls over badly at SSD IO rates. Consider using straight SAS controllers and add RAID in the VM (if possible) or add more controllers.
|
# ? Aug 14, 2014 22:21 |
|
Moey posted:Hmmmm, Horizon View 6.0 was released almost two months ago and I just noticed today.
What do you want to know?
PCjr sidecar posted:I'm fairly confident that the P410i is choking on the IO that you're throwing at it. Entry-level embedded RAID that would be fine with HDDs falls over badly at SSD IO rates. Consider using straight SAS controllers and add RAID in the VM (if possible) or add more controllers.
This, so much. The P410i is not known for its performance; it's a very budget RAID controller. The SSDs will not be the bottleneck, and neither is the hypervisor; the card is just not meant for that kind of IO.
Dilbert As FUCK fucked around with this message at 22:41 on Aug 14, 2014 |
# ? Aug 14, 2014 22:38 |
|
Dilbert As gently caress posted:What do you want to know
Nothing specific, just any new features/performance benefits that anyone found useful. I finally got us upgraded to 5.3 less than 6 months ago; time to start planning this upgrade...
|
# ? Aug 14, 2014 22:56 |
|
PCjr sidecar posted:I'm fairly confident that the P410i is choking on the IO that you're throwing at it. Entry-level embedded RAID that would be fine with HDDs falls over badly at SSD IO rates. Consider using straight SAS controllers and add RAID in the VM (if possible) or add more controllers.
This is possible, but without knowing how many IOPS it's dealing with you can't say. If it's really only 600, then it's very likely not simply the card bottlenecking, particularly on a RAID 10. There's also the fact that he said that write latencies are lower than read latencies, which is the opposite of what you would generally expect, unless the card has battery-backed write cache, which is an option. I'd like to see some read and write IOPS and latency metrics before making any judgements.
|
# ? Aug 14, 2014 23:18 |
|
Moey posted:Nothing specific, any new features/performance benefits that anyone found useful. I finally got us upgraded to 5.3 less than 6 months ago, time to start planning this upgrade...
You can do RDS hosted apps and desktops. That's a pretty big addition.
|
# ? Aug 14, 2014 23:42 |
|
I am going to respectfully disagree about the P410i in this instance. This server was running flawlessly for nearly 45 days, and then on 8/8 we started seeing the DAVG issues. Sorry I wasn't terribly clear about the history. I do understand that the P410i might not be the IDEAL controller (and I am actually addressing that with our vendor), but it certainly didn't cause us this grief until very recently.
|
# ? Aug 15, 2014 01:03 |
|
Martytoof posted:I am going to respectfully disagree about the P410i in this instance. This server was running flawlessly for nearly 45 days, and then on 8/8 we started seeing the DAVG issues. Sorry I wasn't terribly clear about the history. I do understand that the P410i might not be the IDEAL controller (and I am actually addressing that with our vendor), but it certainly didn't cause us this grief until very recently.
|
# ? Aug 15, 2014 01:22 |
|
adorai posted:The worst part about storage IO contention is that it manifests slowly at first, then suddenly and all at once.
That's possible, but I'm seeing contention without our SIEM app running, basically just a bare RHEL 6.4 server. At any rate, we're going to involve VMware at this point. If anyone is interested, I'll update the thread once they track it down and duke it out with HP support.
|
# ? Aug 15, 2014 02:04 |
|
Haha if you think you're gonna get anything more from VMware other than "this is a storage issue talk to them"
|
# ? Aug 15, 2014 02:07 |
|
Nitr0 posted:Haha if you think you're gonna get anything more from VMware other than "this is a storage issue talk to them"
That's exactly what I'm expecting, mainly because it's exactly what I got from HP about the hardware, but our vendor is putting HP and VMware on a bridge call so I'll let them duke it out. I 100% believe this is going to be as painful as I'm envisioning. I also entirely believe it's hardware-based, but good luck trying to convince HP support of that.
some kinda jackal fucked around with this message at 02:16 on Aug 15, 2014 |
# ? Aug 15, 2014 02:14 |
|
Martytoof posted:That's possible, but I'm seeing contention without our SIEM app running, basically just a bare RHEL 6.4 server.
If the contention doesn't change across different workloads, it's definitely a HW issue. Have you done any bare-metal OS tests?
|
# ? Aug 15, 2014 02:40 |
|
Martytoof posted:That's exactly what I'm expecting, mainly because it's exactly what I got from HP about the hardware, but our vendor is putting HP and VMware on a bridge call so I'll let them duke it out
What brand/model SSD? What percentage is in use?
in a well actually fucked around with this message at 03:05 on Aug 15, 2014 |
# ? Aug 15, 2014 02:53 |
|
Moey posted:Nothing specific, any new features/performance benefits that anyone found useful. I finally got us upgraded to 5.3 less than 6 months ago, time to start planning this upgrade...
PCoIP improved over 5.2
Windows 7 Media Redirection
Like three said, that is cool
Better HTML5 and more support, ~1000 I think, for the clientless experience
vSAN, 2012 R2 support
Workspace improvements
Fixed* space reclamation on Windows 8
You do lose local mode, so watch out.
Other than that, View is still way more awesome and simpler than Citrix, and I am perfectly fine with being a fanboy of View.
|
# ? Aug 15, 2014 03:41 |
|
Dilbert As gently caress posted:Other than that View is still way more awesome and simpler than citrix; and I am perfectly fine with being a fanboy of view.
|
# ? Aug 15, 2014 04:11 |
|
adorai posted:I'm really not sure how you can get any easier than Citrix XenDesktop. We had a guy on our team who we got via acquisition. Prior to us he used View, and after, Citrix XenDesktop. He said everything about the Citrix solution was better.
Dunno, View is pretty clean:
1. Deploy 2 Windows Servers.
2. Do some very basic SQL stuff for Composer, event log, and whatnot.
3. Double-click .exe files and select the type of server.
4. Control everything from a centralized interface.
5. Import the VM you customized for deployment.
Some gripes:
Citrix without VPN requires a NetScaler; why the gently caress does it? View has the Security Server in the same 250 MB .exe as the Connection Server and other servers; why can't Citrix?
View seems to work great without any client-side agent or plugin install for HTML5 use; Citrix still wants some client-side thing for the USB/sound/features VMware offers.
Why the gently caress do I have to go to Director (web page) and then Studio (MMC) to manage poo poo? View does it all in one.
Then again I might just be being bitchy. It also could be that we got a botched upgrade to 7.1, and not 7.5 like promised, which is why this weekend I am just going to draw up a whole new migration pattern because it annoys me. I just like the stupid setup of View; gently caress, is it simple.
|
# ? Aug 15, 2014 04:21 |
|
Is there a firm definition of what exactly is and isn't VDI? I've been telling my boss that VDI specifically is something where desktops are virtual and created when a user logs on and destroyed when they log off. My boss says it's just anything where you connect to a desktop that is hosted from a central server/cluster. I hate it when my boss is right, but maybe he's right on this one?
|
# ? Aug 15, 2014 07:01 |
|
He's right
|
# ? Aug 15, 2014 07:04 |
|
FISHMANPET posted:Is there a firm definition of what exactly is and isn't VDI?
VDI is where desktops are held virtually. Refresh at logoff is only a feature of VDI, nothing more. Like Nitr0 said, he is right. What are you wanting to accomplish?
|
# ? Aug 15, 2014 07:10 |
|
Nitr0 posted:He's right
|
# ? Aug 15, 2014 14:56 |
|
adorai posted:Not only is he right, but fishmanpet is dead wrong. In some cases it is preferable to not destroy the desktop upon user logoff, and instead allow that user to have a persistent virtual desktop.
Or in some cases everyone at your job is an idiot and every person has a persistent desktop...
|
# ? Aug 15, 2014 14:57 |
|
I hate it when he's right. Anyway, we're rolling out a Linux Desktop thing that is basically like a Terminal Server but for Linux desktops, and he wants to advertise it as VDI. I have (incorrectly, it seems) argued that it's not VDI so we shouldn't advertise it as such. I still don't think we should advertise it as such, because it doesn't mean anything to a normal person and doesn't inform them as to the capabilities of the service we're providing.
|
# ? Aug 15, 2014 17:09 |
|
FISHMANPET posted:I hate it when he's right.
|
# ? Aug 15, 2014 17:13 |
|
The second one.
|
# ? Aug 15, 2014 17:18 |
|
FISHMANPET posted:The second one.
|
# ? Aug 15, 2014 17:23 |
|
I feel as though I passed some rite of passage today: had my first Purple Screen of Death in a prod environment.
3-host cluster. I've been adamant about making sure we aren't using any more than 2/3rds of our capacity, so the incident didn't really cause any major issues other than "Why the hell was I just kicked out of this remote desktop session? Oh, the server rebooted." Logged into the iDRAC and bounced the host after taking a screenshot of the error screen.
Running 5.1 right now. Looks as though it's E1000 related, as the first two lines are E1000PollRxRing and E1000DevRX, though further down I see Vmxnet3VMKDev_TQDoTX and Vmxnet3VMKDev_Async as well. So, I'm not completely sure what to think. We were doing a P2V onto that host (converting an old Linux box) at the time, and there were two machines with the E1000 adapter on that host as well. One of them SHOULDN'T have been E1000; it was an oversight when setting it up. One of them had to be E1000 because it's a virtual appliance and we can't change it.
I get the feeling that I'm not going to fully figure out what caused this, and the only guidance I can see from VMware is "don't use E1000." Hosts have been otherwise solid for the last 18 months (all Dell R620s with Intel NICs).
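The 2/3rds figure for a 3-host cluster falls out of simple arithmetic. A rough sketch, assuming identical hosts and a workload that can be spread evenly across the survivors (the function name is made up for illustration, and real HA admission control is more involved than this):

```python
def max_safe_utilization(hosts, failures_tolerated=1):
    """Fraction of total cluster capacity that can be in use while the
    remaining hosts can still carry everything after the given number
    of host failures. Assumes all hosts are the same size."""
    if hosts <= failures_tolerated:
        raise ValueError("need more hosts than tolerated failures")
    return (hosts - failures_tolerated) / hosts


if __name__ == "__main__":
    for n in (2, 3, 4, 8):
        print(n, "hosts:", round(max_safe_utilization(n), 3))
```

With 3 hosts and one tolerated failure the ceiling is 2/3 of total capacity; adding hosts raises the ceiling, since each host represents a smaller share of the cluster.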
|
# ? Aug 15, 2014 23:37 |
|
bull3964 posted:I get the feeling that I'm not going to fully figure out what caused this, and the only guidance I can see from VMware is "don't use E1000." Hosts have been otherwise solid for the last 18 months (all Dell R620s with Intel NICs).
VMware can't really just tell you to not use a feature that they support. Unless something has changed recently, E1000 vmNICs are still fully supported by VMware.
|
# ? Aug 16, 2014 05:40 |
|
cheese-cube posted:VMware can't really just tell you to not use a feature that they support. Unless something has changed recently E1000 vmNICs are still fully supported by VMware.
Yet every single knowledge base article they have on issues resembling this one basically says "use them as little as possible, if at all" as the resolution to the problem. The only other workaround is to disable RSS on the guest, but that isn't always possible if it's an appliance.
|
# ? Aug 16, 2014 06:32 |
|
bull3964 posted:Yet every single knowledge base article they have around the subject resembling this issue basically says "use them as little as possible if at all" as the resolution to the problem.
Basically because non-accelerated NICs (e1000, rtl8139, etc.) clog up the system looking for interrupts and are less performant on guests. Not because they cause panics and break functionality. They shouldn't. File bugs if so.
|
# ? Aug 16, 2014 15:39 |
|
evol262 posted:Basically because non-accelerated NICs (e1000, rt8139, etc) clog up the system looking for interrupts and are less performant on guests. Not because they cause panics and break functionality. They shouldn't. File bugs if so
Uhhhhh http://kb.vmware.com/selfservice/microsites/search.do?language=en_US&cmd=displayKC&externalId=2059053
|
# ? Aug 16, 2014 18:32 |
|
Nitr0 posted:Uhhhhh
VMware posted:This is a known issue affecting ESXi 5.x hosts and virtual machines using the E1000 and E1000e virtual network adapters. For more information about available network interface types, see Choosing a network adapter for your virtual machine (1001805).
"This issue is resolved in:" This is exactly how bug errata is worded. It's a bug, and it got fixed, probably because somebody reported issues. Your point is what?
|
# ? Aug 16, 2014 19:26 |
|
cheese-cube posted:VMware can't really just tell you to not use a feature that they support. Unless something has changed recently E1000 vmNICs are still fully supported by VMware.
I don't know if you can call them "fully supported". I tried to argue this very point on a case 4 months ago. We had issues with View clients not being able to connect to their pools. Support refused to continue troubleshooting because the vCenter for the environment had an E1000 NIC. The tech tried to claim that since best practice was vmxnet3, he shouldn't have to troubleshoot anything but that, and that it was clearly our problem. I literally had to go through a week+ of change management busy work before they would even spend more time on the issue. Of course it wasn't even related to the problem and just ended up being a waste of time.
It's supported in the sense that someone has committed by policy to make it work; in practice... I wouldn't waste your time with the tech support lines.
|
# ? Aug 17, 2014 03:49 |
|
evol262 posted:"This is a known issue affecting..."
They have a few more bugs listed with the same symptoms and they aren't listed as resolved in any version.
|
# ? Aug 17, 2014 04:27 |