|
CommieGIR posted:Except it's a new issue. ESXi has run fine from both devices for years; something they did changed it. Exactly, they shouldn't need high IO. The tools mount/scratch run from ramdisk and logging goes to a SAN datastore. What else does the card need to be doing? Also, if the card is actually corrupted, it won't come back after a reboot. I feel bad for the hardware vendors that sent out probably thousands of replacements for cards that weren't even busted. Anyway, it was fine in 7.0U1; it was U2 that broke it.
|
# ? Aug 20, 2021 15:48 |
|
|
# ? May 9, 2024 00:05 |
|
GrandMaster posted:Exactly, they shouldn't need high io. The tools mount/scratch run from ramdisk and logging goes to san datastore. What else does the card need to be doing? There was always the possibility you'd corrupt your SD card; in fact a lot of hyperconverged stuff either uses mirrored SD cards or SATADOM modules to address this.
|
# ? Aug 20, 2021 16:13 |
|
SSDs are so cheap now that I'd probably just stick a couple of low-end 512GB disks in RAID1 and boot vSphere off that. Used to mess around with the mirrored SD cards in Dell boxes but the write speeds got painful sometimes.
|
# ? Aug 20, 2021 16:53 |
|
I’m speccing my new VMware servers with boot SSDs now too. It gives the option to host a few test/junk VMs on local data stores and avoids the SD/USB boot issues that have been popping up. It’s also more in line with VMware’s recommendations, and to be honest the cost is pretty marginal given how marked up the SD cards are anyway.
|
# ? Aug 20, 2021 17:24 |
|
I use two of these in a mirror https://www.ebay.com/itm/164875884280?hash=item26635e56f8:g:V2kAAOSwDFRgppEV
|
# ? Aug 20, 2021 17:39 |
|
I recall maybe a decade ago now commissioning some new IBM HS22 blades running whatever was the latest ESXi at the time and they had an issue where their DAS would just go offline, rendering the host unmanageable (VMs would keep running but you couldn't manage or migrate them). Back then the fix we settled on was to just order some of the special IBM SD cards to use as the boot device on the blades instead of the DAS. Weird that it sounds like the reverse is now the issue lol.
|
# ? Aug 20, 2021 18:10 |
|
GrandMaster posted:Exactly, they shouldn't need high io. The tools mount/scratch run from ramdisk and logging goes to san datastore. What else does the card need to be doing? Beats me what's doing it, but there have been issues going all the way back to 6.0 where VMware has blamed IO on crappy SD cards: https://kb.vmware.com/s/article/2149257 My organisation's solution to this and the upcoming deprecation in 7.x has been to start moving to boot-from-san, which also means we can finally start automating building ESXi hosts.
|
# ? Aug 21, 2021 00:49 |
|
Pikehead posted:Beats me what's doing it, but there have been issues going all the way back to 6.0 where VMware has blamed IO on crappy SD cards: I've given thought to that but our main data stores are NFS, so I'd be adding a whole new dependency to our environment. How's the transition worked for you?
|
# ? Aug 21, 2021 01:38 |
|
Zorak of Michigan posted:I've given thought to that but our main data stores are NFS, so I'd be adding a whole new dependency to our environment. How's the transition worked for you? Yeah, if you're NFS then that's a big issue - all our mainline storage is fibre channel, so boot-from-san involves some fiddling up front but then a fairly smooth experience. Like everything else it's ground down into "whenever someone has spare time", i.e. very slowly. It's a hard requirement for us though when we finally make it to 7.x due to all the problems inherent with SD cards (the various corruption issues that keep occurring, no persistent log location, the absolute crap that is the Cisco UCS SD controller/cards). It'll also be nice for faster booting, patching and installing - 99% of the time it doesn't matter, but when I've got to patch an entire site's worth of hosts I hate having to watch that progress bar go left to right slowly. It'll all be even better when (fingers-crossed) we can automate this stuff away using Autodeploy.
|
# ? Aug 21, 2021 08:27 |
|
Hoping someone has an idea on an issue I'm having with one of my VMs. We took a power hit yesterday that took down my rack after the UPS ran down. I'm able to get two of my VMs up in ESXi but the third is giving an error of:code:
Any ideas? I'd rather not nuke this vm since I have my monitoring system on it for my broadcast systems.
|
# ? Sep 10, 2021 00:06 |
|
diremonk posted:Hoping someone has an idea on an issue I'm having with one of my VMs. We took a power hit yesterday that took down my rack after the UPS ran down. I'm able to get two of my VMs up in ESXi but the third is giving an error of: vmware-vdiskmanager might be the key to this: https://stackoverflow.com/questions/46193933/how-to-repair-vmx-file-corrupted-at-vmware
|
# ? Sep 10, 2021 00:23 |
|
diremonk posted:Hoping someone has an idea on an issue I'm having with one of my VMs. We took a power hit yesterday that took down my rack after the UPS ran down. I'm able to get two of my VMs up in ESXi but the third is giving an error of: Hmm, are you able to post the directory listing for the virtual machine? There's possibly a lock file there or one of the ~ files that is still around (but shouldn't be). Also potentially vmx corruption.
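A minimal sketch of that lock-file check (the datastore path is hypothetical, and exact lock-file naming varies by datastore type, so the patterns are illustrative):

```shell
# On the ESXi host itself the check would be roughly:
#   find /vmfs/volumes/datastore1/myvm \( -name '*.lck' -o -name '.lck-*' -o -name '*~' \)
# Self-contained demo of that find invocation against a mock VM directory:
vmdir=$(mktemp -d)
touch "$vmdir/myvm.vmx" "$vmdir/myvm.vmdk" "$vmdir/myvm.vmx.lck"
found=$(find "$vmdir" \( -name '*.lck' -o -name '.lck-*' -o -name '*~' \))
echo "$found"    # only the leftover lock file is listed
rm -rf "$vmdir"
```

If something shows up after every VM process is confirmed dead, moving it aside (not deleting it) is the cautious first step.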
|
# ? Sep 10, 2021 01:12 |
|
Here's the results of an ls -a on the host. Strange that it is saying that there is no .vmx~ file. Sorry it is kind of small, remoted into my work desktop to grab the screenshot.
|
# ? Sep 10, 2021 03:17 |
|
diremonk posted:Here's the results of a ls -a on the host. Strange that it is saying that there is no .vmx~ file. Sorry it is kind of small, remoted into my work desktop to grab the screenshot.. Maybe the issue is with the permissions on the file. Check with ls -alF, not sure what the correct permissions are meant to be though.
|
# ? Sep 10, 2021 04:57 |
|
Pile Of Garbage posted:Maybe the issue is with the permissions on the file. Check with ls -alF, not sure what the correct permissions are meant to be though. Worst case scenario if the VMX is hosed is you can create a new shell VM and attach the existing VMDK to it. For safety's sake I’d actually copy the VMDK to the new VM folder so you have the untouched original as a backup until you get everything working again.
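A sketch of that copy-first step (all paths hypothetical). On ESXi the copy itself would be done with vmkfstools -i so the descriptor and flat extent stay paired; the runnable demo below just shows the keep-the-original-untouched pattern with plain files:

```shell
# Real-host version (hypothetical paths):
#   vmkfstools -i /vmfs/volumes/ds1/oldvm/oldvm.vmdk /vmfs/volumes/ds1/newvm/newvm.vmdk
# then attach newvm.vmdk to the freshly created shell VM. Generic demo:
src=$(mktemp -d); dst=$(mktemp -d)
echo "pretend vmdk contents" > "$src/oldvm.vmdk"
cp -p "$src/oldvm.vmdk" "$dst/newvm.vmdk"     # only ever touch the copy
cmp -s "$src/oldvm.vmdk" "$dst/newvm.vmdk" && echo "copy verified"
```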
|
# ? Sep 10, 2021 05:03 |
|
diremonk posted:Here's the results of a ls -a on the host. Strange that it is saying that there is no .vmx~ file. Sorry it is kind of small, remoted into my work desktop to grab the screenshot.. What's in the /var/log/vmkernel.log when you try and power it on?
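For the log check, something along these lines (the log line below is a made-up example of a file-lock complaint, not verbatim ESXi output):

```shell
# On the host: grep -i myvm /var/log/vmkernel.log | tail -n 20
# Self-contained demo against a mocked-up log:
log=$(mktemp)
echo "2021-09-10T03:00:01Z cpu4: DLX: lock at 4711: [Req mode: 1] Checking liveness" > "$log"
hits=$(grep -i 'lock' "$log")
echo "$hits"
rm -f "$log"
```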
|
# ? Sep 10, 2021 06:54 |
|
diremonk posted:Here's the results of a ls -a on the host. Strange that it is saying that there is no .vmx~ file. Sorry it is kind of small, remoted into my work desktop to grab the screenshot.. I might try copying or moving all the files to another directory, sounds like there is something screwy with the directory.
|
# ? Sep 10, 2021 17:21 |
|
OK, moved it to another directory and it starts booting. But then it starts having issues with the file system. Maybe I should just nuke it and start over, then do a proper snapshot and backup once I have it all working.
|
# ? Sep 10, 2021 20:20 |
diremonk posted:OK, moved it to another directory and it starts booting. But then starts having issues with the file system. It sounds like it would probably take less time to do the latter at this point.
|
|
# ? Sep 10, 2021 20:27 |
|
diremonk posted:OK, moved it to another directory and it starts booting. But then starts having issues with the file system. This sounds like multiple different issues possibly related to the storage. If you're also having issues with other VMs on the same storage then you may have a problem. Otherwise I'd recommend what Arishtat suggested: Arishtat posted:Worst case scenario if the VMX is hosed is you can create a new shell VM and attach the existing VMDK to it. For safety sake I’d actually copy the VMDK to the new VM folder so you have the untouched original as a backup until you get everything working again.
|
# ? Sep 11, 2021 17:24 |
|
*Disregard. This was more specific to cloud tech than virtualization. Asked in COC.
Hughmoris fucked around with this message at 13:11 on Sep 13, 2021 |
# ? Sep 12, 2021 22:44 |
|
VMware are discontinuing support for running the hypervisor from only SD cards and USB sticks. You'll need good local disk as well, at which point the SD card adds no value. https://blogs.vmware.com/vsphere/2021/09/esxi-7-boot-media-consideration-vmware-technical-guidance.html Thanks Ants fucked around with this message at 12:39 on Oct 4, 2021 |
# ? Oct 4, 2021 12:36 |
|
Thanks Ants posted:VMware are discontinuing support for running the hypervisor from only SD cards and USB sticks. You'll need good local disk as well, at which point the SD card adds no value. I assume most servers these days come with M.2 NVMe SSD sockets so should be simple enough to use those instead whilst still avoiding the need for a SATA/SAS drive.
|
# ? Oct 4, 2021 12:49 |
|
Thanks Ants posted:VMware are discontinuing support for running the hypervisor from only SD cards and USB sticks. You'll need good local disk as well, at which point the SD card adds no value. Well the writing was on the wall, hopefully they will not start requiring stupidly big local disks to run.
|
# ? Oct 4, 2021 12:49 |
|
Gotta use that 500GB SATA drive Dell won't let you remove from the config somehow!
|
# ? Oct 4, 2021 13:34 |
Pile Of Garbage posted:I assume most servers these days come with M.2 NVMe SSD sockets so should be simple enough to use those instead whilst still avoiding the need for a SATA/SAS drive. ESXi is designed to load its OS into memory and run it entirely from there, only writing to the disks when configuration is changed and saved, so there's hardly any point in using NVMe. The marginal boot time gains aren't going to mean anything, because any company who's serious about nines will be doing live migration. SlowBloke posted:Well the writing was on the wall, hopefully they will not start requiring stupidly big local disks to run.
|
|
# ? Oct 4, 2021 13:47 |
|
BlankSystemDaemon posted:You'll be hard pressed to find server boards without any SATADOM and 2 of them is by far the norm, for the exact reason that it's the easiest way to add two SSDs in a mirror for ESXi or another type1 hypervisor to boot from. See the Dell BOSS card
|
# ? Oct 4, 2021 14:08 |
|
BlankSystemDaemon posted:You'll be hard pressed to find server boards without any SATADOM and 2 of them is by far the norm, for the exact reason that it's the easiest way to add two SSDs in a mirror for ESXi or another type1 hypervisor to boot from. Not sure if it's an issue any more but I remember back on 5.5 and maybe 6.0 if the ESXi root file system broke then it also breaks poo poo like vMotion.
|
# ? Oct 4, 2021 14:18 |
|
BlankSystemDaemon posted:You can't live migrate VMs that's using local storage, which is one of the big selling points of virtualization - so I don't think there's any need to worry about that. That's been false since 5.1 tho, it just takes an assload of time if you don't have quick local disks and 10G links on the management interfaces.
|
# ? Oct 4, 2021 15:38 |
|
Can anyone help me understand something around CPU Ready and NUMA home node percentage? We have Cisco B200 blades running ESXi 6.7. These have 2 x Xeon E5-2660v3 processors and 384GB. So vSphere sees each host as having 40 logical CPUs. Each blade hosts 6 x Citrix/RDS VMs. Each Citrix VM has 6 x vCPUs and 40GB memory. Each blade also has 1 x SIEM monitoring VM that has 2 x vCPUs. Therefore the VMs on each ESXi host have a total of 38 vCPUs assigned. The Citrix VMs have 1 virtual socket with 6 cores, as per the NUMA recommendations. We have Opvizor monitoring software. During busy periods we see VMs with high CPU Ready and also low NUMA home node percentages. See image below. We don't understand 1) why we see the high CPU ready even though we have not oversubscribed vCPUs to pCPUs and 2) why we see the poor NUMA performance even though we have followed the NUMA guidelines.
|
# ? Oct 18, 2021 22:15 |
|
I feel like the more I learn about vNUMA the less I understand it. That being said, I generally don't factor in logical cores when doing NUMA planning. Going that route, you'd have 2x 10 physical cores to work with. I think given that constraint, you might find things running better with 4 vCPU. Now, that just may not be enough for the users on those servers. In that case, I would say it's time to move off hosts with CPUs from 2014. Definitely appreciate all the information you provided, though. I've done a lot of Citrix performance tuning over my life and I feel like I am still constantly trying to get engineers to appreciate vNUMA in their designs. It's very clear, though, that your end users are having a rough time.
|
# ? Oct 18, 2021 23:39 |
|
I'm excited for a follow up as I've just started learning about both those concepts last week trying to get a better grasp of metrics. Which means my input isn't very valuable. With that said, the CPU Ready Average seems fine from what I've read (aim for less than 5% per vCPU). The NUMA metrics are obviously problematic; are you positive that memory is spread out equally between both processors? SIEM server hogging too many resources?
|
# ? Oct 19, 2021 02:26 |
|
I don't think 40 GB of memory assigned will cause a problem with vNUMA for those hosts. The rule of thumb that I use for myself is total memory / total processors before I start getting worried about memory. I think CPU Ready Average % already accounts for each vCPU and while I agree VMware recommends under 5%, to me I would feel much better if I was not close to 5%. For me, one of the biggest metrics I take into account when I am looking at things is Co-Stop. It can sometimes be hard to pin down the cause, but to me Co-Stop should be as close to 0 as it can be. This can be vCPU overcommit, it can be storage issues, it can be snapshot issues. I'm curious, how many users do you have on each of those Citrix hosts? Are you doing any in-guest / user session monitoring?
|
# ? Oct 19, 2021 02:51 |
|
Internet Explorer posted:I'm curious, how many users do you have on each of those Citrix hosts? Are you doing any in-guest / user session monitoring? We have an average of 6 users per Citrix VM. It's a professional services org and users will typically run several fairly intensive apps at once (e.g. Office, Teams, Acrobat, CRM, Billing). We have some in-house UX monitoring stuff that launches Outlook within the Citrix desktop and starts a timer, that seems to support the theory that the NUMA problems are having a real-world impact.
|
# ? Oct 19, 2021 13:47 |
|
I'm fairly new to this, but the last couple of weeks futzing around with Docker and Vagrant on my Debian box have been really fun and enlightening. I figured I'd poke in here and see if I'm wildly off base with my approach and to get a bit of perspective. My experience so far is a raindrop in a lake, so I figured I'd bounce this off people who were working with full buckets of water. In short, I figured I'd try to write something to deploy and provision a media server. The majority of initial setup was done from this guide: https://www.linuxserver.io/blog/2017-06-24-the-perfect-media-server-2017 ...and I figured that if I wanted to futz around further, the best thing to do would be to use a Vagrant instance. My biggest hangup has been that the host machine is configured to use mergerfs to pool drives and snapraid to handle the parity drive - and I can't for the life of me get the Vagrant instance to consistently find and handle the drives. My solution so far is this: code:
The whole reason for this rigamarole is to give me a test bed for something I want to deploy on the bare host system, so the workaround for mounting drives is exclusively a Vagrant thing. It's taking up a bunch of time, and ultimately I'm not even going to use it since the goal is to just get this working outside a virtual machine. Is there a better way to approach this? I tend to get tunnel-vision in terms of solutions, so I don't want to be working towards a goal that can, at best, be "serviceable."
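Not a fix for the Vagrant drive-mapping itself, but one way to make provisioning fail fast when the pool isn't there: mergerfs mounts show up with fstype fuse.mergerfs in /proc/mounts. A self-contained sketch against a mocked mounts file (the pool path and entry are hypothetical):

```shell
# On the real guest you'd read /proc/mounts directly:
#   awk '$3 == "fuse.mergerfs" { print $2 }' /proc/mounts
mounts=$(mktemp)
echo "pool:a:b /srv/storage fuse.mergerfs rw,relatime 0 0" > "$mounts"  # mocked entry
pooldir=$(awk '$3 == "fuse.mergerfs" { print $2 }' "$mounts")
echo "$pooldir"   # /srv/storage
rm -f "$mounts"
```

A provisioning script could bail out early if that check prints nothing.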
|
# ? Oct 22, 2021 16:33 |
|
gallop w/a boner posted:Can anyone help me understand something around CPU Ready and NUMA home node percentage? You are oversubscribed (the 2660v3 is 10 cores per socket; never count hyperthreading when considering vCPU to pCPU ratios, or NUMA), just not by a lot (1.9:1). I don't know if your monitoring software is accounting for it, but CPU ready as shown in esxtop is a total across the vCPUs, so a 6% CPU ready for a VM would be 1% each. That said, the tolerance to CPU ready and impact on performance can vary depending on the application, and in this case user experience is the gauge, so if you're experiencing performance issues during these times of "high" CPU ready then it doesn't matter what the measurement is, it's too high for your environment. Having been admin for many Citrix on top of vSphere environments I tend to have better end user experience scaling out instead of up. As someone else suggested you might want to consider adding more VMs but scaling each down to 4 vCPU. You'll get fewer users per VM, but you'll have more VMs to distribute that load, and the ESXi CPU scheduler will likely have an easier time not only keeping up, but also keeping the VMs in their NUMA bounds.
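Back-of-envelope version of that arithmetic, counting physical cores only and normalising the summed %RDY per vCPU (the 6% figure is a hypothetical example, not a measurement from this environment):

```shell
# vCPU:pCPU ratio with hyperthreads excluded:
vcpus=$((6 * 6 + 2))   # 6 Citrix VMs x 6 vCPU, plus one 2-vCPU SIEM VM = 38
pcores=$((2 * 10))     # 2 x E5-2660v3 at 10 physical cores each = 20
ratio=$(awk -v v="$vcpus" -v p="$pcores" 'BEGIN { printf "%.1f:1", v / p }')
echo "$ratio"          # 1.9:1, i.e. mildly oversubscribed

# esxtop %RDY is summed across a VM's vCPUs, so divide by the vCPU count
# before comparing against the ~5%-per-vCPU rule of thumb:
per_vcpu=$(awk 'BEGIN { printf "%.1f", 6.0 / 6 }')
echo "${per_vcpu}% per vCPU"
```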
|
# ? Oct 22, 2021 21:37 |
|
So the latest vCenter 7.0 Update 3 patch broke the possibility of using SMB backups of the vCenter config and database. How do you even gently caress that up, VMware?
|
# ? Oct 22, 2021 23:39 |
|
battlepigeon posted:So the latest vCenter 7.0 Update 3 patch broke the possibility of using SMB backups of the vCenter config and database. Latest ESXi patch causes purple screens when powering on thin-provisioned virtual machines, so I'd say that it's just VMware being VMware - https://kb.vmware.com/s/article/86100
|
# ? Oct 23, 2021 05:15 |
|
I'll just stick to proxmox for now
|
# ? Oct 23, 2021 06:25 |
|
|
|
Pikehead posted:Latest esxi patch causes purple screens when powering on thin provisioned virtual machines, so I'd say that it's just VMware being VMware - https://kb.vmware.com/s/article/86100 Lol how does something like that make it to production? I’m glad Nimble advises you make thick eager zero VMs so I won’t have to deal with this
|
# ? Oct 23, 2021 06:39 |