Potato Salad
Oct 23, 2014

nobody cares


Crosspost from Pissing Me Off: Myself

We have some seminar / training room VDI pools in View and some temp employee / home access VDI pools.

Because we want user desktop / documents folder redirection to our storage filer on the employee pools but not the training pools, we deployed the AD OU for our VDI computer accounts with Group Policy loopback processing enabled. There's a very obvious edge case I didn't account for that leaves certain temp employees without redirected folders. How to lay out the OU structure for our VDI machines more elegantly is obvious now, but drat it, I need to apply the fix laaaaaaaaaaate after hours.


Potato Salad
Oct 23, 2014

nobody cares


evol262 posted:

I'm sure the new SAN will help when the storage admins don't bother to configure multiple pathing, because bonds are better somehow.

Stop talking about the inadequacies of my storage infrastructure :smith:






I'm a grown adult. Why am I giggling uncontrollably when I read this aloud?

Potato Salad
Oct 23, 2014

nobody cares


Docjowles posted:

God dammit, I am never going to be able to read/say this as anything but "peen FS" from now on. Thanks a lot, buddy :argh:

I was doing "penifs" but now I can't even use the acronym. Thanks, right back at ya :argh:

Potato Salad
Oct 23, 2014

nobody cares


Absolute zero, as an actual requirement, comes with a massive budget behind it for new talent and consultants.

You do have a massive budget, right? :corsair:

Potato Salad
Oct 23, 2014

nobody cares


devmd01 posted:

What is my most effective method of migrating guests across the MPLS, since we are also re-IPing everything anyway? Weekend downtime is acceptable; most guest storage LUNs are 2TB.

What would you use and why?


How far, physically?

Last time I moved a datacenter, I rented a cheap-o turnkey NAS, had poo poo slowly duplicate for a few days ahead of time, and drove it fifty miles. Had I wrecked, no production stuff would have been harmed -- it was a well-insured rental device. Beat the hell out of using their lovely ISP.
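For a sense of scale, here's a minimal sketch of the sneakernet math; the LUN count and link speed below are assumptions of mine, not devmd01's actual numbers.

```python
# Hypothetical figures: ten 2 TB guest LUNs versus an assumed 200 Mbps of
# usable MPLS bandwidth, compared with loading a rented NAS into a car.
LUNS = 10
LUN_TB = 2
LINK_MBPS = 200

total_bits = LUNS * LUN_TB * 10**12 * 8
hours_on_the_wire = total_bits / (LINK_MBPS * 10**6) / 3600
print(f"~{hours_on_the_wire:.0f} hours on the wire vs. roughly an hour of driving")
# -> ~222 hours, i.e. well over a week of saturating the link
```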

Potato Salad
Oct 23, 2014

nobody cares


Yes. I do this at home. It's just a datastore.

Edit - here's a guide that looks accurate after a quick read-over. http://www.virten.net/2015/10/usb-devices-as-vmfs-datastore-in-vsphere-esxi-6-0/

Potato Salad fucked around with this message at 04:29 on Jun 1, 2016

Potato Salad
Oct 23, 2014

nobody cares


Dedupe does very little for SOHO from a cost-benefit perspective.

Potato Salad
Oct 23, 2014

nobody cares


I thought just by subject matter that I was in the enterprise storage thread. Apologies.

Dude with the money to burn: your system plan is unbalanced. Sell the second 1080, get a 950 PRO or another NVMe drive plus a turnkey SSD-backed SOHO NAS, and enjoy the fastest storage you can buy short of going always-on RAM disk.

Potato Salad
Oct 23, 2014

nobody cares


There was a post here, but it was needlessly confrontational. What people do with their money is none of my business.

Potato Salad fucked around with this message at 13:34 on Jun 13, 2016

Potato Salad
Oct 23, 2014

nobody cares


Look up "VMware EVC Mode." You can tell vSphere what generation of CPU features to present to the guests in a cluster, which lets you mix hosts by masking everything down to the oldest common CPU generation.
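If you want to check what a cluster is currently masking down to, here's a minimal pyVmomi sketch; the vCenter hostname and credentials are placeholders, and it assumes the pyvmomi package is installed.

```python
# List the EVC mode key currently active on each cluster (None means EVC is off).
import ssl
from pyVim.connect import SmartConnect, Disconnect
from pyVmomi import vim

ctx = ssl._create_unverified_context()  # lab convenience; verify certs in production
si = SmartConnect(host="vcenter.example.local",
                  user="administrator@vsphere.local",
                  pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.ClusterComputeResource], True)
    for cluster in view.view:
        print(f"{cluster.name}: EVC mode = {cluster.summary.currentEVCModeKey}")
finally:
    Disconnect(si)
```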

Potato Salad
Oct 23, 2014

nobody cares


Definitely don't overspec VMs for class labs. That might give customers a good impression!

Potato Salad
Oct 23, 2014

nobody cares


Your IT department is incompetent if they can't give you a slice of their storage pie. Lack of credentials on the single log-generating machine is a bullshit excuse. There are a handful of options at multiple layers of the network to make it happen.

Potato Salad
Oct 23, 2014

nobody cares


Keyed SSH access, or a firewall on a throwaway VM fronting the storage pool that's open only to that one server: there are plenty of ways to provide that storage slice with logical isolation from the rest of your storage while staying compliant with NIST 800-53 or -171 for your DoD work.
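As a concrete example of the keyed-SSH option, here's a minimal sketch with paramiko; the hostname, account, and paths are placeholders I made up.

```python
# Push logs from the single log-generating box to a dedicated share over
# key-only SFTP, so nothing else on the filer is exposed to that server.
import paramiko

client = paramiko.SSHClient()
client.load_system_host_keys()
client.set_missing_host_key_policy(paramiko.RejectPolicy())  # no blind host-key trust
client.connect("logshare.example.local", username="logdrop",
               key_filename="/etc/logdrop/id_ed25519")       # key auth, no password
try:
    sftp = client.open_sftp()
    sftp.put("/var/log/app/audit.log", "/exports/audit/audit.log")
    sftp.close()
finally:
    client.close()
```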

Potato Salad
Oct 23, 2014

nobody cares


Nope, you can use anything.

Potato Salad
Oct 23, 2014

nobody cares


DevNull posted:

Never buy an AMD

Whatchu talking about, Zen will implement generations of generations of features all in one go without any disappoi-- :suicide101:

Potato Salad
Oct 23, 2014

nobody cares


I'm in the same place with an Oracle E-Biz system and users who (1) won't let me offload decades-old data to a non-production DB, (2) can't write reports that show any mindfulness about efficiency, and (3) won't put those reports in our digital document / OCR system, and instead re-queue and re-print them every time someone needs to see <report result>.

Someone, please for the love of God, make an affordable turnkey SSD storage filer. You are going to have to eventually.

Potato Salad
Oct 23, 2014

nobody cares


Trading?

Potato Salad
Oct 23, 2014

nobody cares


High frequency trading is absolutely critically dependent on timing.

Potato Salad
Oct 23, 2014

nobody cares


I once couldn't get a guy on Spiceworks to understand why, while analysts and portfolio managers can work from a remote office, the trading systems themselves need to be near the exchange itself as a basic barrier to entry for HF trading. His management was riding him for not providing an uplink to the exchange that worked faster than the speed of light would permit from the remote office they wanted to move their trading infrastructure to :psylon:

I don't know what a company was doing with some fool who couldn't see that, hey, the speed of light was his limit... there are high-six-figure-salaried experts on this kind of network architecture for a reason.
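To put numbers on that hard floor, a quick sketch assuming light in fiber travels at roughly two-thirds of c (about 200,000 km/s):

```python
# Best-case round-trip time over fiber, ignoring switching and serialization delay.
C_FIBER_KM_PER_MS = 200.0  # ~200,000 km/s expressed as km per millisecond

def min_round_trip_ms(distance_km: float) -> float:
    return 2 * distance_km / C_FIBER_KM_PER_MS

for km in (10, 100, 1000):
    print(f"{km:>5} km -> at least {min_round_trip_ms(km):.2f} ms round trip")
# No network gear, however expensive, gets you under these numbers.
```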

Potato Salad
Oct 23, 2014

nobody cares


Before some exchanges (or was it the SEC? I can't remember) implemented rules requiring enough fiber in your uplink to impose a minimum latency to the exchange, traders were successful only when they literally had the closest office.

Potato Salad
Oct 23, 2014

nobody cares


mayodreams posted:

Two days later I was fired on day 90 of my 90 day probationary period because 'we are professionals, so we don't need checklists' and 'documentation is pointless because it is out of date as soon as it is written'.

And that ended the worst experience of my professional career.

> professionals
> documentation is pointless

:bahgawd:

Potato Salad
Oct 23, 2014

nobody cares


Specs for the hosts? What storage is this running on?

Potato Salad
Oct 23, 2014

nobody cares


If you have the cash, buy enough 2TB Samsung 850 PRO SSDs to handle your storage issues. 100 VMs on magnetic disk? What are the VM OSes? I can't imagine trying to run that kind of demand on a single spinning disk. This is coming off as incredibly seat-of-the-pants hobo dev.

Potato Salad
Oct 23, 2014

nobody cares


The two VMDK errors you're seeing are what happens when I try to do too much with lovely storage and elements of the storage stack start timing out or silently and gracelessly crashing. 100 VMs on a single 4TB archival spinning hard drive is way off the edge of the map.

Potato Salad
Oct 23, 2014

nobody cares


There's a guy in another thread talking about some developers being, to be frank, uneducated but argumentative in their expectations for a barebones ESXi setup, and I can't help but wonder.

Maybe the support ticket with VMware will find some bug or misconfiguration (which is your fault if you were the guys who did the updates and then the rollbacks), but I can already feel the sigh of the guy who gets on the phone with your IT team tomorrow and hears that, yep, it's another shop with devs who think virtualization is magic and haven't sat down and worked out what 100 VMs booting simultaneously on a single high-capacity drive, or a small array of drives on a cheap-rear end controller, means in terms of seek time alone.
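For anyone following along, a back-of-envelope sketch of that seek-time point; the per-disk IOPS and per-boot I/O counts are assumptions, not measurements from their environment.

```python
# Why 100 guests booting at once on one spindle falls over: a 7200 RPM drive
# delivers on the order of ~100 random IOPS, and an OS boot is mostly random reads.
DISK_RANDOM_IOPS = 100        # optimistic for a single 7200 RPM drive
IOS_PER_GUEST_BOOT = 20_000   # assumed small random I/Os per guest OS boot
GUESTS = 100

total_ios = IOS_PER_GUEST_BOOT * GUESTS
minutes = total_ios / DISK_RANDOM_IOPS / 60
print(f"~{minutes:.0f} minutes of queued seek time")  # ~333 minutes of pure queueing
```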

"But it worked before!"

Potato Salad fucked around with this message at 23:55 on Aug 18, 2016

Potato Salad
Oct 23, 2014

nobody cares


I'm coming off harsher than I mean to. Boot storms are not inconveniences, they are problems.

Potato Salad
Oct 23, 2014

nobody cares


Winkle-Daddy posted:

And why you think we're running on a single spinning disk, I don't know; it's a RAID array.

Do you happen to have the model of the RAID controller? Budget and even mid-range servers tend to ship with the lowest or next-to-lowest controller available for that generation unless you ask for something better during purchasing. The H710 that most sales reps (in my experience) include in quotes for 12th-gen PowerEdge systems is weaker in a parity array than some high-end enterprise single disks in the real world.

Also, just because there's a RAID controller doesn't mean seek time is eliminated as an issue. Striping can speed up straight sequential reads, sure, but OS booting is rarely sequential. It's random as hell, and striping really doesn't help when a disk still has to move from sector to sector physically. What striping/mirroring scheme are you using, and what's the storage setup inside your guests, so we can figure out influences like dedupe? Is the RAID controller in write-through or write-back mode? Storage is complex.
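To put rough numbers on the striping point, a sketch using an assumed per-spindle figure and the classic four-I/O RAID5 write penalty:

```python
# Striping scales random reads roughly with spindle count, but parity writes don't.
SPINDLE_RANDOM_IOPS = 100   # assumed for a single 7200 RPM disk
DISKS = 4

raid5_read_iops = SPINDLE_RANDOM_IOPS * DISKS        # reads spread across members
raid5_write_iops = SPINDLE_RANDOM_IOPS * DISKS / 4   # read-modify-write penalty
print(f"~{raid5_read_iops} random read IOPS, ~{raid5_write_iops:.0f} random write IOPS")
# Either way, a small spinning array is nowhere near what a boot storm asks for.
```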

What are the OSes of the guests, and what build of ESXi 5.5 did you roll back to? The same as before? If I'm coming off like a Tier 1 call-center guy right now, it's because you may not have had the personal experience of seeing how deep the virtualized-storage rabbit hole can go. Details are going to be important when you're maxing the system out for hours on end and expecting the storage stack not to collapse and start crashing.

Potato Salad fucked around with this message at 14:26 on Aug 19, 2016

Potato Salad
Oct 23, 2014

nobody cares


The bit about "it's not working like it was before" doesn't hold in a high-demand situation where you're up against the capabilities of the underlying hardware. Are there pending operations the guest OSes are trying to run at boot, for example? Are the guests trying to run a bunch of health checks at boot because they all crashed on the upgraded hypervisors before your rollback?

The system as it is right now is not congruent with the way it was before. If you were using an out-of-the-box ESXi 5 image before and reverted back to that image, then what's still around right now that hasn't reverted? Perhaps the configuration of the hypervisors, but also perhaps the state of your guests.

edit: aaaaand I was a Tier 1 guy at the beginning of my career in a high-performance computing environment for climate modeling researchers, so I'm aware of the friction that can arise between developers and hardware architects when the hardware is intentionally being pushed to its limit. Every single detail and element in the storage stack is absolutely essential to understand and draw out so that all the (proverbially) moving parts are visible.

Spending a few thousand bucks on enterprise-grade SSDs and removing parity calculations from your RAID controller's workload may save you a lot of trouble in the long run if upgrades and changes are things you like to do in this environment and you want them to go more smoothly. A Samsung 850 PRO at 1TB will last about six years if written from empty to *full* every single day with a workload incurring a write amplification factor of three. If you mostly read from these devices and write less than 80GB per day on average (which is more than likely if your disks are only 40GB in size), a $300 1TB 850 EVO would carry you through to its 5-year warranty and probably well beyond.

Any chance you or your IT team have metrics on the steady-state I/O of your testing and on boot I/O? If not, something like PRTG is free for 100 sensors (more than enough to monitor two ESXi hypervisors).
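A quick sanity check on the 850 PRO endurance figure above; the program/erase cycle count is my assumption, while the capacity, write amplification factor, and one-full-drive-write-per-day workload come from the paragraph it checks.

```python
# NAND write budget divided by daily NAND writes gives the drive's expected lifetime.
CAPACITY_TB = 1.0
PE_CYCLES = 6000                 # assumed endurance for the 850 PRO's MLC NAND
WAF = 3.0                        # write amplification factor from the post
HOST_WRITES_TB_PER_DAY = 1.0     # written from empty to full every day

nand_budget_tb = CAPACITY_TB * PE_CYCLES
days = nand_budget_tb / (HOST_WRITES_TB_PER_DAY * WAF)
print(f"~{days / 365:.1f} years")   # ~5.5 years, in the same ballpark as six
```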

That is, assuming it's not just a simple hurf-durf configuration error someone made somewhere :)

Potato Salad fucked around with this message at 15:15 on Aug 19, 2016

Potato Salad
Oct 23, 2014

nobody cares


Just mulling it over off the top of my head, as your IT guy I'd probably boot the VMs up sequentially in small batches, let them run for a bit, then shut them down, and watch whether their I/O is atypical. Paravirtualized devices could have changed on the upgrade and the OSes may need a clean reboot to pick them up, or the drivers for your underlying hardware changed with the upgrade and would likewise require an OS reboot, or at least result in a longer one, depending on your guest OS. There's a lot that goes on when you upgrade ESXi generations.



Perhaps all the guests need is a clean reboot in small batches and then they'll be good to go as before.
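A minimal pyVmomi sketch of that small-batch approach; the host, credentials, batch size, and soak time are placeholders of mine, and it assumes the pyvmomi package is available.

```python
# Power the guests on a few at a time so datastore latency can be watched between waves.
import ssl
import time
from pyVim.connect import SmartConnect, Disconnect
from pyVim.task import WaitForTask
from pyVmomi import vim

BATCH_SIZE = 5
SOAK_SECONDS = 300  # let each batch settle before starting the next

ctx = ssl._create_unverified_context()
si = SmartConnect(host="esxi.example.local", user="root", pwd="password", sslContext=ctx)
try:
    content = si.RetrieveContent()
    view = content.viewManager.CreateContainerView(
        content.rootFolder, [vim.VirtualMachine], True)
    off = [vm for vm in view.view
           if vm.runtime.powerState == vim.VirtualMachinePowerState.poweredOff]
    for i in range(0, len(off), BATCH_SIZE):
        batch = off[i:i + BATCH_SIZE]
        for task in [vm.PowerOnVM_Task() for vm in batch]:
            WaitForTask(task)
        print(f"batch {i // BATCH_SIZE + 1} powered on: {[vm.name for vm in batch]}")
        time.sleep(SOAK_SECONDS)  # watch esxtop / datastore latency here
finally:
    Disconnect(si)
```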

edit: consider, for example, which guest OS you have and whether a HAL switch or change is something that will happen at boot after the upgrade. It may be that this particular boot storm timing out your VMs is an especially nasty one that needs hand-holding: https://en.wikipedia.org/wiki/Hardware_abstraction#In_Operating_Systems

Potato Salad fucked around with this message at 14:32 on Aug 19, 2016

Potato Salad
Oct 23, 2014

nobody cares


As your IT guy, I'd probably also get a quote for a few $300 1TB 850 EVO SSDs and suggest that the endurance of these TLC drives may well suit your test environment if you're writing less than 80GB/day to each one on average (which works out to 560GB per week per SSD, or 56GB per VM per weekly test if that's anywhere close to your use case).

I also need to apologize for what has been a departure from the normally-chill tone of this thread.

Potato Salad fucked around with this message at 15:23 on Aug 19, 2016

Potato Salad
Oct 23, 2014

nobody cares


What the gently caress kind of VAR tries to sell MS licenses for *nix systems?

Potato Salad
Oct 23, 2014

nobody cares


VMworld is meh. Woohoo, html5 next generation, yay everyone loves the host web client.

Still waiting for 'em to price their poo poo more competitively.

Potato Salad
Oct 23, 2014

nobody cares


I think containerization is the thing for you if you're running a bunch of scripts and homebrew apps on a tight memory budget. The memory overhead of a little Ubuntu Server VM may be relatively small, but if you're running 6-8 apps on that machine, you'll get into the red pretty quickly. You also seem to be maxing memory out sometimes, particularly in buffering. Not knowing what you're running on the machine, I'd guess the buffering is write activity on your SMB / zpool setup. I'm guessing the zpool is backed by spinning disks?

Is the write performance of your zpool ever a problem? A roommate uses ZFS / FreeNAS to serve media and all of our centralized backups in addition to a pair of jails. It's a 16GB machine, and when we're doing big writes, our monitoring VM sees the storage stack gobble up huge amounts of RAM.

I mention this because you may notice occasional performance degradation on your SMB/zpool storage service if my assumption about what's occasionally eating your RAM is correct. That would make precisely divvying up memory among your guest VMs a rather crucial task. Got any monitoring on the memory consumption of each of your services?
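If there's no monitoring in place yet, here's a quick sketch of a check I'd run on a FreeBSD/FreeNAS box; it reads the standard ZFS arcstats sysctl, and the GiB conversion is just for readability.

```python
# Show how much RAM the ZFS ARC is holding right now.
import subprocess

def arc_size_gib() -> float:
    out = subprocess.check_output(
        ["sysctl", "-n", "kstat.zfs.misc.arcstats.size"], text=True)
    return int(out.strip()) / 2**30

print(f"ARC currently holds {arc_size_gib():.1f} GiB")
```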

Potato Salad
Oct 23, 2014

nobody cares


"Frank, dude, Frank, have you seen the new hard drives? Holy poo poo one WDRed can hold 16gb"

"poo poo, Sam, pair of those in raid and she's good to go."

Potato Salad
Oct 23, 2014

nobody cares


That's an awful lot of energy draw though. On the flip side are single-socket boards with processors like the E3-1265L V3 running only 40-ish watts idle (I may or may not be partial to my supermicro uATX + 1265L v3 VMware lab).

Potato Salad
Oct 23, 2014

nobody cares


drat your math! :bahgawd:

Potato Salad
Oct 23, 2014

nobody cares


Just wait until you try to configure hardware passthrough in the web console :unsmigghh:

Or manage serial devices.

Potato Salad
Oct 23, 2014

nobody cares


It's fairly clear uncommon functions have not been fully implemented and tested yet.

I'd rather it was in this state than not out at all, though. Works fine for day to day management.

Potato Salad
Oct 23, 2014

nobody cares


Nvidia GTX-series cards actively resist virtualization. You have to either pull serious trickery or buy a Quadro.

What about eGPU?

External Thunderbolt 3 GPU enclosure + passthrough...? Forget the use case; would anything prevent using, say, a GTX 1060 over TB3 on a Windows VM?


Potato Salad
Oct 23, 2014

nobody cares


My Google-fu fails me at the moment, but I'll keep trying:

Does vSAN All-Flash require two distinct capacity and caching tiers, or can you get away with just a capacity tier?
