Twerk from Home
Jan 17, 2009

Watermelon Daiquiri posted:

yeah, I've seen this on the internet, and a related old, old QEMU bug, but unless it's a regression, that's not it. What I think is actually happening is something to do with binding/rebinding the audio source for snd_hda_intel after the VM shuts down.

I also have a different problem entirely: my work currently has a parametric math model that we run fairly often, and it takes 5-45 minutes to run depending on model size, parameters, which laptop is used, etc. I figured a good idea to explore would be something like EC2, since we really need something capable of running 21 separate threads at the same time, but it seems like the only instances with that many are the massive ones with a crap ton of RAM, networking, etc. Are there any cloud providers with more fine-grained options? Another option I was thinking of: since the parameters are completely independent, I could just spin up a tiny instance for each one separately, but I don't know anywhere near enough about this to judge the feasibility.

I posted in the other thread, but it would help to think about each independent task individually, and then use any of the many job schedulers that are designed to pack a lot of smaller tasks onto bigger nodes efficiently.

Or just use AWS Fargate, which gives you fine-grained control over how much vCPU / RAM each individual task gets. You're going to get better, more cost-efficient job throughput with a job scheduler, though.
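
To make that concrete, here's a minimal sketch of sizing a Fargate task with the AWS CLI; the family name, image, and sizes are all placeholders, and Fargate only accepts certain vCPU/memory combinations:

    # Hypothetical: register a right-sized Fargate task definition (0.5 vCPU / 1 GiB).
    aws ecs register-task-definition \
      --family model-run \
      --requires-compatibilities FARGATE \
      --network-mode awsvpc \
      --cpu 512 --memory 1024 \
      --container-definitions '[{
        "name": "model",
        "image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/model:latest",
        "essential": true
      }]'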

Twerk from Home
Jan 17, 2009

Nitrousoxide posted:

Yeah, Docker Swarm or Kubernetes (K3s or K8s) would be your main starting points for a widely used on-prem or cloud scheduler, unless you use some company's special-sauce scheduler.

If it's a really simple project, you could even do something as basic as a bash script on a cron job that checks a network-shared directory for folders with data needing processing, and spins up a Docker Compose stack with some env variables when someone drops a new one in there.
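
A minimal sketch of that watch-the-directory idea, run from cron; the paths and compose file are placeholders, and the compose file is assumed to read JOB_DIR:

    #!/usr/bin/env bash
    # Hypothetical cron job: process each new folder dropped into the shared directory.
    WATCH_DIR=/mnt/shared/incoming
    DONE_DIR=/mnt/shared/processed

    for job in "$WATCH_DIR"/*/; do
        [ -d "$job" ] || continue
        # Hand the folder to the stack via an env variable the compose file interpolates.
        JOB_DIR="$job" docker compose -f /opt/model/docker-compose.yml up --abort-on-container-exit
        mv "$job" "$DONE_DIR"/
    done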

What's your 2c on Kubernetes distributions that one might run on their own physical hardware without wanting to pay VMware licenses? Is K3s the winner there? Canonical is out there marketing MicroK8s wherever they can (https://microk8s.io/compare), including in Ubuntu's default MOTD for a few years now.

Twerk from Home
Jan 17, 2009

Yeah, it really sounds like you should look at AWS Fargate: https://aws.amazon.com/fargate

You can use it via AWS Batch or just directly, but it lets you size the compute to each task and pay only on demand, without having to actually deal with a VM yourself. Go ahead and launch 21 right-sized tasks at once; that way, when one task finishes, you stop paying for its resources rather than leaving unused cycles on a bigger instance. Also, you get to call it "Fartgate".
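
Via Batch, the 21 runs map neatly onto a single array job; a sketch assuming a Fargate-backed job queue and job definition already exist (all names are placeholders):

    # Hypothetical: submit all 21 runs as one array job.
    aws batch submit-job \
      --job-name model-sweep \
      --job-queue fargate-queue \
      --job-definition model-run \
      --array-properties size=21
    # Each child task gets AWS_BATCH_JOB_ARRAY_INDEX (0-20) in its environment
    # and can map that index to its own parameter set.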

I take it that each of these things is single threaded, but wants a lot of RAM?

Twerk from Home
Jan 17, 2009

Watermelon Daiquiri posted:

nope, only like 500 MB total, it's just a shittily optimized FD mesh, but that Fargate looks interesting. I'm just an EE who's been roped into trying to improve the run time of the models we have because I'm a computer nerd, and I have so much other stuff to do that I'll likely just say getting everything set up would be a massive project.

Aw jeez. Have you tried running it on your laptops with GNU parallel or something? You don't need the cloud for this, especially if the point is faster turnaround time for a batch. A $600 Dell desktop will have 24 threads and 32 GB of RAM, letting you easily run 21 of these simultaneously. Hell, if your laptops are at all modern they probably have more than 12 threads, so even they can chew through the batch much faster than running sequentially.
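
The GNU parallel version really is a one-liner; a sketch assuming the model is a CLI that takes one parameter file per run (run_model and params/ are made-up names):

    # Keep 21 runs in flight at once, one per parameter file.
    parallel -j 21 ./run_model {} ::: params/*.json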

If you don't want to deal with the full complexity of AWS and just want a cheap hourly rental machine that you can log into, use, then destroy and stop paying for, that workload would also fit well on a single virtual private server from any of the much cheaper vendors: https://www.vultr.com/pricing/#cloud-compute/

Vultr has a 32 vCPU / 64 GB RAM CPU-optimized option for 95 cents per hour.

Twerk from Home
Jan 17, 2009

ESXi is clearly hosed under Broadcom. If IBM hadn't simultaneously been wrecking Red Hat, I'd actually be betting on Red Hat Virtualization / oVirt picking up some market share.

KVM itself is completely rock solid and has been for ages. There's not going to be any real replacement for how big ESXi was, because operating an on-prem virtualization farm will become lostech and everyone will just pay AWS more money.

Edit: if anything, in the longer term I'd bet that the groups that continue to run on-premises will move to a bare-metal Kubernetes solution without any hypervisor. You can run a KVM guest as a Kubernetes pod if you need actual VM isolation.
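
That's what KubeVirt does. A minimal sketch of a VM-as-pod, assuming KubeVirt is already installed on the cluster (the name and disk image are placeholders):

    kubectl apply -f - <<'EOF'
    apiVersion: kubevirt.io/v1
    kind: VirtualMachine
    metadata:
      name: isolated-guest        # placeholder name
    spec:
      running: true
      template:
        spec:
          domain:
            devices:
              disks:
                - name: rootdisk
                  disk:
                    bus: virtio
            resources:
              requests:
                memory: 2Gi
          volumes:
            - name: rootdisk
              containerDisk:
                image: quay.io/containerdisks/fedora:latest
    EOF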

Twerk from Home fucked around with this message at 23:40 on Jan 5, 2024

Twerk from Home
Jan 17, 2009

I have been skimming the Proxmox support forums, and it's a mix of hobbyists who have no idea what they're doing running into predictable problems with their stupid home setups, and pros with what sound like reasonable, modern environments experiencing a bunch of problems with HA.

Here's an example of the latter: https://forum.proxmox.com/threads/unexpected-fencing.136345/


quote:

I have a 28-node PVE cluster running PVE 7.4-16. All nodes are Dell R640/R650 servers with 1.5 or 2 TB of RAM and Intel Xeon Gold CPUs. They all have 2 x 1 Gb NICs and 4 x 25 Gb NICs.

HA sounds completely busted for them. Judging by the responses, that's around the effective maximum size of a Proxmox cluster right now before it collapses under its own weight.

Twerk from Home
Jan 17, 2009

fresh_cheese posted:

I guess probably just general awareness; and then, looking at it, they have their own hypervisor, but 5 minutes of googling doesn't turn up any description of what technology it's based on. So that's a “hmmmmmmm….”

Then once you put it in front of the technical team, they give you a list of reasons why it won't work, all of which come after unspoken reason 0: it's not VMware, so I don't want to do it.

My understanding is that Nutanix is just a much less trusted hypervisor at its core than VMware, Hyper-V, or KVM/QEMU, which are all rock solid and proven. The various management interfaces for KVM might all suck, but KVM itself will never let you down. Nutanix was designed for hyper-converged VDI and just wasn't built to be an all-purpose virtualization solution. VMware is not the only option; I bet we're going to see plenty of shops move to Hyper-V.

Look at the OP of the Homelab thread for a firsthand account of why Nutanix failed to ever get a foothold in the hobbyist market, which matters because people try things at home before bringing them up at work:

H2SO4 posted:

In case anybody is starting out on the homelab journey and is thinking about going the Nutanix CE route, let me offer a couple words of wisdom.

DON'T loving DO IT.

I originally went down the Nutanix CE route because I found a mislabeled Supermicro X9-based 4-node 2U server on eBay and got it for mad cheap. It was all JBOD storage though, no RAID, so I figured it'd be a fun testbed for hyperconverged stuff like vSAN and Nutanix CE.

Nutanix CE is an absolute house of cards: you have zero access to any knowledge base or documentation unless you've also got a Nutanix account with active support through work/etc. The simplest of functions, such as replacing a failed disk drive or adding a new one, is completely undocumented and abstracted away from the interface, leaving you to trawl through god knows how many blogs and random forum posts with broken image links and missing formatting to figure out what esoteric CLI fuckery you need in order to make it happen.

Also, upgrades are compulsory. If they release a new version of the hypervisor/platform, the next time you log in to your cluster you are forced to apply it right away. No access to your cluster/VMs/etc. until the process completes, no option to defer. But that's not such a bad thing, since their whole gimmick is one-click painless upgrades, right?

Wrong. Every single time I've attempted to upgrade my cluster, it's resulted in needing some form of recovery. The best-case scenario was a node getting hung during the upgrade and evicting itself from the metadata ring; after a manual reboot it came back up and you could manually enable the metadata store again. The latest version caused a total cluster-wide data-loss event: when the first node updated, it immediately started evicting every single disk with no warning and no loving reason. That's something its own configuration should have prevented, considering it knows how many drive failures/node failures/etc. it can tolerate. If it had just evicted the node itself and halted the process it would have been fine, but it's not smart enough to do that.

There is, at best, a total lack of QA on any CE release. For instance, the previous version's ISO installer just plain didn't work, and they were too lazy to fix it, so they just removed the ISO. This led to people deploying their clusters from the USB image instead, but the USB image doesn't do any kind of repartitioning of the install drives, which means that slowly but surely nodes would run out of space on the system drive, making things get real fucky. If you figured out what was going on, you could do some fdisk magic, write a new partition table, and extend the existing partitions, but only if you whispered the right incantations into Google and found your issue on page 32 of the results.

With the latest crash and burn I finally just said "gently caress it" and bought four 1U boxes with boring old RAID controllers, transferred the guts over, and swapped back to good old ESX, and I wish I'd cut bait sooner. Just doing that effectively gave me back 128 GB of memory that was previously wasted on the Nutanix controller VMs.

I know I'm salty, but I'd argue I have good reason to be, considering all the headaches they caused in my lab, which by extension gave me a glimpse into how flimsy their house of cards really is. I'm sure their retail product is great. I'm sure their support is great. But based on this experience over four generations of CE, I'd actively block them from being considered for any project at work.

Twerk from Home
Jan 17, 2009

RVWinkle posted:

I think you're looking at it backwards. You have more flexibility if you run Docker in a VM, because you can allocate resources and manage scaling. If you install Docker on the base hypervisor, it will just use as many resources as it wants.

I'm sure someone will respond with some elaborate method for managing resource allocation in Docker, but here's a bunch of other reasons to put it in a VM, like additional security segmentation and portability.

I'll respond and say that CPU and memory limits with cgroups via Docker are incredibly easy and have been mature for ages; otherwise it'd be pretty useless to try to schedule low-latency heterogeneous services on clusters.
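
For the record, the limits are just flags on docker run; the image name here is a placeholder:

    # Cap a container at 2 CPUs and 4 GiB of RAM (swap capped to the same, i.e. disabled).
    docker run --cpus=2 --memory=4g --memory-swap=4g myservice:latest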

Security I'll give you, but the point of Linux containers is that any modern container runtime can run them with no fuss; I don't see what portability advantage a VM gives you.

I also wish that Proxmox had first-class Docker, or preferably Podman, support instead of LXC. LXC containers seem to combine some of the bad parts of both VMs and OCI containers.

Twerk from Home
Jan 17, 2009

HalloKitty posted:

At least not where I live; all of the new guys learn ESXi and vCenter as part of their education and can apply it immediately in the workplace. That's going to be irrelevant in the future, as all the customers in the market we serve won't be in Broadcom's exclusive club.

So the future really does just look like more people using Hyper-V, but everything is worse?

Twerk from Home
Jan 17, 2009

Dancing Peasant posted:

My group is working to get off of VMware (for reasons stated already). And while there has been discussion of Proxmox, management and some engineers are leaning towards OpenShift/OpenStack as another solution.

We currently have primarily Windows and RHEL, so is there a reason why OS/OS isn't discussed as a viable alternative?

I'm repeating secondhand (or worse) information and general community chatter that may not be up to date or reflect reality, but OpenStack has a reputation for being a pain box. It's difficult to deploy well, the different components sometimes feel like they're not even part of the same overarching product vision, and everyone's OpenStack ends up being a unique beast, making ongoing operations a pain. A good number of operators feel trapped on it, and I would bet that for a few years now it has not been a common choice for groups setting up a new environment.

OpenShift seems much healthier by comparison, but that's k8s, not a complete replacement for clustered virtualization.

Twerk from Home fucked around with this message at 19:13 on Mar 4, 2024

Twerk from Home
Jan 17, 2009

tokin opposition posted:

Tell me to piss off if this isn't the right place for this, but if I wanted to get a job doing virtualization*, what kind of skills or homelab projects would be good for putting on a resume or talking about in an interview?

* trying to escape helldesk ASAP

Tokin, I don't know your exact skillset or where specifically you want to go, but here's what I'd learn if I were in helldesk, Good At Computer, and wanted to get a better job ASAP:

  1. Install ESXi free on one of your computers. I know VMware just cancelled the free tier, but figure out a way.
  2. Make a lovely WordPress host VM on there. You can do it the lazy way with the MySQL database on the same VM, or more realistically with a separate DB VM.
  3. Update the WordPress default homepage and make it a photo of your cat or something, so that you have some data in it.
  4. Make an AWS account and move that VM into AWS (see the sketch after this list). Use EC2 T3 instances or similar to start; don't use Lightsail, Lightsail is EZ-mode.
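
For step 4, standing up the T3 instance is one CLI call; a sketch where the AMI, key pair, and security group IDs are all placeholders:

    aws ec2 run-instances \
      --image-id ami-0123456789abcdef0 \
      --instance-type t3.small \
      --key-name homelab-key \
      --security-group-ids sg-0123456789abcdef0 \
      --count 1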

If you want to learn more or do it better, I'd also think about moving the database into a separate managed RDS instance like db.t4g.micro. In any real organization you'd be doing infrastructure as code, so instead of creating resources in the AWS console you'd use something like the CDK or Pulumi if you're lucky, or Terraform if you're not. You could also move VMs that do something more complicated than WordPress, but I don't know what languages or techs you're comfortable with. You could build a tiny thing in Django or Laravel or Spring Boot and then move it into AWS, ideally with more outside parts like Redis for caching or a message queue.
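
The managed-database half of that might look like the following; the identifier and credentials are placeholders:

    # Hypothetical: smallest ARM MySQL instance for the WordPress database.
    aws rds create-db-instance \
      --db-instance-identifier wordpress-db \
      --db-instance-class db.t4g.micro \
      --engine mysql \
      --allocated-storage 20 \
      --master-username admin \
      --master-user-password 'change-me-now'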

If you really want to focus on on-premises infrastructure, I'd replace the second half of the list with moving things into containers, then Kubernetes pods. Keep the database on a separate VM outside of k8s, and if you've got two PCs to do this on, put ESXi on one and move applications into k8s on VMs on Proxmox on the other.
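
The containers-then-pods half is only a few commands; a sketch with made-up image name and ports:

    # Build the app image, then schedule it as a Deployment behind a Service.
    docker build -t registry.local/myapp:0.1 .
    kubectl create deployment myapp --image=registry.local/myapp:0.1
    kubectl expose deployment myapp --port=80 --target-port=8080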

Edit: in the longer term I don't think Proxmox is going to be a winner. I have no idea what will be, but k8s, or something substantially similar to it, is surely here to stay.

Also, I completely omitted all the networking/firewall/IAM stuff, which would be the most complex part of all this: figure out how to limit access to just your home IP and just your user through some identity platform. AWS Cognito, maybe? And if you talk about this in an interview, say "lift and shift a multi-tier application", not "WordPress migration".

Twerk from Home fucked around with this message at 15:05 on Mar 15, 2024

Twerk from Home
Jan 17, 2009

For it to Just Work and be something that you'd use? Sure. To learn skills that you hope will make you immediately employable? AWS Cognito (https://aws.amazon.com/cognito/) and OIDC federation with an outside identity provider.
