The Fool
Oct 16, 2003


I wasn't trying to imply that it was easy or even done well by most orgs

I was trying to make two points
1. any org with a minimal level of operational maturity is going to have plans for those scenarios

and

2. the idea that any given service is going to be immediately unavailable through anything other than a natural disaster is almost laughable; 99% of the time things will be left on while the bankers pick the bones


fe: inb4 someone comes in with personal experience of the 1%


xzzy
Mar 5, 2009

LochNessMonster posted:

Not sure if you’re replying to me because of the ‘caught slacking’ comment. I completely agree some workloads are still (and maybe always) better suited on prem.

Not specifically, no, so it's all cool. Just filling up this forum with more posts.

Hadlock
Nov 9, 2004

A business continuity plan is generally: "Given an act of God, say an earthquake in downtown San Francisco that levels everything, how would we return to functionality within 14 days? Here is where we plan to keep our off-site backups, and these are the broad strokes for how we would recover and rebuild the IT and software services for the company."

If you use something like BigQuery, the business continuity plan might be something like, "it would be impossible for the business to function if Google dropped us as a customer"

The Fool
Oct 16, 2003


yeah, but that's what you get for hiring former google engineers

xzzy
Mar 5, 2009

Hadlock posted:

"it would be impossible for the business to function if Google dropped us as a customer"

What do you mean, it happens every 18 months when their internal slap fights go public and they kill another product.

Jabor
Jul 16, 2010

#1 Loser at SpaceChem
Your "plan" could very well be "it would take us three quarters to rewrite our application on a new provider, plus one quarter to finish a migration and tail off use of the old one". It's not a lovely plan, but it's a plan.

Having that plan allows you to weigh the risks of that happening, the consequences to you if it does happen, and what levels of extortionate pricing you're willing to soak just because it's still cheaper than a migration.

If you don't have a plan, you don't know how bad it's going to be, and you're not able to accurately weigh the costs and benefits of further investing yourself with that supplier. If you look at that plan and see that the business can't survive it, well, that's something that's good to realize before it actually happens, isn't it?
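To make the "weigh the risks" step concrete, here is a minimal Python sketch of the back-of-the-envelope arithmetic. It assumes, hypothetically, that the migration cost is engineer time only, ignores discounting and migration risk, and uses made-up numbers; the function names are illustrative and not from any real tool.

```python
import math


def migration_cost(engineers: int, monthly_cost_per_engineer: float, months: int) -> float:
    """One-off migration cost, modelled (hypothetically) as engineer time only."""
    return engineers * monthly_cost_per_engineer * months


def max_tolerable_monthly_increase(one_off_cost: float, horizon_months: int) -> float:
    """Largest monthly price hike worth soaking instead of migrating,
    if you only look `horizon_months` ahead and ignore discounting."""
    return one_off_cost / horizon_months


def breakeven_months(one_off_cost: float, monthly_savings: float) -> float:
    """Months on a cheaper provider before the migration pays for itself."""
    if monthly_savings <= 0:
        return math.inf  # the move never pays off on cost alone
    return one_off_cost / monthly_savings


if __name__ == "__main__":
    # Hypothetical: 4 engineers for 12 months (three quarters of rewrite plus
    # one quarter of migration) at $15,000 per engineer-month.
    cost = migration_cost(engineers=4, monthly_cost_per_engineer=15_000, months=12)
    print(f"migration cost: ${cost:,.0f}")
    print(f"max hike worth soaking over 3 years: ${max_tolerable_monthly_increase(cost, 36):,.0f}/month")
    print(f"break-even at $20,000/month savings: {breakeven_months(cost, 20_000):.0f} months")
```

The point is the shape of the comparison rather than the numbers: once the migration has a price, any price hike below that threshold is "still cheaper than a migration".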

Gucci Loafers
May 20, 2006

Ask yourself, do you really want to talk to pair of really nice gaudy shoes?


There are plans but there's no real reason to plan for something like a major vendor suddenly going kaput because it's extremely rare and if it does happen you are basically hosed anyway.

The only real planning that takes place is stuff like what happens if the datacenter on the East Coast gets flooded. That's a real disaster recovery.

Falcon2001
Oct 10, 2004

Eat your hamburgers, Apollo.
Pillbug

Gucci Loafers posted:

There are plans but there's no real reason to plan for something like a major vendor suddenly going kaput because it's extremely rare and if it does happen you are basically hosed anyway.

The only real planning that takes place is stuff like what happens if the datacenter on the East Coast gets flooded. That's a real disaster recovery.

FWIW: a lot of risk assessment plans have points where they go 'welp, this is outside of scope'. For example, when I did this at a FAANG, our cutoff was 'natural disaster wipes out our main campus and a significant portion of staff'; that was the line in the sand.

Another fun fact: while I was working with folks in Bing, they mentioned that one of their worst-case scenarios was an extended Google outage. Google going down for any real amount of time would basically cripple Bing: they're #2 in many markets but still get significantly less traffic, and that much real, organic traffic is basically impossible to filter in a meaningful way. I suspect a lot of 'second place' businesses have similar concerns; obviously they'd all jump on the opportunity from a business perspective, but it would be rough operationally for a while.

Docjowles
Apr 9, 2009

Falcon2001 posted:

FWIW: a lot of risk assessment plans have points where they go 'welp, this is outside of scope'. For example, when I did this at a FAANG, our cutoff was 'natural disaster wipes out our main campus and a significant portion of staff'; that was the line in the sand.

Another fun fact: while I was working with folks in Bing, they mentioned that one of their worst-case scenarios was an extended Google outage. Google going down for any real amount of time would basically cripple Bing: they're #2 in many markets but still get significantly less traffic, and that much real, organic traffic is basically impossible to filter in a meaningful way. I suspect a lot of 'second place' businesses have similar concerns; obviously they'd all jump on the opportunity from a business perspective, but it would be rough operationally for a while.

Ha, that's funny and also a good point. Usually your biggest competitor going away would be incredible. But when that competitor is loving Google, good luck absorbing the surge from one of the highest traffic things on earth turning the Eye of Sauron on your infrastructure.

Hadlock
Nov 9, 2004

Docjowles posted:

Ha, that's funny and also a good point. Usually your biggest competitor going away would be incredible. But when that competitor is loving Google, good luck absorbing the surge from one of the highest traffic things on earth turning the Eye of Sauron on your infrastructure.

Tangentially related: I worked for a telehealth company before and during COVID. This was before rapid tests or drive-through testing stations were a thing, and people needed a way to fast-track elderly and immunocompromised people into hospitals. We went from a pretty sedate couple hundred doctors to doing 300% traffic in the span of a couple of weeks. Thankfully most of our infrastructure was outsourced to a handful of vendors, and once we fixed a couple of database issues the back end scaled up without too much trouble

I don't think this ever came to light before, but the federal agency HHS (Health and Human Services) offered us 5,000 physicians to backstop our operations and handle the additional load. It's the only time I've ever seen the feds jump into action so decisively outside of military operations

Thankfully (?) it quickly became clear that being diagnosed over the Internet with "probable COVID" didn't really do anything for anyone or their mental health. Commercial rapid tests finally came on the market around late July, and demand had mostly tapered off by September/November

Goddamn I had mentally blocked all that out of my memory for the most part :regd20: what a poo poo show

Docjowles
Apr 9, 2009

Hadlock posted:

Tangentially related: I worked for a telehealth company before and during COVID. This was before rapid tests or drive-through testing stations were a thing, and people needed a way to fast-track elderly and immunocompromised people into hospitals. We went from a pretty sedate couple hundred doctors to doing 300% traffic in the span of a couple of weeks. Thankfully most of our infrastructure was outsourced to a handful of vendors, and once we fixed a couple of database issues the back end scaled up without too much trouble

I don't think this ever came to light before, but the federal agency HHS (Health and Human Services) offered us 5,000 physicians to backstop our operations and handle the additional load. It's the only time I've ever seen the feds jump into action so decisively outside of military operations

Thankfully (?) it quickly became clear that being diagnosed over the Internet with "probable COVID" didn't really do anything for anyone or their mental health. Commercial rapid tests finally came on the market around late July, and demand had mostly tapered off by September/November

Goddamn I had mentally blocked all that out of my memory for the most part :regd20: what a poo poo show

drat that is crazy. What an absolutely insane loving time that was, not to derail into covid stuff. I work in the travel sector so the pandemic affected my job as well, but it sure wasn't an increase. Thankfully I somehow weathered the mass layoffs and we are doing reasonably well again. RIP to a lot of amazing coworkers who had to find something else in 2020, though.

Hadlock
Nov 9, 2004

Apparently Grafana released v1.0.0 of the aptly named "Alloy" about a month ago, on April 4 of this year

As far as I can tell it's a mega binary with libraries from OpenTelemetry, Promtail (the Loki log shipper), and the Prometheus exporters

It appears to be an all-in-one binary

Has anyone used it? Looks like they're being pretty aggressive with it: EOL for Grafana Agent and Promtail looks to be November 2025 (so about 18 months)

Warbird
May 23, 2012

America's Favorite Dumbass

I hate kubernetes so loving much. It's actually pretty cool and good if the cluster is managed and you're paying MCS/Google/Amazon to suffer but gently caress me I've been living in a nightmare for the better part of a week.

I'm limited to doing things on an M-series MBP due to memory constraints on my Proxmox setup. What is the """best""" way to skin the cat of having an actual K8s cluster running locally? kind and its ilk are super interesting, but I'm trying to follow along with a text that presumes a few things that make that a non-option.

I've tried genning up a few VMs in Vagrant and throwing Ansible playbooks at them (both provided by the text and my own), and that was an exercise in frustration. Is there some preconfigured VM image, or a script, or a ready-made Ansible playbook or similar that I can use to quickly bootstrap a ""real"" cluster?

I'm not opposed to "clickops" if the need be, I really just want to get something working before I just say gently caress it and use a managed cluster somewhere.

The Fool
Oct 16, 2003


what's wrong with k3s

Clark Nova
Jul 18, 2004

Warbird posted:

I hate kubernetes so loving much. It's actually pretty cool and good if the cluster is managed and you're paying MCS/Google/Amazon to suffer but gently caress me I've been living in a nightmare for the better part of a week.

I'm limited to doing things on an M-series MBP due to memory constraints on my Proxmox setup. What is the """best""" way to skin the cat of having an actual K8s cluster running locally? kind and its ilk are super interesting, but I'm trying to follow along with a text that presumes a few things that make that a non-option.

I've tried genning up a few VMs in Vagrant and throwing Ansible playbooks at them (both provided by the text and my own), and that was an exercise in frustration. Is there some preconfigured VM image, or a script, or a ready-made Ansible playbook or similar that I can use to quickly bootstrap a ""real"" cluster?

I'm not opposed to "clickops" if the need be, I really just want to get something working before I just say gently caress it and use a managed cluster somewhere.

maybe try Rancher Desktop? virtualization stuff on Apple Silicon isn't great

Warbird
May 23, 2012

America's Favorite Dumbass

The text I'm following along with to get a better understanding of how this all works presumes you are using Calico, Longhorn, an NGINX Ingress, and some metrics solution I forget the name of, running on a cluster of VMs or a cloud solution. Longhorn apparently absolutely will not run in any of the K8s-lite type solutions due to how it is implemented. It very well could work in Rancher Desktop or k3s, but at this time I don't know. It could be reasonably easy to adapt things to work with a different storage solution, but I don't know enough about k8s yet to make that swap, much less troubleshoot it enough to get it over the hump.

K3s I have nothing against, but I'd really prefer to stick with the "full" solution for the sake of understanding it better before moving off onto one of the deviations. Case in point: kind. It's a super neat solution and great for quick testing, but it isn't ideal for learning the ins and outs of clustering and so forth due to the abstractions it's doing, along with the fact that Docker Desktop is going to be involved, at least in my case.

It's goddamn weird that Windows and WSL2 are the winners here. I really need to build a new server with more than 16 gigs of ram.

Hadlock
Nov 9, 2004

How much Kubernetes do you need

K3s is great, and it's API-compatible, but from what I understand it doesn't support all the bleeding-edge features introduced after ~1.18. k3s was written by the current... CTO? of Rancher. I've spun up a cluster of Raspberry Pis at home with it in recent memory, but last time I tried it there was no turnkey solution for PVCs

Rancher is the enterprise solution for what you're trying to do, and it's full-fat Kubernetes, with support contracts available if you so desire

Kops is an option; I used it to launch my current career, though I haven't touched it since around 2018. I've heard good things about it since: apparently it runs an alarming amount of MasterCard, so it's not completely terrible still

Resdfru
Jun 4, 2004

I'm a freak on a leash.
In my opinion, create 2 or 3 VMs with whatever Linux distro you like and just create a cluster yourself. You'll get the benefit of having to troubleshoot any issues and learning from that

https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/

Then follow your book or whatever. You could also just try creating some resources and stuff. Go to the kube docs and create a pod, then a deployment, then a service, some volumes, whatever. I'm assuming you're coming into this mostly fresh; if your book is some advanced thing you're trying to do, then ignore this second part
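For the "create a pod, then a deployment, then a service" exercise, here is a minimal sketch of a Deployment plus a matching ClusterIP Service, built as Python dicts and printed as JSON (kubectl accepts JSON manifests as well as YAML). The app name `hello`, the label, and the `nginx:1.25` image are hypothetical placeholders.

```python
import json


def deployment(name: str, image: str, labels: dict, replicas: int = 2) -> dict:
    """A minimal apps/v1 Deployment running one container that listens on port 80."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            # apps/v1 requires the selector to match the pod template's labels.
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {"name": name, "image": image, "ports": [{"containerPort": 80}]}
                    ]
                },
            },
        },
    }


def service(name: str, labels: dict, port: int = 80) -> dict:
    """A Service of the default ClusterIP type, routing to pods carrying `labels`."""
    return {
        "apiVersion": "v1",
        "kind": "Service",
        "metadata": {"name": name},
        "spec": {
            "selector": labels,
            "ports": [{"port": port, "targetPort": 80}],
        },
    }


if __name__ == "__main__":
    labels = {"app": "hello"}  # hypothetical app label
    manifest = {
        "apiVersion": "v1",
        "kind": "List",  # lets kubectl apply both objects from one document
        "items": [
            deployment("hello", "nginx:1.25", labels),  # hypothetical image tag
            service("hello", labels),
        ],
    }
    print(json.dumps(manifest, indent=2))
```

Saved as, say, `hello.py` (hypothetical filename), the output can be piped straight in with `python hello.py | kubectl apply -f -`. The label is the only thing tying the Service to the pods, which is worth breaking on purpose once to see what happens.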

Warbird
May 23, 2012

America's Favorite Dumbass

I suspect that's going to be the call. It's going to make me itchy to just do things manually, but that's better than keeping on this path. One thing: is Ubuntu 24.04 LTS too new and asking for trouble? I don't mind using 20.04 but figured I'd ask.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams

Clark Nova posted:

maybe try Rancher Desktop? virtualization stuff on Apple Silicon isn't great

It's a hell of a lot better if you switch from QEMU emulation to VZ. With the caveat that I've always turned off the Kubernetes feature immediately, switching the emulation made our core app run about 8000 times faster locally.

Warbird
May 23, 2012

America's Favorite Dumbass

Well, the one upside of this is that I can return to utterly ignoring Docker Desktop in favor of OrbStack again.

:orb:

Hadlock
Nov 9, 2004

Warbird posted:

I'm not opposed to "clickops" if the need be, I really just want to get something working before I just say gently caress it and use a managed cluster somewhere.

Why do you hate the easy route

Unmanaged is all fun and games until your etcd cluster rolls over and dies

You need like, minimum one additional headcount to keep that set of spinning plates from crashing down and taking the company with it
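Part of why etcd is the plate most likely to crash, and why the usual advice is three control-plane nodes: etcd uses Raft and needs a strict majority of its members to commit writes. A small Python sketch of that arithmetic (the function names are illustrative):

```python
def quorum(members: int) -> int:
    """Votes etcd (Raft) needs to commit a write: a strict majority of members."""
    if members < 1:
        raise ValueError("an etcd cluster needs at least one member")
    return members // 2 + 1


def failures_tolerated(members: int) -> int:
    """How many members can be lost while the cluster can still reach quorum."""
    return members - quorum(members)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} member(s): quorum {quorum(n)}, tolerates {failures_tolerated(n)} failure(s)")
```

An even member count buys no extra fault tolerance over the odd count below it (4 members tolerate one failure, same as 3), which is why 3 and 5 are the common sizes.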

Hadlock
Nov 9, 2004

Oh yeah, spin up three Ubuntu 20.04 nodes and install K3s or Rancher and work from there

It took me like 45 min to get a k3s cluster up and running, but most of that time was pulling down ARM images over a residential connection during lockdown when everyone was streaming Netflix 24/7

xzzy
Mar 5, 2009

Installing k8s from scratch isn't that hard... once you know how to do it. Unfortunately the docs are written for people who already know the software, which makes them pretty frustrating for a first-timer. There's a minimum set of concepts and terminology you have to internalize before anything clicks. Best practices have shifted over the years too, so mercy on your soul if you land on a blog post from six years ago.

It doesn't help that almost all of the tutorials are "just use k3s/minikube on your desktop! It's easy!", which certainly gets you firing up pods real fast but covers up a lot of the details, so you don't get any closer to understanding how a production k8s cluster works.

It took me many evenings with my three Raspberry Pis, deleting and starting over, before I could actually get the drat suite to work.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Warbird posted:

The text I'm following along with to get a better understanding of how this all works presumes you are using Calico, Longhorn, an NGINX Ingress, and some metrics solution I forget the name of, running on a cluster of VMs or a cloud solution. Longhorn apparently absolutely will not run in any of the K8s-lite type solutions due to how it is implemented. It very well could work in Rancher Desktop or k3s, but at this time I don't know. It could be reasonably easy to adapt things to work with a different storage solution, but I don't know enough about k8s yet to make that swap, much less troubleshoot it enough to get it over the hump.

K3s I have nothing against, but I'd really prefer to stick with the "full" solution for the sake of understanding it better before moving off onto one of the deviations. Case in point: kind. It's a super neat solution and great for quick testing, but it isn't ideal for learning the ins and outs of clustering and so forth due to the abstractions it's doing, along with the fact that Docker Desktop is going to be involved, at least in my case.

It's goddamn weird that Windows and WSL2 are the winners here. I really need to build a new server with more than 16 gigs of ram.

This is perhaps a dumb question: if Calico and Longhorn aren't actually fundamental to what you're trying to do, have you considered following along a different text that doesn't have so much totally unnecessary complexity?

There's very little that needs more than the bundled Kubernetes in Docker Desktop until you start actually trying to operationalize things in miniature (as you're finding with a whole rear end distributed storage solution and network overlay).

Vulture Culture fucked around with this message at 12:59 on May 10, 2024

Erwin
Feb 17, 2006

Warbird posted:

I suspect that's going to be the call. It's going to make me itchy to just do things manually, but that's better than keeping on this path. One thing: is Ubuntu 24.04 LTS too new and asking for trouble? I don't mind using 20.04 but figured I'd ask.

Building a cluster with kubeadm is manual, but it's a set of commands that can be easily slapped into an Ansible playbook if you plan to do it more than once. Even if you do it once it's just a handful of commands, and the commands you run on the control plane nodes spit out what you should run on other nodes, so it's half-automated and you're just doing some pasting. It also gives some visibility into what happens under the hood. Building a cluster with kubeadm on a few VMs takes like 10 minutes plus apt install time.

I haven't used Ubuntu 24.04, but ever since Ubuntu switched to systemd-resolved (20.04?) it was a nightmare to get a kubeadm cluster working. I think Debian 12 now uses systemd-resolved, but hopefully the kubeadm documentation has been adjusted to deal with resolved.
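On the systemd-resolved pain: the usual failure is that `/etc/resolv.conf` points at the local stub resolver (127.0.0.53), CoreDNS inherits that file through the kubelet, and upstream lookups loop back into CoreDNS. The Kubernetes DNS-debugging docs describe pointing the kubelet's `resolvConf` setting at `/run/systemd/resolve/resolv.conf` instead, and say kubeadm detects systemd-resolved and adjusts this itself, so it may already be handled on a fresh install. A minimal Python sketch of the check, assuming the standard systemd-resolved file locations; the function names are hypothetical.

```python
import ipaddress
from pathlib import Path
from typing import List

# Where systemd-resolved keeps the real upstream servers (standard location).
RESOLVED_UPSTREAM = "/run/systemd/resolve/resolv.conf"


def nameservers(resolv_conf_text: str) -> List[str]:
    """Return the addresses from `nameserver` lines, ignoring comments and options."""
    servers = []
    for line in resolv_conf_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "nameserver":
            servers.append(fields[1])
    return servers


def uses_loopback_stub(resolv_conf_text: str) -> bool:
    """True if any nameserver is a loopback address such as 127.0.0.53."""
    for server in nameservers(resolv_conf_text):
        try:
            if ipaddress.ip_address(server).is_loopback:
                return True
        except ValueError:
            continue  # skip entries that aren't plain IP addresses
    return False


def suggested_kubelet_resolv_conf(path: str = "/etc/resolv.conf") -> str:
    """Pick the file the kubelet's resolvConf setting should point at."""
    text = Path(path).read_text()
    if uses_loopback_stub(text) and Path(RESOLVED_UPSTREAM).exists():
        return RESOLVED_UPSTREAM
    return path


if __name__ == "__main__":
    if Path("/etc/resolv.conf").exists():
        print(suggested_kubelet_resolv_conf())
```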

DkHelmet
Jul 10, 2001

I pity the foal...


Yeah, all of that is overkill unless you're testing workloads under those complex add-ons. And if you are, I'd be spinning up an EKS/AKS/whatever cluster to validate it all works in situ rather than on a desktop.

Zorak of Michigan
Jun 10, 2006


The thing that bothers me about setting up my own k8s cluster with kubeadm is that I can get it to work at a given time, but then I have to be the guy who watches the change logs from all the components, figures out when and how to upgrade, and manages the playbooks for upgrades and for new installs of the new configs. For me, trying to do it as one of many tasks, it got out of hand real fast.

The Fool
Oct 16, 2003


literally why aks/eks/gke exist

Warbird
May 23, 2012

America's Favorite Dumbass

Vulture Culture posted:

This is perhaps a dumb question: if Calico and Longhorn aren't actually fundamental to what you're trying to do, have you considered following along a different text that doesn't have so much totally unnecessary complexity?

There's very little that needs more than the bundled Kubernetes in Docker Desktop until you start actually trying to operationalize things in miniature (as you're finding with a whole rear end distributed storage solution and network overlay).

Sunk cost fallacy at this point tbh. It's a decent book and I'm trying to get a holistic understanding of how this all works. The author is, so far, threading the needle between the entry-level "single hour-long-ish YouTube video" and "white paper", and that middle complexity is pretty hard to find. I've done no end of blogs and so on in the past, and that's a nightmare of typos, changing standards and dependencies, and frequently left-out details that cause fun surprises later. That's less of a problem if you know what you're doing, as xzzy alluded to, but I ain't there.


As for the complexity, the distributed storage is one of the key reasons I'm reading an actual goddamn book as I've always been unclear on how any of that works and how to potentially make use of it in my home setup in the future. Same for network routing and so on and so forth. Most anyone can google a command but the contextual awareness is really important in my field.

And yes, a managed cluster is the smarter call but I paid an assload of money for the fancy CPU and extra memory on this stupid laptop so I may as well use it. That or try and trick my employer into letting me expense some new server parts.

Trapick
Apr 17, 2006

Is there a standard "start here" book/site/course/whatever for Terraform? Could be geared towards use with Azure but doesn't need to be.

I'm going through the Hashicorp site and it's fine, just curious if there's a better resource.

Resdfru
Jun 4, 2004

I'm a freak on a leash.
If videos aren't a problem for you and you really want to dive in, I can't recommend this guy's course enough. You can probably get it for a lot less when Udemy has a sale: https://www.udemy.com/course/certified-kubernetes-administrator-with-practice-tests/

The labs after each video are on his KodeKloud platform, which gives you a real hands-on environment to do the thing you just learned.

They're really good imo.


For terraform, I dunno. My advice there is just come up with some stuff you wanna deploy with terraform and write it. Use the docs for resources and google when you need to. And keep doing that.


After you get some hands on time with it then check this out so you can try to avoid any bad practices
https://cloud.google.com/docs/terraform/best-practices-for-terraform

Then after working with it some more you'll know which best practices you can ignore

Hadlock
Nov 9, 2004

Trapick posted:

Is there a standard "start here" book/site/course/whatever for Terraform? Could be geared towards use with Azure but doesn't need to be.

I'm going through the Hashicorp site and it's fine, just curious if there's a better resource.

IMO the terraform hello world is spinning up managed Kubernetes and migrating the state file to blob storage/S3 and maybe installing Prometheus/grafana via helm via terraform

Any tutorial will mostly work, but yeah, Kubernetes and Terraform provider versions change so often that the tutorial is mostly a rough guide and you'll need to refer to current documentation

Right now the Kubernetes sweet spot for compatibility is 1.28 on EKS, IMO

The Fool
Oct 16, 2003


The hard part of terraform is knowing what the thing you are deploying does, the language itself is painfully simple, and you won't get exposure to any of the footguns until you start doing larger deployments at scale.

Easiest way to learn it is to just take stuff you already know how to deploy, and do it in terraform instead.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Hadlock posted:

maybe installing Prometheus/grafana via helm via terraform


don’t suggest self harm

Trapick
Apr 17, 2006

The Iron Rose posted:

don’t suggest self harm

Welp, lock the thread.

(Thanks for the advice folks!)

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
To be clear, that’s not a dig at Prometheus/Grafana. It’s a dig at the notion of managing k8s objects or Helm releases with Terraform. It’s just not the right tool for the job, and you’re better served by writing a CI pipeline to apply the Helm charts, or even the underlying k8s manifests, directly. Using Terraform adds an unnecessary layer of abstraction and complexity, your code isn’t even particularly portable as a result, and you suffer all the overhead of state files.
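A minimal sketch of the "CI pipeline that applies the Helm charts directly" approach, written in Python: each release is pinned to an explicit chart version and applied with `helm upgrade --install`, which installs the release if it is missing and upgrades it otherwise, so the job can be re-run safely. The release names, chart names, versions and values file are hypothetical, and the sketch assumes the chart repos were already added with `helm repo add`. By default it only prints the commands; pass `--apply` to run them.

```python
import shlex
import subprocess
import sys
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Release:
    name: str                          # Helm release name
    chart: str                         # "<repo>/<chart>"; repo assumed added via `helm repo add`
    version: str                       # pinned chart version
    namespace: str
    values_file: Optional[str] = None  # optional values override file


def helm_command(release: Release) -> List[str]:
    """Build an install-or-upgrade command for one pinned release."""
    cmd = [
        "helm", "upgrade", "--install", release.name, release.chart,
        "--version", release.version,
        "--namespace", release.namespace,
        "--create-namespace",
        # Wait for resources to become ready (up to Helm's timeout) so a broken rollout fails the job.
        "--wait",
    ]
    if release.values_file:
        cmd += ["--values", release.values_file]
    return cmd


# Hypothetical releases and version pins, for illustration only.
RELEASES = [
    Release("monitoring", "prometheus-community/kube-prometheus-stack", "1.2.3", "monitoring"),
    Release("argocd", "argo/argo-cd", "4.5.6", "argocd", values_file="argocd-values.yaml"),
]


def main(apply: bool) -> None:
    for release in RELEASES:
        cmd = helm_command(release)
        print(shlex.join(cmd))
        if apply:
            subprocess.run(cmd, check=True)  # stop the pipeline on the first failure


if __name__ == "__main__":
    main(apply="--apply" in sys.argv[1:])
```

There is no state file here: the cluster itself is the source of truth, and bumping a version is a one-line diff that CI applies.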

Hadlock
Nov 9, 2004

I'm using helm via terraform to bootstrap ArgoCD and Prometheus on my command and control cluster, as well as cert manager, reflector and a handful of other basic services all with pinned versions

Yeah I agree helm in general is a bad thing to use inside terraform, however for a hello world It's Probably Fine™, plus it gives you a diagnostic tool to watch while you do other tomfoolery

ArgoCD has an "app of apps" pattern I could probably adopt when everything chills the gently caress out but that's not today

Gucci Loafers
May 20, 2006

Ask yourself, do you really want to talk to pair of really nice gaudy shoes?


Trapick posted:

Is there a standard "start here" book/site/course/whatever for Terraform? Could be geared towards use with Azure but doesn't need to be.

I'm going through the Hashicorp site and it's fine, just curious if there's a better resource.

I've had a ton of fun with this, and they have their own Discord:

https://x.com/CloudChallenges/status/1547607942358192130


The Fool
Oct 16, 2003


Gucci Loafers posted:

I've had a ton of fun with this and they have their own discord,

https://x.com/CloudChallenges/status/1547607942358192130

I like https://learntocloud.guide/ better
