Methanar
Sep 26, 2013

by the sex ghost
It took 30 minutes into the first change window of 2023 for somebody to cause an outage.

Some random security person set some very dumb sysctl settings in a very dumb way that managed to circumvent guard rails that I personally put up over a year ago to protect against exactly, EXACTLY this threat model.

Methanar fucked around with this message at 05:20 on Jan 4, 2023

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.
What's everyone's general sentiment about using Cloud Native Buildpacks to build Docker images vs hand-writing Dockerfiles? I'm just starting to kick the tires on https://paketo.io/, and immediately discovering that installing just one targeted system library that native code will call into seems more difficult than it should be.

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
We're using a wrapper for Buildah to build our images because none of the buildpack-style systems seem to lead anywhere other than locking you into a particular flavor of cloud. Because we're trying to build distributable containers for the public, this worked out better for our needs as well.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Methanar posted:

It took 30 minutes into the first change window of 2023 for somebody to cause an outage.

Some random security person set some very dumb sysctl settings in a very dumb way that managed to circumvent guard rails that I personally put up over a year ago to protect against exactly, EXACTLY this threat model.
This is how I ended up with an E2E acceptance test suite with modifications under lock and key

Hadlock
Nov 9, 2004

Vulture Culture posted:

This is how I ended up with an E2E acceptance test suite with modifications under lock and key

This is the way

EoRaptor
Sep 13, 2003

by Fluffdaddy
So it appears that Azure Shared Image Galleries don't understand that 01/01/2023 is more recent than 12/31/2022, and aren't correctly flagging the 'latest' image.
:suicide101:

Wizard of the Deep
Sep 25, 2005

Another productive workday

EoRaptor posted:

So it appears that Azure Shared Image Galleries don't understand that 01/01/2023 is more recent than 12/31/2022, and aren't correctly flagging the 'latest' image.
:suicide101:

I mean, 12 is a lot higher than 1 or 11.
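
For anyone who wants to see the joke in action, here is a minimal sketch of how MM/DD/YYYY dates mis-order when compared as plain strings. It only illustrates the ordering problem; it is not a claim about how Azure actually picks the 'latest' image.

```python
from datetime import datetime

old = "12/31/2022"
new = "01/01/2023"

# Compared as plain strings, "1" sorts after "0", so the older date "wins".
latest_by_string = max(old, new)

# Parsed as real dates, the ordering is what a human expects.
fmt = "%m/%d/%Y"
latest_by_date = max(old, new, key=lambda s: datetime.strptime(s, fmt))

print(latest_by_string)  # 12/31/2022
print(latest_by_date)    # 01/01/2023
```

ISO 8601 strings (YYYY-MM-DD) avoid this, because their lexicographic order matches their chronological order.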

StumblyWumbly
Sep 12, 2007

Batmanticore!
As always, the software is fine, it is the millions of people doing things the same way they have for decades who are wrong.

Methanar
Sep 26, 2013

by the sex ghost
Another day another insane 4 hour prod fire network troubleshoot session.

gently caress me. I'm 2 hours late in taking an advil.

George Wright
Nov 20, 2005
What Linux and K8s distros are folks using for K8s on metal these days?

Looks like on the Linux distro side there are the usual suspects, but also distros like Flatcar, Bottlerocket, and Talos. Any experience here with those? Any horror stories?

As for K8s, EKS-D looks appealing so we could have the same distro on metal as we do AWS. Any experience with EKS-D or any other K8s distros? Any horror stories?

freeasinbeer
Mar 26, 2015

by Fluffdaddy
If on EKS I like bottlerocket, as between that and karpenter it can get nodes online in as little as 60s, but the standard aws linux image also is 100% fine.

On the other clouds I’d default to whatever their flavor of hosted K8s uses.

For my home lab I use Ubuntu. Flatcar is nifty, but if you’re doing anything weird like GPUs or playing with containerd plugins like stargz, it’s easier to not use one of the “hermetically” sealed OSes.

TalosOS is the only other one I’ve really looked at, but it is very opinionated and wants you to use their K8s tooling, so if you’re bringing anything else it’d be a headache.

The only thing I’d really avoid is any of the fedora OS or using centos/red hat. They killed coreos in favor of fedora OS for what externally appears to be not invented here reasons, and with mainline OS having really old kernels that sometimes makes it a PITA to use stuff that uses ebpf.

Edit: I’d avoid EKS-D. It’s not really like EKS at all, and it’s too new to recommend over existing stuff.

I really like k3s, in particular if you have experience running some flavor of database already and have backups figured out, but it’s different

freeasinbeer fucked around with this message at 00:41 on Jan 16, 2023

Methanar
Sep 26, 2013

by the sex ghost


I checked my email for the first time in a few days.
I got a thank you email from a random director ccing my own management chain and a 160 dollar gift card in exchange for giving myself PTSD over the past 3 weeks after responding to 5-6 prod incidents.

im a team player

Docjowles
Apr 9, 2009

Optimistically, ammo for that promotion packet? :yaycloud:

Alternatively, I've lost the thread of the Methanar saga over the years but haven't you been at current job for a long time? Maybe it's time to look around for something that does not trigger PTSD, especially if promotion/raise is poo poo.

I know there are lots of layoffs going on and general gloom about Economic Slowdown. But someone who has expert level k8s, AWS, BGP, Linux, etc knowledge has options.

Methanar
Sep 26, 2013

by the sex ghost

Docjowles posted:

Optimistically, ammo for that promotion packet? :yaycloud:

Alternatively, I've lost the thread of the Methanar saga over the years but haven't you been at current job for a long time? Maybe it's time to look around for something that does not trigger PTSD, especially if promotion/raise is poo poo.

I know there are lots of layoffs going on and general gloom about Economic Slowdown. But someone who has expert level k8s, AWS, BGP, Linux, etc knowledge has options.

I was no joke half way through writing something positive when pagerduty paged me for the 4th time today.
(For the third time this month. Somebody on the security team pushed out broad changes and walked away without testing or validating poo poo and broke everything leaving me to get called in for it because Kubernetes is the most visible thing to fail. Wasn't even Kubernetes-specific this time - just broke everything that depends on running the base Chef role)



I produce millions and millions of value to this org every year. My original pre-IPO equity grant has two vests left. My stress and responsibility is through the roof and has been for a long time. And my stock value is 1/3 of what it was 12 months ago.
If I don't get senior II, and another fat equity grant, for my 4 year anniversary during annual reviews this summer I'm absolutely ragequitting.

Really, though, I should just become management here because it seems like a way easier job. And I might actually be able to root cause fix some of the underlying problem patterns here that also burned out the previous 2 tech leads of my group.

Methanar fucked around with this message at 06:36 on Jan 19, 2023

Hadlock
Nov 9, 2004

Methanar posted:

Really, though, I should just become management here because it seems like a way easier job. And I might actually be able to root cause fix some of the underlying problem patterns here that also burned out the previous 2 tech leads of my group.

As manager you'll still need rapport and consensus with the management team to get your changes greenlit and scheduled. If two principals left and you're about to rage quit I suspect the problem goes all the way to the CTO and back down again

Good luck

jaegerx
Sep 10, 2012

Maybe this post will get me on your ignore list!


Anyone done the switch from istio to cilium yet? What am I looking at?

madmatt112
Jul 11, 2016

Is that a cat in your pants, or are you just a lonely excuse for an adult?

Hey I have a good idea, let's make a Thursday deadline to migrate every single loving thing in the entire platform to new subnets, and then at 3pm on Friday we'll turn off the old subnets.

What? not everybody managed to move every little noodly bit and bob into the new subnets, and make sure that their codebases and systems are set up to work with the new proxy systems?

Too fuckin' bad, kill it and take a weekend, fuckers!

WHAT THE CHRIST

Docjowles
Apr 9, 2009

madmatt112 posted:

Hey I have a good idea, let's make a Thursday deadline to migrate every single loving thing in the entire platform to new subnets, and then at 3pm on Friday we'll turn off the old subnets.

What? not everybody managed to move every little noodly bit and bob into the new subnets, and make sure that their codebases and systems are set up to work with the new proxy systems?

Too fuckin' bad, kill it and take a weekend, fuckers!

WHAT THE CHRIST

:rubby:

madmatt112
Jul 11, 2016

Is that a cat in your pants, or are you just a lonely excuse for an adult?


Like, what's a grace period? Do these idiots realize how much they've broken across the entire platform? Setting us all up for a lovely weekend too.

Wizard of the Deep
Sep 25, 2005

Another productive workday

madmatt112 posted:

Like, what's a grace period? Do these idiots realize how much they've broken across the entire platform? Setting us all up for a lovely weekend too.

It's a real shame your phone broke Friday at 4:30 and the earliest time the phone store can get you in is Monday at 9 am.

freeasinbeer
Mar 26, 2015

by Fluffdaddy

jaegerx posted:

Anyone done the switch from istio to cilium yet? What am I looking at?

Cilium is very alpha quality at the moment unless you are just replacing your existing CNI, I’d wait, but it’s still the right direction

George Wright
Nov 20, 2005

freeasinbeer posted:

Cilium is very alpha quality at the moment unless you are just replacing your existing CNI, I’d wait, but it’s still the right direction

From a CNI perspective, a service mesh perspective, or both?

jaegerx
Sep 10, 2012

Maybe this post will get me on your ignore list!


freeasinbeer posted:

Cilium is very alpha quality at the moment unless you are just replacing your existing CNI, I’d wait, but it’s still the right direction

It’s the default for gke and eks now I think. I’m on prem though.

Methanar
Sep 26, 2013

by the sex ghost
Cilium is mostly fine these days. Just stay n-1 off current major release and you'll be okay.

freeasinbeer
Mar 26, 2015

by Fluffdaddy
I was specifically playing with the bgp peering side last weekend and using the newer bgp setup. It was way more frustrating than I wanted it to be, but it is nifty to be able to directly hit pod IPs over the network.

LoadBalancers in bgp mode don’t support local target mode, and it’s very much alpha. To be fair I even think it’s tagged that way, but if you’re looking for bgp peering they’ve only added it at all very recently. That entire implementation around metallb is being ripped out, it seems, and replaced with the new stuff, which seems like a big deal for on prem.

Other than that, DSR was super fiddly, options are not explained well, and the defaults you get on install are kinda opaque.

I was only playing with it in my homelab, and the service mesh is not as well evolved as istio is, but that’s ok for now.

So while all the features are nifty, it took me way longer than I wanted to get it all running, and on the whole it feels a bit like it’s trying to be all things for all people. So many features don’t work in one mode or another or have severe caveats.

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine
just use calico

Lucid Nonsense
Aug 6, 2009

Welcome to the jungle, it gets worse here every day
I asked this in the Infosec thread, and thought you guys might have some feedback on this:

We're in the process of rewriting our storage engine (log management software) and are adding data silos. I've been tasked with the architecture for this, including rbac. What are everyone's requirements for this in a SIEM? Do you handle it by host access, or on a more granular level?

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Lucid Nonsense posted:

I asked this in the Infosec thread, and thought you guys might have some feedback on this:

We're in the process of rewriting our storage engine (log management software) and are adding data silos. I've been tasked with the architecture for this, including rbac. What are everyone's requirements for this in a SIEM? Do you handle it by host access, or on a more granular level?
For us, a host is often a container that might live for as short as several seconds, so organizing things by host is frequently not useful

Lucid Nonsense
Aug 6, 2009

Welcome to the jungle, it gets worse here every day

Vulture Culture posted:

For us, a host is often a container that might live for as short as several seconds, so organizing things by host is frequently not useful

Would you typically configure logging on that? I think the host logging in that situation would be the one running the container.

Docjowles
Apr 9, 2009

Lucid Nonsense posted:

Would you typically configure logging on that? I think the host logging in that situation would be the one running the container.

That's one style. It's also common to launch a pod that has the main app container, which writes to stdout/stderr, and a sidecar container that reads from those and ships to one or more logging destinations. Hell if you are running in something like AWS Fargate you don't even have access to the underlying host to install log management tools.

Personally I would enjoy machines authenticating with some sort of token or certificate, and humans authenticating with the usual SSO suspects (AzureAD, Okta, etc).

12 rats tied together
Sep 7, 2006

for rbac permissions in the app, let me set: principal plus action plus resource scope. provide documentation on every action for every resource scope

it's fine if the actions are generic (e.g. read, write, execute) across every resource scope

i can put usernames and passwords on my crap to authenticate. i would not make a ton of assumptions about how or why people do this and what their deployment looks like. i would try to fully decouple authentication from authorization.
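
a minimal sketch of the principal plus action plus resource scope idea, assuming the principal string is handed over by whatever authentication layer sits in front (so authn stays decoupled from authz). the `Grant` class, the `is_allowed` helper, the `"*"` wildcard, and the example principals and scopes are all hypothetical names for illustration, not anything from a real product.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Grant:
    principal: str       # who: a user or group name supplied by the auth layer
    action: str          # generic verb: "read", "write", "execute"
    resource_scope: str  # what: e.g. a silo name; "*" is a hypothetical wildcard


def is_allowed(grants, principal, action, resource_scope):
    """Return True if any grant matches all three parts of the request."""
    return any(
        g.principal == principal
        and g.action == action
        and g.resource_scope in (resource_scope, "*")
        for g in grants
    )


# Hypothetical example grants.
grants = {
    Grant("devops", "read", "cloud"),
    Grant("security", "read", "*"),
}

print(is_allowed(grants, "devops", "read", "cloud"))       # True
print(is_allowed(grants, "devops", "write", "cloud"))      # False
print(is_allowed(grants, "security", "read", "firewall"))  # True
```

keeping the check a pure function of (principal, action, resource scope) means it does not care whether the principal came from a password, a token, or SSO.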

Lucid Nonsense
Aug 6, 2009

Welcome to the jungle, it gets worse here every day
I guess I should have said source type rather than host. Would all of your AWS logs go into one bucket/silo with the same rbac rules and retention? Here's how I have the flow now.



Or would it be better to decide which silo data goes to after rules processing?

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Lucid Nonsense posted:

Would all of your AWS logs go into one bucket/silo with the same rbac rules and retention?
For SIEM? We'd probably treat them the same. The second that the log search is opened up to anyone outside the security org, though, all bets are off.

luminalflux
May 27, 2005



Lucid Nonsense posted:

I asked this in the Infosec thread, and thought you guys might have some feedback on this:

We're in the process of rewriting our storage engine (log management software) and are adding data silos. I've been tasked with the architecture for this, including rbac. What are everyone's requirements for this in a SIEM? Do you handle it by host access, or on a more granular level?

That it better be loving integrated with Okta

also no per-host licensing because we cycle hosts faster than George Santos invents new lies

Lucid Nonsense
Aug 6, 2009

Welcome to the jungle, it gets worse here every day

Vulture Culture posted:

For SIEM? We'd probably treat them the same. The second that the log search is opened up to anyone outside the security org, though, all bets are off.

Exactly why we're doing this. If you send your logs to the centralized server, the data in the aws bucket would only be accessible by your team. Others would see whatever they have rights to, but they'd be in different silos. I'm trying to figure out if there is any reason for aws logs to be split up, or if they could be grouped together with the same retention and access.


luminalflux posted:

That it better be loving integrated with Okta

also no per-host licensing because we cycle hosts faster than George Santos invents new lies

We have ldap, which I think Okta has an agent for, not sure if that would satisfy that need without digging into it. We don't license per host, just per server. But I'm not trying to push a product here, just figure out how our data silos would work. Sounds like an aws/azure/gcp bucket would work for the devops guys, then have a route/switch/server bucket for the sysadmins, and firewall/traffic bucket for the security guys, for an example. It will be user configurable, so I just need to find out if there are any common reasons to parse cloud logs and sort them into different buckets.
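
To make the example grouping concrete, here is a rough sketch of routing by source type. The silo names, the source-type keys (including reading "route" as router), the `source_type` field, and the `unsorted` fallback are all assumptions for illustration; the real mapping would be user configurable.

```python
# Hypothetical mapping from source type to silo, following the example grouping.
SILO_BY_SOURCE_TYPE = {
    "aws": "devops",
    "azure": "devops",
    "gcp": "devops",
    "router": "sysadmin",
    "switch": "sysadmin",
    "server": "sysadmin",
    "firewall": "security",
    "traffic": "security",
}


def route(event, default_silo="unsorted"):
    """Pick a silo from the event's source type, falling back to a default."""
    return SILO_BY_SOURCE_TYPE.get(event.get("source_type"), default_silo)


print(route({"source_type": "aws", "msg": "AssumeRole"}))    # devops
print(route({"source_type": "firewall", "msg": "deny tcp"}))  # security
print(route({"msg": "no source type set"}))                  # unsorted
```

Splitting cloud logs further would just mean keying on something finer than the provider name, which is the open question here.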

luminalflux
May 27, 2005



Lucid Nonsense posted:

We don't license per host, just per server.

What's the distinction here between "host" and "server"?

Lucid Nonsense
Aug 6, 2009

Welcome to the jungle, it gets worse here every day
The server you install our software on needs a license. Sending devices don't affect licensing, but licensing is based on the volume ingested. So if all devices are sending a total of 50 million events per day, you would need a license for that volume.
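
As a quick sketch of the arithmetic, assuming the license is sized on the summed daily event count across all senders (the device names and per-device numbers below are made up):

```python
def required_daily_volume(events_per_device):
    """Sum daily event counts across all sending devices."""
    return sum(events_per_device.values())


# Hypothetical fleet: the number of devices does not matter, only the total.
fleet = {
    "fw-1": 30_000_000,
    "web-1": 15_000_000,
    "web-2": 5_000_000,
}

print(required_daily_volume(fleet))  # 50000000
```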

luminalflux
May 27, 2005



Ah ok, makes sense.

Erwin
Feb 17, 2006

Lucid Nonsense posted:

The server you install our software on needs a license. Sending devices don't affect licensing, but licensing is based on the volume ingested.

Dear Datadog…

luminalflux
May 27, 2005



Erwin posted:

Dear Datadog…

Oh don't worry, they have both kinds of pricing
