12 rats tied together
Sep 7, 2006

the perceived difficulty in running k8s on-prem is really exaggerated compared to reality, imo.

it's definitely not as easy as "click button -> run shell one liner to configure kubectl" like AWS gives you, but it's a fairly standard core group/worker group cluster setup and you need to put some consideration into rack and power diversity etc to make sure you don't lose enough of your core to either break everything or do some crazy split brain poo poo or whatever.

if you aren't equipped to be able to intelligently locate your k8s nodes such that losing a rack doesn't cripple your cluster, well, you kind of had that problem anyway and k8s wasn't going to solve it for you. you still have to be good at operating physical infrastructure and a lot of people just aren't. it is very useful on-prem of course, for obvious reasons. chick-fil-a was doing it right when they started running a cluster in each store.

i would suggest that if you were going to try and build your own set of application abstractions for on-prem you should pretty much copy paste the k8s api and its concepts wholesale. it will help you avoid poo poo where developers will go "ok all the servers that end with 5 are the test servers" or issues where you decommission node12 and replace it with node94 instead of re-seating node12 and an entire set of processes that rely on assumed node names crumbles to the ground around you

its a very good way to be thinking about infrastructure deployed on-prem


kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison
k8s is generally pretty good and the tradeoffs it asks you to make are fairly reasonable.

GenJoe
Sep 15, 2010


Rehabilitated?


That's just a bullshit word.

12 rats tied together posted:

chick-fil-a was doing it right when they started running a cluster in each store

wait what

Qtotonibudinibudet
Nov 7, 2011



Omich poluyobok, skazhi ty narkoman? ya prosto tozhe gde to tam zhivu, mogli by vmeste uyobyvat' narkotiki

GenJoe posted:

wait what

yep.

Nomnom Cookie
Aug 30, 2009



k8s doesn’t give you anything that sufficiently good cloud tooling can’t do directly. in-house tools are usually poo poo though, which makes k8s valuable almost everywhere. the downside is k8s sucks dog dicks and this will cause lots of pain if you have needs that are in any way unusual. such as, it’s not actually possible to guarantee that a pod stops receiving requests before it stops. something that any sysadmin with a browser open to the EC2 api reference and a working copy of bash can script in 15 minutes, if you’re running on VMs directly
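for reference, the boto3 version of that "deregister, wait, then kill it" dance is about this much code (untested sketch, the LB name and instance id are placeholders):

code:
import boto3

elb = boto3.client("elb")  # classic ELB, i.e. the DeregisterInstancesFromLoadBalancer API

def drain_instance(lb_name: str, instance_id: str) -> None:
    target = [{"InstanceId": instance_id}]
    # pull the instance out of the pool; with connection draining on, the ELB
    # stops sending new requests and lets in-flight ones finish
    elb.deregister_instances_from_load_balancer(
        LoadBalancerName=lb_name, Instances=target
    )
    # block until the ELB actually reports the instance as out of service
    elb.get_waiter("instance_deregistered").wait(
        LoadBalancerName=lb_name, Instances=target
    )
    # only now is it safe to SIGTERM the app or stop the instance

# drain_instance("prod-api-lb", "i-0123456789abcdef0")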

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison

Nomnom Cookie posted:

k8s doesn’t give you anything that sufficiently good cloud tooling can’t do directly. in-house tools are usually poo poo though, which makes k8s valuable almost everywhere. the downside is k8s sucks dog dicks and this will cause lots of pain if you have needs that are in any way unusual. such as, it’s not actually possible to guarantee that a pod stops receiving requests before it stops. something that any sysadmin with a browser open to the EC2 api reference and a working copy of bash can script in 15 minutes, if you’re running on VMs directly

well that's one of the tradeoffs, innit? i don't think it's unreasonable to suggest that if you're building software to run on k8s, your devs shouldn't be immune from having to learn how it works, and should add in a wait or something on SIGTERM to keep in-flight requests from failing.
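roughly this shape in app code (sketch only, the 10s is a made-up grace period you'd tune to however long your endpoint updates actually take to propagate):

code:
import signal
import threading
import time
from http.server import HTTPServer, BaseHTTPRequestHandler

class Handler(BaseHTTPRequestHandler):
    def do_GET(self):
        self.send_response(200)
        self.end_headers()
        self.wfile.write(b"ok\n")

server = HTTPServer(("0.0.0.0", 8080), Handler)

def on_sigterm(signum, frame):
    def drain():
        # keep serving for a bit: kube-proxy / the LB will still route
        # requests to us until they notice the endpoint is gone
        time.sleep(10)
        server.shutdown()  # stop accepting, let the in-flight request finish
    threading.Thread(target=drain, daemon=True).start()

signal.signal(signal.SIGTERM, on_sigterm)
server.serve_forever()  # returns once shutdown() is called, then the process exits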

12 rats tied together
Sep 7, 2006

IMO its fair that, as a google product, k8s has high expectations of you in terms of your overall ability to control the applications that you manage. for something like a ui node, dropping inflight requests shouldn't matter because your web ui is just a javascript api client bundled with static content, so you can cleanly handle retry logic behind the scenes.

if the application is a listener for some adtech type poo poo (since its google) you likely do not care about any amount of requests that isn't at least in the 1000s.

if you're running a database in there OTOH yeah that's kind of a problem, and could use some special consideration or handling. i probably would not ever intentionally run a database in k8s though.

animist
Aug 28, 2018

Schadenboner posted:

Why did the packet wrangling thread turn into some sort of containerized microservice hell?

:ohdear:

its because computers are made of broken glass and hatred and gradually destroy the brain of anybody working with them hth

Schadenboner
Aug 15, 2011

by Shine

animist posted:

its because computers are made of broken glass and hatred and gradually destroy the brain of anybody working with them hth

I've been a computer all along?

:ohno:

This is some Blade Runner poo poo right here...

Nomnom Cookie
Aug 30, 2009



uncurable mlady posted:

well that's one of the tradeoffs, innit? i don't think it's unreasonable to suggest that if you're building software to run on k8s, your devs shouldn't be immune from having to learn how it works, and should add in a wait or something on SIGTERM to keep in-flight requests from failing.

that would be the suggestion of kubernetes devs, yes, that you add in an unconditional wait before sigterm is sent. sometimes this is fine. it makes everything slower, which might be why it doesn’t happen by default

adding a sleep is still not the same thing as guaranteeing that traffic has stopped, which again is trivially achievable if you’re using ELB and EC2 yourself. it’s the layer of iptables redirection poo poo in between required by k8s’s networking model that causes the problem. that, and the k8s devs insisting that it’s a layering violation if the left hand knows what the right hand is doing. SIGKILL a pod even though requests are still getting sent to it by several cluster nodes? sure!

we don’t have a large prod cluster, maybe 100 nodes, but some node not updating its iptables rules and still sending traffic to a dying pod for 1+ _minutes_ after the pod started termination is a regular occurrence. this is entirely an unnecessary consequence of k8s’s architecture and nothing else, and it makes it supremely aggravating to run services that need to be reliable on top of k8s

Nomnom Cookie
Aug 30, 2009



12 rats tied together posted:

IMO its fair that, as a google product, k8s has high expectations of you in terms of your overall ability to control the applications that you manage. for something like a ui node, dropping inflight requests shouldn't matter because your web ui is just a javascript api client bundled with static content, so you can cleanly handle retry logic behind the scenes.

if the application is a listener for some adtech type poo poo (since its google) you likely do not care about any amount of requests that isn't at least in the 1000s.

if you're running a database in there OTOH yeah that's kind of a problem, and could use some special consideration or handling. i probably would not ever intentionally run a database in k8s though.

we do realtime money things for large companies that care very much about SLAs. neither retries nor just not giving a gently caress about errors are options for us, which is a huge problem when using k8s (motto: if it ain’t broke, it needs more yaml)

12 rats tied together
Sep 7, 2006

yeah thats very fair, i would not want to put any stock trading(?) poo poo into k8s, especially if it were adjacent to whatever event firehose exists in that domain. i'd probably want to map external events directly to internal stream storage and then, if k8s has to be involved, its just hosting stream processors

similarly i would probably not put supply-side adtech poo poo into k8s either

i guess if you were google and you were both the supply and the demand you could be more intelligent about routing events to handlers that only sometimes exist, but, in general with adtech that poo poo is way less important because if you can't find a bidder for an ad slot you can just pick something at random and serve it. can't really do that with stocks

Nomnom Cookie
Aug 30, 2009



we’re closer to adtech than stocks. when k8s was rolled out here, everything ran in batch and retries were fine restart pods whenever you want, sure. nobody cares if something falls on the floor temporarily. but then some very large deals were only possible if we could do stuff realtime instead, CTO decided we would go for it, and the pain started

Captain Foo
May 11, 2004

we vibin'
we slidin'
we breathin'
we dyin'

put adtech into a dumpster and never work on or look at it again

Nomnom Cookie
Aug 30, 2009



it’s fraud prevention. the performance requirements and system design are the main similarity to adtech

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison
well its also worth noting that i dont think anything at google runs on k8s, it runs on borg (with other stuff) and k8s at this point has drifted pretty far from its roots

TwoDice
Feb 11, 2005
Not one, two.
Grimey Drawer
if your system requires that servers be politely shut down to work properly then it's a bad system

Cybernetic Vermin
Apr 18, 2005

a very misleading statement since the system is bound to be bad either way.

Phobeste
Apr 9, 2006

never, like, count out Touchdown Tom, man
scroogled again

Nomnom Cookie
Aug 30, 2009



TwoDice posted:

if your system requires that servers be politely shut down to work properly then it's a bad system

pray tell me, master, how shall I rebuild this system such that requests succeed even when the server has been SIGKILLed with the request in flight

Progressive JPEG
Feb 19, 2003

could put the requests on a queue, and commit the queue as requests are successfully processed. but i think this would usually give "at least once" guarantees when it sounds like you want "exactly once"?

another option could be to just add a client retry for the occasion when there is a flake, and hopefully it'd get kicked to a different backend instance that isn't being shut down. this redirect might be configurable via the service object's session affinity

but idk it sounds like you may have tried some of this already
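the retry option is basically this shape, for what it's worth (sketch, url and numbers are made up, and it's only sane if the request is idempotent):

code:
import time
import requests

def call_with_retry(url, payload, attempts=3, backoff=0.05):
    last_exc = None
    for i in range(attempts):
        try:
            resp = requests.post(url, json=payload, timeout=0.2)
            if resp.status_code < 500:
                return resp  # success or a genuine 4xx: don't retry
        except requests.RequestException as exc:  # e.g. connection reset by a dying pod
            last_exc = exc
        time.sleep(backoff * (2 ** i))  # tiny backoff, hope the next try lands on a healthier backend
    raise RuntimeError("all attempts failed") from last_exc

# call_with_retry("http://some-svc.default.svc.cluster.local/score", {"txn": "..."})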

12 rats tied together
Sep 7, 2006

imo the requirements of a system can't make it a bad system, since a system is only as useful as it can be applied to solve real world problems

there are probably real world problems that require you to never drop events and have latency requirements that make decoupling handlers from receivers not a workable solution. i can't think of any right now but i don't go outside anymore so

i would think that fraud detection is a process that could be decoupled from whatever the input is, like, you would have some statistical model that you train out of band being used to verify live events, but the events are also used for training that model, which you redeploy every 6 hours or whatever.

in any case i think putting something in k8s fundamentally means giving up some control in exchange for some niceties. that's not always gonna be a good idea (but usually is)

Nomnom Cookie
Aug 30, 2009



Progressive JPEG posted:

could put the requests on a queue, and commit the queue as requests are successfully processed. but i think this would usually give "at least once" guarantees when it sounds like you want "exactly once"?

another option could be to just add a client retry for the occasion when there is a flake, and hopefully it'd get kicked to a different backend instance that isn't being shut down. this redirect might be configurable via the service object's session affinity

but idk it sounds like you may have tried some of this already

when I said realtime earlier I meant we have a 250ms deadline. request comes in from merchant, we do as much fraud detection as we can in that window, return a decision. if we hit 230-240ms without reaching a conclusion then we return a random decision. that leaves very little room for timeouts and retries. obviously there are ways to decouple processing from the success or failure of a single http request, but they’re not applicable here. we get sent one request and it’s our one chance to do something reasonable. getting the client to retry isn’t possible because waiting for us is blocking them taking payment and showing a spinner in the customer’s browser

these are very harsh constraints! sometimes poo poo doesn’t work right and we fall back to random or even time out. poo poo happens and you just have to accept that. it’s just aggravating as gently caress to discover that k8s is built for workloads that no one really gives a poo poo about except at large scales. we do actually want to avoid chopping off those dozen requests if we can, because high-value merchants are watching that success rate very closely, and this flies directly in the face of everything k8s stands for
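for the curious, the shape of it is roughly this (sketch only, timings and names are made up):

code:
import asyncio
import random

DEADLINE = 0.25        # hard budget per merchant request
SAFETY_MARGIN = 0.02   # leave ~20ms to serialize and send the response

async def score_transaction(txn) -> bool:
    ...  # whatever fraud models we can run inside the budget

async def decide(txn) -> bool:
    try:
        return await asyncio.wait_for(
            score_transaction(txn), timeout=DEADLINE - SAFETY_MARGIN
        )
    except asyncio.TimeoutError:
        # out of time: coin flip rather than blowing the SLA
        return random.random() < 0.5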

Captain Foo
May 11, 2004

we vibin'
we slidin'
we breathin'
we dyin'

if you are a successfully serving internet ads you are unironically contributing to the downfall of society

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Nomnom Cookie posted:

when I said realtime earlier I meant we have a 250ms deadline. request comes in from merchant, we do as much fraud detection as we can in that window, return a decision. if we hit 230-240ms without reaching a conclusion then we return a random decision. that leaves very little room for timeouts and retries. obviously there are ways to decouple processing from the success or failure of a single http request, but they’re not applicable here. we get sent one request and it’s our one chance to do something reasonable. getting the client to retry isn’t possible because waiting for us is blocking them taking payment and showing a spinner in the customer’s browser

these are very harsh constraints! sometimes poo poo doesn’t work right and we fall back to random or even time out. poo poo happens and you just have to accept that. it’s just aggravating as gently caress to discover that k8s is built for workloads that no one really gives a poo poo about except at large scales. we do actually want to avoid chopping off those dozen requests if we can, because high-value merchants are watching that success rate very closely, and this flies directly in the face of everything k8s stands for

Seems like you need middleware (or better? middleware) between the client and the backend to buffer the request and determine if the backend is hosed

With that said you still won't ever get fully around the issue because "sometimes HTTP requests will fail and that is ok" is basically the definition of 2020 backend web development

Nomnom Cookie
Aug 30, 2009



Captain Foo posted:

if you are a successfully serving internet ads you are unironically contributing to the downfall of society

rcp

Nomnom Cookie
Aug 30, 2009



abigserve posted:

Seems like you need middleware (or better? middleware) between the client and the backend to buffer the request and determine if the backend is hosed

With that said you still won't ever get fully around the issue because "sometimes HTTP requests will fail and that is ok" is basically the definition of 2020 backend web development

yeah, you’re not saying anything I don’t know. it’s not feasible to guarantee that requests never fail. I’m bitching about an aspect of k8s that guarantees some requests will fail, when they could have made different choices to avoid the failures. architectural purity is valued more highly than proper operation, and that pisses me off

Qtotonibudinibudet
Nov 7, 2011



Omich poluyobok, skazhi ty narkoman? ya prosto tozhe gde to tam zhivu, mogli by vmeste uyobyvat' narkotiki

Nomnom Cookie posted:

it’s not actually possible to guarantee that a pod stops receiving requests before it stops

"possible" is a strong word. there are some things that are indeed not possible unless you throw a whole lot of things out the window--if you want to exceed c and violate causality, for instance, some very fundamental things need to change, so that's de facto not possible. computers are usually a bit more flexible.

there's a lot of things going into why your pod is receiving requests, and depending on exactly what types of requests in flight are the problem and why they're forwarded to a dying pod still, there's probably some way to make traffic go to it less or not at all. sure, that's complex, but welcome to kubernetes, lots of things are complex and have defaults that you may not want

https://www.youtube.com/watch?v=0o5C12kzEDI&t=1m10s is good watch

Nomnom Cookie
Aug 30, 2009



CMYK BLYAT! posted:

"possible" is a strong word. there are some things that are indeed not possible unless you throw a whole lot of things out the window--if you want to exceed c and violate causality, for instance, some very fundamental things need to change, so that's de facto not possible. computers are usually a bit more flexible.

there's a lot of things going into why your pod is receiving requests, and depending on exactly what types of requests in flight are the problem and why they're forwarded to a dying pod still, there's probably some way to make traffic go to it less or not at all. sure, that's complex, but welcome to kubernetes, lots of things are complex and have defaults that you may not want

https://www.youtube.com/watch?v=0o5C12kzEDI&t=1m10s is good watch

possible is the right word. clusterip services are enabled by local state on each worker node, and it's not possible to determine from the apiserver's perspective whether a node is up to date, so it's not possible for a kubelet to know whether it's safe to kill a pod. it is literally impossible to get the semantics of elb:DeregisterInstancesFromLoadBalancer out of k8s, and adding sleeps doesn't fix the race. solving this problem was too hard for the brain geniuses making k8s to do, so they didn't, and you're not supposed to care, because you're supposed to be hosting garbage that no one will notice if it breaks occasionally, like google's thirtieth chat app

but thank you for talking down to me and linking me to a youtube that recapitulates the documentation at me while breathing into the mic. being told by some guy at a con that i need to do what i already did six months ago has fixed all of my problems

kitten emergency
Jan 13, 2008

get meow this wack-ass crystal prison

Nomnom Cookie posted:

possible is the right word. clusterip services are enabled by local state on each worker node, and it's not possible to determine from the apiserver's perspective whether a node is up to date, so it's not possible for a kubelet to know whether it's safe to kill a pod. it is literally impossible to get the semantics of elb:DeregisterInstancesFromLoadBalancer out of k8s, and adding sleeps doesn't fix the race. solving this problem was too hard for the brain geniuses making k8s to do, so they didn't, and you're not supposed to care, because you're supposed to be hosting garbage that no one will notice if it breaks occasionally, like google's thirtieth chat app

but thank you for talking down to me and linking me to a youtube that recapitulates the documentation at me while breathing into the mic. being told by some guy at a con that i need to do what i already did six months ago has fixed all of my problems

maybe your requirements are bad? idk, i don't do ecommerce or fraud detection so i trust that y'all have thought about this more than i have.

that said, like i was saying earlier my point isn't that "k8s is the best for everything ever", it's that the tradeoffs it asks you to make are generally reasonable. yeah, there's entire classes of application that maybe those tradeoffs don't work for due to kubelet architecture decisions, but there's also a lot where those tradeoffs _do_ make sense, and those happen to align with maybe 80% of the workloads that people deal with in the world.

most of the time, for most of the people, it's ok to retry.

Captain Foo
May 11, 2004

we vibin'
we slidin'
we breathin'
we dyin'


oh


my point stands though!!

Nomnom Cookie
Aug 30, 2009



uncurable mlady posted:

maybe your requirements are bad? idk, i don't do ecommerce or fraud detection so i trust that y'all have thought about this more than i have.

that said, like i was saying earlier my point isn't that "k8s is the best for everything ever", it's that the tradeoffs it asks you to make are generally reasonable. yeah, there's entire classes of application that maybe those tradeoffs don't work for due to kubelet architecture decisions, but there's also a lot where those tradeoffs _do_ make sense, and those happen to align with maybe 80% of the workloads that people deal with in the world.

most of the time, for most of the people, it's ok to retry.

can you understand me being pissed that the best practice guidance for this situation is to hack in a sleep long enough that you probably won't have an issue, because making the thing work properly would be a "layering violation". it's worse-is-better pretending to be just plain better and it bugs the hell out of me

ate shit on live tv
Feb 15, 2004

by Azathoth
I don't understand, you can't just put your k8s cluster behind a load balancer, then remove the cluster or node from the LB pool which will then no longer send requests to the cluster, then 30sec later send a sigterm?

Cybernetic Vermin
Apr 18, 2005

lot of projects, and their requirements, don't care about scaling for various reasons. and i honestly don't see that much of a point to k8s if you don't have reasonably complex scaling needs. some people think it is good for management too, but i think there is a touch of stockholm syndrome happening there when not in a complex scaling scenario.

not caring about scaling is not the realm only of smaller-scale projects either. stock trading systems were already brought up, and the latencies involved are only part of the reason why k8s may not be ideal. the (entrenched in business) approach when faced with any deterioration of the environment is to halt the system (after some fixed failovers). trying to keep the system limping along, serving requests best-effort, while spinning up additional instances or whatever is sure to create some extremely unfair scenarios (one client having better access, or plain luck, than another). i.e.: the system doing its job worse is worse than it just not trying to do its job at all.

plus it's difficult to keep the situation from being extremely uncertain for the system participants.

Nomnom Cookie
Aug 30, 2009



ate poo poo on live tv posted:

I don't understand, you can't just put your k8s cluster behind a load balancer, then remove the cluster or node from the LB pool which will then no longer send requests to the cluster, then 30sec later send a sigterm?

in what universe is it sane to spin up a second cluster and write my own LB glue so that I can safely restart a single process. and that doesn’t address intracluster traffic at all

12 rats tied together
Sep 7, 2006

ate poo poo on live tv posted:

I don't understand, you can't just put your k8s cluster behind a load balancer, then remove the cluster or node from the LB pool which will then no longer send requests to the cluster, then 30sec later send a sigterm?

OP mentioned a ClusterIP service which is an intra-k8s-cluster-only sort of deal. the master nodes will basically allocate containers to servers, create virtual ips for collections of those containers, and then distribute a pepe silvia style iptables dnat spiderweb to each of the nodes in the cluster.

it means you can hit any node and it will get to your pod, but it also means the core nodes don't really have a consistent view of what traffic is currently flowing since the only thing they do is examine object definitions and push iptables rules to nodes. because they don't have a real view into traffic (there is no central nat table, for example, like in an SNAT load balancer), they can never be sure that traffic has stopped or that there are no more active tcp sessions or whatever.

the other types of services in standard use (NodePort, LoadBalancer) are built on top of ClusterIP so they share the same limitation. apparently there is an ExternalName service which uses cname-to-cluster-dns as a redirection layer instead of iptables, but thats probably even slower and less useful for OP's job

its a fair gripe, this app probably shouldnt be in k8s
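the usual band-aid, if you have to stay in k8s, is a preStop-ish step that watches the Endpoints object until your pod ip disappears and then waits a bit longer for kube-proxy on each node to reprogram iptables. sketch below (service name, namespace and timings are made up, and POD_IP is assumed to come in via the downward API). to be clear this narrows the window, it doesn't close it:

code:
import os
import time
from kubernetes import client, config

POD_IP = os.environ["POD_IP"]            # injected via the downward API
SERVICE, NAMESPACE = "decision-svc", "prod"

def wait_until_deregistered(timeout: float = 60.0) -> None:
    config.load_incluster_config()
    v1 = client.CoreV1Api()
    deadline = time.time() + timeout
    while time.time() < deadline:
        ep = v1.read_namespaced_endpoints(SERVICE, NAMESPACE)
        ips = [a.ip for s in (ep.subsets or []) for a in (s.addresses or [])]
        if POD_IP not in ips:
            break                        # apiserver no longer lists this pod as a backend
        time.sleep(1)
    time.sleep(5)                        # extra grace for kube-proxy on every node to catch up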

Progressive JPEG
Feb 19, 2003

im not holding it wrong, YOUR holding it wrong!!!

Qtotonibudinibudet
Nov 7, 2011



Omich poluyobok, skazhi ty narkoman? ya prosto tozhe gde to tam zhivu, mogli by vmeste uyobyvat' narkotiki
clearly the solution is a service mesh

checkm8 mfers

ate shit on live tv
Feb 15, 2004

by Azathoth

12 rats tied together posted:


its a fair gripe, this app probably shouldnt be in k8s

i admit i have zero knowledge of k8s, but it sounds like its the wrong tool for the job. so maybe use something else, op?

when i was in adtech we had an sla of 10ms for certain requests and 50ms for others, and we did almost 1 million requests per second. a hacked-together node cluster that didn't understand network state wasn't part of our design.


my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine

Nomnom Cookie posted:

yeah, you’re not saying anything I don’t know. it’s not feasible to guarantee that requests never fail. I’m bitching about an aspect of k8s that guarantees some requests will fail, when they could have made different choices to avoid the failures. architectural purity is valued more highly than proper operation, and that pisses me off

the authors of kubernetes would probably say you're going to have to solve this problem (requests dropping) eventually somewhere, and also that the problem you want them to solve is intractable. you're asking them to somehow come up with a routing model that a) is consistent/atomic across distributed nodes, b) supports dynamic scaling events, and c) never drops requests. how would you do this with any other provider or even conceptually other than a sleep after scaling down?
