abigserve
Sep 13, 2009

this is a better avatar than what I had before
Makin this thread for those of us who are only programmer adjacent

talk about your artisanal k8s and docker setups or whether "the cloud" is really going to "take off"

also talk about putting packets in pipes and how you have to put in a major change request to scratch your own taint

catalyst for this thread was cisco announcing a technology that someone else has had out for multiple years as "the first ever"

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Kazinsal posted:

feelin' real called out right now

had to put in a change request the other day to get someone to pull an unplugged, powered off device from a rack, because it's in our HQ and we don't want to do anything that could possibly disturb the bean counters. our security architect caught wind of this, went into the closet, and just pulled the thing out at noon on a friday :v:

what services will this affect?

the network

yeah but what services

abigserve
Sep 13, 2009

this is a better avatar than what I had before

theadder posted:

computers seem bad op

Try connecting them together, that's when it really turns to poo poo

abigserve
Sep 13, 2009

this is a better avatar than what I had before
Spanning-tree is one of those things where it's technically very easy, but it relies on a bunch of diligence at virtually every layer of the IT department, so it's rarely configured correctly.

abigserve
Sep 13, 2009

this is a better avatar than what I had before

CMYK BLYAT! posted:


this poo poo isn't half as annoying as helm rendering a template, failing to convert it to json, and then reporting an error on a line in the rendered yaml, which it doesn't show you.

powerful curse

abigserve
Sep 13, 2009

this is a better avatar than what I had before

akadajet posted:

my managers all want us to use kubernetes but our software is 100% not designed for it lol

Can you expand on why it isn't? All I ever hear are "valid" k8s use cases that, under even the smallest scrutiny, sound like a lot of care and feeding to make work. It'd be good to hear a case from the other side of the coin.

abigserve
Sep 13, 2009

this is a better avatar than what I had before

CMYK BLYAT! posted:

windows software is perhaps a bit more of a special case, but for us, kubernetes manages to surface a lot of unfortunate shortcuts that didn't cause issues on dedicated VMs. these are probably all more just generic issues with adapting to containerized deployments, but k8s has made those more accessible:

* worker process count is determined based on core count by default. this doesn't work very well if you run on a beefy kubelet with many cores, but only allocate 2-4 CPU to the pod, since the "how many cores?" the program sees is the underlying host's core count. doubly so since these workers all allocate a baseline amount of RAM
* things that assume static IPs are poo poo in general in modern infrastructure, and kubernetes' pod lifecycle model demonstrates this quite well
* we have some temporary directories that default to a directory that also holds some static files. kubernetes makes it easy to do read-only root FS for security purposes, and while we have a setting to move the temporary files elsewhere, it turns out we hardcoded the default location loving everywhere

the largest issue, honestly, is that kubernetes operational experience is in fairly short supply, and there are a lot of people being dragged kicking and screaming into working with it because their higher-ups wanted to implement it (not without good reason, mind you, but in typical modern american corporate fashion, they want to do so without training anyone, under arbitrary, too-short timelines). as vendor support for poo poo that runs in and heavily integrates with kubernetes, more than half my time ends up being spent explaining poo poo that's covered in the kubernetes documentation and reminding people that "kubectl logs" and "kubectl describe" will explain the cause of most of their issues.

Ta, this makes sense. The static IP thing is definitely something I've seen in prod, so that confirms it, but the worker process count is new to me and I'd have to do some research on it.
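Having looked it up anyway, here's a minimal sketch of the core-count mismatch, assuming cgroups v2 mounted at the usual /sys/fs/cgroup path (the function name is mine, not from any k8s client lib):

[code]
import os

def effective_cpu_limit():
    """Best-effort guess at how many CPUs the pod is actually allowed,
    rather than how many the underlying node has."""
    # cgroups v2 exposes the quota as "cpu.max": either "max <period>" for
    # unlimited, or "<quota> <period>" in microseconds
    try:
        with open("/sys/fs/cgroup/cpu.max") as f:
            quota, period = f.read().split()
            if quota != "max":
                return max(1, int(quota) // int(period))
    except (FileNotFoundError, ValueError):
        pass
    # fall back to what most runtimes report by default: the host's core count
    return os.cpu_count()

# classic "workers = 2 * cores + 1" sizing goes sideways when the node has 64
# cores but the pod is only allocated a couple of CPUs worth of quota
print("host cores:", os.cpu_count(), "pod quota:", effective_cpu_limit())
[/code]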


my stepdads beer posted:

has anyone moved from a network where your core speaks bgp to mpls where only your edge needs to? any pitfalls? I am tempted as the QFX range seems to be a bargain but won't take a full table

you still need iBGP or whatever other protocol to advertise the routes throughout the core so there's something to bind labels against in the LIB. MPLS doesn't replace a routing protocol; it just changes the way the lookup is performed for a packet as it transits the network.

That doesn't mean you have to publish the full table into your core though, so I'd be considering why that was ever a requirement.

abigserve
Sep 13, 2009

this is a better avatar than what I had before
BGP controlled by Linux daemons (quagga is popular) is a fairly standard workflow these days.

abigserve
Sep 13, 2009

this is a better avatar than what I had before
Yeah, RRs (route reflectors) peered to actual routers, sorry, should have been clearer

abigserve
Sep 13, 2009

this is a better avatar than what I had before

pointsofdata posted:

I feel like if you spent loads of figgies on hiring some really good k8s engineers to run it and tell you how to do everything it could be great

The problem with k8s, as I see it, is that it's a total uplift of how apps are developed AND a total uplift of how apps are delivered.

Especially in the enterprise space, shitloads of stuff runs as black boxes (OVAs), standalone installers in the case of Windows environments, or static pages.

It's hard to demonstrate value when it's like "you could move this! oh...you can't. Alright, well what about this? Oh, you can't move that either. hrm."

abigserve
Sep 13, 2009

this is a better avatar than what I had before
I think everyone is trying to avoid the vmware trap where every single person confuses hosts and guests, even people who spend a lot of time with vmware, because in literally any other context it's totally normal to call a server a "host".

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Captain Foo posted:

this doesn't confuse me op

You never had the "looks like the host is down. No, yeah, I mean the VM. The guest." conversation?

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Trimson Grondag 3 posted:

at scale you need to look at uplift of network too, google runs all their k8s stuff on CLOS network don’t they?

Contemporary DC network design relies on Clos-style ToR solutions, because all the vendors have abandoned huge DC chassis solutions and whitebox (commodity) switches are cheap.

K8s actually works very well in these designs, as the network is generally layer 3 to the edge, and containers are very good at breaking the "layer 2" mindset that makes these designs...difficult, in real environments.

Problem is, unless you're a startup, you'll likely be supporting a large number of traditional systems as well, which makes the DC far more complicated than it used to be. DC network design ain't easy these days. It used to be far simpler: you'd whack a giant VSS pair of switches in the core, add some copper stuff for ilom, and be gucci. You want layer 2? here it is! you want layer 3? here it is!!

abigserve
Sep 13, 2009

this is a better avatar than what I had before

freeasinbeer posted:

or doing something real terrible where they fork processes on containers and don’t get why that’s bad.

Elaborate?

abigserve
Sep 13, 2009

this is a better avatar than what I had before
It's been a while since I wrote anything that forked, but I seem to remember getting into the zombie process state was hard as gently caress and you had to jump through a bunch of hoops to get there.

i.e. if the parent process died or was killed, I remember (could be wrong) its children being automatically reaped or killed along with it by default. This would have been perl. Am I crazy?

abigserve
Sep 13, 2009

this is a better avatar than what I had before
You won't be able to push most consumer storage devices past 2 Gbps for actual real data transfers
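(napkin math: 2 Gbps works out to 250 MB/s, which is roughly the sequential ceiling of a single spinning disk and about where a lot of consumer SSDs end up once their write cache runs out)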

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Forums Medic posted:

let's talk eigrp - it sucks and I hate it

From the Before Time when Cisco legitimately thought they would never have competition

abigserve
Sep 13, 2009

this is a better avatar than what I had before
Fundamentally there is no way to move to a "REST-like" protocol for the functionality that RADIUS provides, because the primary function of RADIUS is to carry EAP messages. EAP messages are layer 2 only and typically are not forwarded past the switchport, which means you need another protocol whose only role is to carry those messages to the layer 3 endpoint that serves as the AAA server. Because EAP is end-to-end between the client and the authentication server for obvious reasons, there is no plausible way you could lift a framework like OIDC into the role that RADIUS fills.
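To make the "layer 2 only" part concrete, here's a rough Linux-only sketch of watching EAPOL frames on an access port (needs root, and "eth0" is just a placeholder interface). There's no IP header anywhere in the frame, which is exactly why something like RADIUS has to re-wrap the EAP payload to get it off the switch:

[code]
import socket

ETH_P_PAE = 0x888E  # EtherType for EAPOL (EAP over LAN, 802.1X)

# raw layer-2 socket; Linux only, needs root, "eth0" is a placeholder
s = socket.socket(socket.AF_PACKET, socket.SOCK_RAW, socket.htons(ETH_P_PAE))
s.bind(("eth0", 0))

while True:
    frame = s.recv(2048)
    src = frame[6:12]      # source MAC of the supplicant
    eapol = frame[14:]     # EAPOL header sits straight after the Ethernet header
    version, pkt_type = eapol[0], eapol[1]
    # no IP header in this frame at all - the client has no address yet, so
    # the switch has to repackage the EAP payload into RADIUS to reach the
    # AAA server
    print(f"EAPOL v{version} type {pkt_type} from {src.hex(':')}")
[/code]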

If you're thinking "why not use something other than EAP", consider that your clients have literally no network access at all prior to authentication. That is the primary use case for EAP/RADIUS.

The end of RADIUS would actually be the end of traditional networking: the very slow, plodding shift of the enterprise towards zero-trust networking via massive overlay networks, the standards for which are still not agreed upon, let alone implemented.

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Cocoa Crispies posted:

it's not the network client to radius client thing that people want rest, it's the radius client to radius server part, and i bet you could design it so your authentication/accounting services would work in an edge cloud or something because *faaaart*

RADIUS is just a transport around EAP, as far as authentication is concerned. You could potentially replace RADIUS authorisation & accounting with another protocol and, in fact, some solutions do this (some VPN solutions use RADIUS to auth the user, then LDAP to map that user to groups, for example).

What you could do is make the NAS devices capable of authenticating users themselves by having them parse and respond to EAP messages, as opposed to forwarding them to an AAA server, which then lets you do whatever the gently caress you want, and many solutions like that do exist out there. Contemporary VPN solutions, for example, often support SAML, LDAP, certificate-based auth, etc. without the use of RADIUS.

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Nomnom Cookie posted:

fuuuuuck palo alto

e: palo alto also uses ldap for group mapping when your auth method supports returning group membership, e.g. saml. every single thing about the product is like that

nah they'll map groups onto fuckin anything if there is a username in the db and group mapping is configured. You can do it that way too though, where the authentication server returns the group membership but then it depends on the protocol (RADIUS/SAML/whatever)

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Turnquiet posted:

my radius beef is when apps try to use if to user authn. the stuff i build with my stupid fartastic cloud deployments is premised upon ephemeral infrastructure, so when loving cyberark wants to use radius for user auth and my pam guy just assumes i will turn on my pingfederate's radius capabilities i get irked because i am stupidly trying to get us into the cloud like what the last 8 years of c-levels have said is the strategy, which is a place where we can't guarantee a fixed ip address. gently caress you, in preparation for zero trust we are using federated protocols for all authentication, so use saml2 or oidc or die in a fire- looking at you, microsoft, who literally built azuread on openid flows but still demands ws-trust/ws-fed if you want to retain control of your idp.

there is no legitimate use case for RADIUS for user auth on web apps aside from "we already have RADIUS servers", and that's dumb as hell (yes, I am aware that most MFA providers use on-prem RADIUS proxies to insert MFA into auth flows that would otherwise not support it)

abigserve
Sep 13, 2009

this is a better avatar than what I had before

my stepdads beer posted:

we use RADIUS for PPPoE because cisco's IPoE is buggy af on one of the agg routers we use and inertia. sorry about your 8 bytes of overhead everyone.

PPPoE is real good for certain situations and realising you can functionally operate as an ISP for stuff like tenants and student accommodation is baller as hell

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Bored Online posted:

in all of yospos this thread most closely aligns with my profession and it js also the thread i understand the least in

Proast about what you are doing then

Infrastructure and networking jobs cum in different shapes and sizes but one way or another they are all terrible

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Jimmy Carter posted:

my wacky local ISP double-NATs me but will sell me a public IPv4 address for $5/mo.

when I asked to just get an IPv6 allocation they told me that they aren't there yet, but I could save money and get a NordVPN account for $3/mo.

More like IPv6000 years to implement!!

We had a full IPv6 dual stack deployment at a relatively large place and it legitimately didn't cause many issues, and the ones it did were purely server/client implementation related. Why an ISP wouldn't already provide it, I have nfi.

abigserve
Sep 13, 2009

this is a better avatar than what I had before
fill a small NAS with 'em. Assuming home use, you can get a mini-ITX case with like 8 drive slots (at least; there are probably even bigger ones)

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Clark Nova posted:

With SSDs you can just leave 'em banging around loose inside the case :q: The absolute cheapest option would be whatever PC you have + a Dell PERC H310 (or some other raid card that has or can be flashed to JBOD mode) + a rat's nest of SSDs

cat o' nine tails but with sata cables and ssds.

In infrastructure news I've been forced to use GCP for some things at work and it's not bad? Especially app engine, I like app engine a lot, it seems to just "make sense".

abigserve
Sep 13, 2009

this is a better avatar than what I had before

klosterdev posted:

Literally made one of these with Ethernet cable and phone heads



whip a switch with it and post it to onlyfans

abigserve
Sep 13, 2009

this is a better avatar than what I had before
I like the idea of an application having all its services linked together dynamically over the network, and it solves a lot of problems, but whether it's easier to live with than a properly maintained LB/DNS configuration remains to be seen imo

abigserve
Sep 13, 2009

this is a better avatar than what I had before

ate poo poo on live tv posted:

all the services in a service mesh are unicast P2P right? if so why not just use ssl for each connection? i'm not seeing the advantages or even the difference in a service mesh compared to just a server that has ports opened and uses ssl to authenticate

e: i guess i see the advantage. it allows service-to-service communication to scale horizontally, and also provides snowflake service developers a structured way to integrate new datasources/services or allow others to access their snowflake services. but then you absolutely need a dedicated team to create and manage the service mesh, while also empowering them to enforce standards on access/queries to service mesh. basically you need your good developers to build that instead of building your revenue generating app.

A big part of it is trying to build more autonomy into the app (or at least the systems running the app) and the last frontier of that is effectively DNS, LB and network layer security

the number of people who actually need this is probably extremely small tbh, but then I'd argue the exact same thing about k8s

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Jimmy Carter posted:

OkCupid ran their entire site on 5 servers in 2012 how did we stray so far from the god's light

yeah sure it ran on 5 but could it scale to 1000???!?!?

abigserve
Sep 13, 2009

this is a better avatar than what I had before
Running one server is easy - running a thousand servers that all do something different is hard and that's why we are where we are

abigserve
Sep 13, 2009

this is a better avatar than what I had before
I think you won't find that many people arguing against cloud delivered k8s

On prem? Different story

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Nomnom Cookie posted:

when I said realtime earlier I meant we have a 250ms deadline. request comes in from merchant, we do as much fraud detection as we can in that window, return a decision. if we hit 230-240ms without reaching a conclusion then we return a random decision. that leaves very little room for timeouts and retries. obviously there are ways to decouple processing from the success or failure of a single http request, but they’re not applicable here. we get sent one request and it’s our one chance to do something reasonable. getting the client to retry isn’t possible because waiting for us is blocking them taking payment and showing a spinner in the customer’s browser

these are very harsh constraints! sometimes poo poo doesn’t work right and we fall back to random or even time out. poo poo happens and you just have to accept that. it’s just aggravating as gently caress to discover that k8s is built for workloads that no one really gives a poo poo about except at large scales. we do actually want to avoid chopping off those dozen requests if we can, because high-value merchants are watching that success rate very closely, and this flies directly in the face of everything k8s stands for

Seems like you need middleware (or better? middleware) between the client and the backend to buffer the request and determine if the backend is hosed

With that said, you still won't ever fully get around the issue, because "sometimes HTTP requests will fail and that is ok" is basically the definition of 2020 backend web development
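A minimal sketch of the deadline-plus-fallback shape I mean, stdlib only; the 250ms budget and the coin-flip fallback come straight from the post above, everything else (names, URL, safety margin) is made up:

[code]
import random
import socket
import urllib.error
import urllib.request

DEADLINE_S = 0.25        # the 250ms budget described above
SAFETY_MARGIN_S = 0.02   # bail slightly early so there's time left to respond

def decide(url: str) -> str:
    """Ask the scoring backend; if it can't answer inside the budget,
    degrade to a random decision instead of blowing the merchant's deadline."""
    try:
        with urllib.request.urlopen(url, timeout=DEADLINE_S - SAFETY_MARGIN_S) as resp:
            return resp.read().decode().strip()      # e.g. "approve" / "decline"
    except (urllib.error.URLError, socket.timeout, OSError):
        # backend hosed or too slow: coin flip, as described in the quote
        return random.choice(["approve", "decline"])

# decide("http://scoring.internal/check")   # hypothetical endpoint
[/code]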

abigserve
Sep 13, 2009

this is a better avatar than what I had before

Captain Foo posted:

just yeet your packets to nullroute, who cares

tired: static default route
wired: static default null route

abigserve
Sep 13, 2009

this is a better avatar than what I had before
routing packets through Linux hosts still kinda sucks poo poo and there's a reason vendors like Cumulus or whatever charge 2k+ per device for software

abigserve
Sep 13, 2009

this is a better avatar than what I had before

12 rats tied together posted:

except maybe the Cisco (R) Catalyst (Copyright 1995) Content Switching Module

I once had to replace a pair of these with F5s, and one of the business owners legitimately argued with me that I shouldn't, because the risk of breaking production was too great

the CSS at the time was end of support

abigserve
Sep 13, 2009

this is a better avatar than what I had before
They actually worked totally fine and the config was easy to understand. Far more straightforward than NetScalers; I hated managing those.

abigserve
Sep 13, 2009

this is a better avatar than what I had before

mod saas posted:

does f5 stand for “just keep refreshing until it finally points you to a server that’s actually alive” or just the two instances i have to rely on were configured by clowns?

Any load balancing solution requires the person operating it to have a beyond-cursory understanding of the apps they're load balancing, and that's an unreasonable ask for a network team that may have to look after several thousand virtual servers, so you get a lot of "tcp port alive" health checks and poo poo like that
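For illustration, the gap between the two checks, sketched with the stdlib (host, port and the /healthz path are made up): the "tcp port alive" check only proves something is listening, while an app-aware check at least asks the app whether it thinks it's healthy.

[code]
import socket
import urllib.error
import urllib.request

def tcp_check(host: str, port: int, timeout: float = 2.0) -> bool:
    """The lazy check: does anything complete a TCP handshake on the port?"""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def http_check(url: str, timeout: float = 2.0) -> bool:
    """The app-aware check: does the app itself report healthy?"""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except (urllib.error.URLError, OSError):
        return False

# a wedged app will happily pass the first and fail the second all day
# tcp_check("10.0.0.5", 443); http_check("http://10.0.0.5/healthz")
[/code]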

abigserve
Sep 13, 2009

this is a better avatar than what I had before
It seems like the 6500 era was when Cisco had all the good engineers, who then left to form competitors. I can't think of a single bad thing to say about that platform.

abigserve
Sep 13, 2009

this is a better avatar than what I had before
I think the Nexus 7K was the first platform that ruffled everyone's feathers. It was such a dramatic architecture change from the 6500 that it was effectively "we're going to deprecate this good, working platform, and replace it with something far inferior for no reason"

Like sure, on paper the platform was far more performant and much more closely aligned with the needs of the DC, but in practice most customers don't need twenty petabits of backplane throughput. What they DO need is L3VPNs that work, a solid BGP implementation, a working HA model, etc etc etc

And the thing is, they never really fixed it. That legacy is still around, albeit in the 9K form factor now.
