Internet Explorer
Jun 1, 2005





Had a client that was doing a datacenter move and IBM insisted they move the NetApp SAN that was bought through them or they wouldn't support it. They installed the SAN backwards in the rack at the data center.


TheCog
Jul 30, 2012

I AM ZEPA AND I CLAIM THESE LANDS BY RIGHT OF CONQUEST

siggy2021 posted:

Sometimes you guys start talking and I have no idea what the gently caress is going on and at this point I'm convinced it's all one big joke I don't understand.

Ranchers? Cattle? "STEERing" team? These are all just puns and not real things, right?

There's an idea in devops that servers should all have configuration managed by one central source, rather than each box having a bespoke configuration that some developer or team of developers has tweaked for years, and cannot be replicated. This is sometimes referred to as cattle vs pets, with the idea being your servers should be disposable cattle you can just use interchangeably and not pets which you feel some affection for and can't be replaced. This idea is especially important when you start working at scale, on the cloud or in highly virtualized environments.

Rancher is a tool that works on these principles to let you manage containerized infrastructure.

I have no idea what STEERing is.

The Fool
Oct 16, 2003


Steering committees are a meat space management concept.

It's basically a group of people getting together to make bad decisions about things they don't understand.

Judge Schnoopy
Nov 2, 2005

dont even TRY it, pal

The Fool posted:

Steering committees are a meat space management concept.

It's basically a group of people getting together to make bad decisions about things they don't understand.

Yes. 'We want containers. What do we need, how do we do it, who's in charge of what, and how do we handle hiccups along the way? Let's meet every week and yell at each other for an hour until the project is finished (read: in perpetuity).'

Treating servers like cattle is also great when you need a thing to run once a day at a defined time. Instead of having my server on all the time doing nothing, waiting to be compromised or corrupted, I have a file. The file defines how to set up the server, how to install my poo poo, how to run the job, and where to post the results. When that container has served its purpose it gets murdered and all stateful info is discarded. This allows a brand new container tomorrow from the same source file which ensures the job runs identically every time with no weird quirks accumulated over time.

CPColin
Sep 9, 2003

Big ol' smile.

The Fool posted:

Steering committees are a meat space management concept.

It's basically a group of people getting together to make bad decisions about things they don't understand.

And in the end, everybody's balls are cut off?

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Judge Schnoopy posted:

Yes. 'We want containers. What do we need, how do we do it, who's in charge of what, and how do we handle hiccups along the way? Let's meet every week and yell at each other for an hour until the project is finished (read: in perpetuity).'

Treating servers like cattle is also great when you need a thing to run once a day at a defined time. Instead of having my server on all the time doing nothing, waiting to be compromised or corrupted, I have a file. The file defines how to set up the server, how to install my poo poo, how to run the job, and where to post the results. When that container has served its purpose it gets murdered and all stateful info is discarded. This allows a brand new container tomorrow from the same source file which ensures the job runs identically every time with no weird quirks accumulated over time.

until someone changes the configuration on your orchestration system and the job mysteriously stops being scheduled for reasons that take you four days to figure out

Judge Schnoopy
Nov 2, 2005

dont even TRY it, pal

Vulture Culture posted:

until someone changes the configuration on your orchestration system and the job mysteriously stops being scheduled for reasons that take you four days to figure out

well, this is still IT, no getting around that.

but it's better IT.

CPColin
Sep 9, 2003

Big ol' smile.

Judge Schnoopy posted:

Treating servers like cattle is also great when you need a thing to run once a day at a defined time. Instead of having my server on all the time doing nothing, waiting to be compromised or corrupted, I have a file. The file defines how to set up the server, how to install my poo poo, how to run the job, and where to post the results.

This is a better strategy than the one my predecessor used, where the scheduled job lives on the desktop machine they remote desktop into. I don't even know where this machine physically is. I'm pretty sure nothing is monitoring its heartbeat, though!

Tetramin
Apr 1, 2006

I'ma buck you up.

Antioch posted:

We got hit by LockerGoga a couple weeks ago. Took us entirely by surprise, had most of our poo poo cryptoed to the nines before we even started getting alerts. Couple days of 12-14 hours, some tense restores and all is mostly well again. Plus a week of uninterrupted downtime was a great opportunity to upgrade almost everything, from server 2016 to the wiring in the switch closet to replacing an old firewall and cleaning up rules.

And get this. We were originally going to be "paid" in time in lieu, but an offhand comment about "also working for money" that was overheard by the CEO fell on sympathetic ears, and we all got actual honest to god cash bonuses, a full week's pay at 1.5x. Actual real money!

I'm going to make "I survived Crypto Crisis 2019" stickers and t-shirts.

We are currently on the 3rd straight day of recovering from ryuk. I’m on the network team so there’s luckily not a whole lot for me to do on this, but yeah day 1 was about 16 hours for me, day 2 was 12, we’ll see about today. Critical services are mostly restored so we’re getting there, but this has been wild. We paid out the rear end to a security contractor to help with this and they’ve been worth every penny to help plan and keep things on track. My manager put in like 36 hours straight from when it started lol

No talk of how we will be compensated for the extra time yet but I’m sure it’ll just be some extra time off

CloFan
Nov 6, 2004

GreenNight posted:

Yup. What bullshit you run into?

SPSS. The process is just super vague and cryptic, and there's no document explaining how it works. A lot of their support pages just 404 since they've changed processes a few times, or sometimes they give you information that is not valid and send you on a wild goose chase. Once you finally have the license code applied to the server, you got to change environment variables to set up commuter licenses, logging, that sort of thing.

This particular case, I couldn't find out why one module wasn't included in the license even though the base product and other extra modules were present. Turns out this authorization code has to be generated from an entirely different portion of the website! I ended up making a Sev 1 ticket and getting them to list step by step instructions because gently caress poking around on a garbage website for hours trying to divine out why Regression is so special and coveted that it can't be included with the main package.

The server side of this, Sentinel RMS sucks too but at least there's documentation out there for it.

Antioch posted:

I'm going to make "I survived Crypto Crisis 2019" stickers and t-shirts.

:ohdear: don't jinx yourself!

siggy2021
Mar 8, 2010

TheCog posted:

There's an idea in devops that servers should all have configuration managed by one central source, rather than each box having a bespoke configuration that some developer or team of developers has tweaked for years, and cannot be replicated. This is sometimes referred to as cattle vs pets, with the idea being your servers should be disposable cattle you can just use interchangeably and not pets which you feel some affection for and can't be replaced. This idea is especially important when you start working at scale, on the cloud or in highly virtualized environments.

Rancher is a tool that works on these principles to let you manage containerized infrastructure.

I have no idea what STEERing is.

That is a good explanation and I understand more now. As someone that doesn't touch development stuff, and has never had a good reason to look into containers and whatnot, it all just sounds hilarious when you are throwing around terms like rancher and cattle.

Really I just wanted to make a bad Steer pun.

Methanar
Sep 26, 2013

by the sex ghost

Judge Schnoopy posted:

Yes. 'We want containers. What do we need, how do we do it, who's in charge of what, and how do we handle hiccups along the way? Let's meet every week and yell at each other for an hour until the project is finished (read: in perpetuity).'

Treating servers like cattle is also great when you need a thing to run once a day at a defined time. Instead of having my server on all the time doing nothing, waiting to be compromised or corrupted, I have a file. The file defines how to set up the server, how to install my poo poo, how to run the job, and where to post the results. When that container has served its purpose it gets murdered and all stateful info is discarded. This allows a brand new container tomorrow from the same source file which ensures the job runs identically every time with no weird quirks accumulated over time.

What if my poo poo isn't stateless and also isn't http and also needs to receive traffic from the internet

Judge Schnoopy
Nov 2, 2005

dont even TRY it, pal

Methanar posted:

What if my poo poo isn't stateless and also isn't http and also needs to receive traffic from the internet

"is also great when you need a thing to run once a day at a defined time"

clearly your use case isn't what I was talking about in that scenario. BUT! If your poo poo isn't stateless, it should publish state to a database so the server itself doesn't have to track anything. And your cluster of servers should sit behind a load balancer so any of them can die and be reborn without any interruption to receiving traffic from the internet.

At that point you'd want them on a rolling cycle of being killed while maintaining enough nodes in the cluster to handle the workload, and no container should be over a day or two old.
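
A minimal Python sketch of that rolling cycle, for anyone who wants to see the logic spelled out. The `Node` record and the `pick_nodes_to_recycle` helper are made up for illustration and do not come from any real orchestrator, which would normally handle this for you. Each pass kills only as many over-age nodes as the cluster can spare.

```python
from dataclasses import dataclass


@dataclass
class Node:
    # Hypothetical record for one container/server in the cluster.
    name: str
    age_hours: float
    healthy: bool = True


def pick_nodes_to_recycle(nodes, max_age_hours=48.0, min_healthy=3):
    """Return the over-age nodes that can be killed right now.

    Oldest nodes go first, and the number returned never takes the
    count of healthy nodes below min_healthy.
    """
    healthy = [n for n in nodes if n.healthy]
    # How many healthy nodes we can afford to lose in this pass.
    budget = max(0, len(healthy) - min_healthy)
    over_age = sorted(
        (n for n in healthy if n.age_hours > max_age_hours),
        key=lambda n: n.age_hours,
        reverse=True,
    )
    return over_age[:budget]


if __name__ == "__main__":
    cluster = [Node("a", 50), Node("b", 60), Node("c", 10), Node("d", 5)]
    for node in pick_nodes_to_recycle(cluster):
        print("recycle", node.name)
```

Run it on a schedule, replace whatever it returns, and the remaining over-age nodes get picked up on the next pass once their replacements are healthy.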

Methanar
Sep 26, 2013

by the sex ghost

Judge Schnoopy posted:

"is also great when you need a thing to run once a day at a defined time"

clearly your use case isn't what I was talking about in that scenario. BUT! If your poo poo isn't stateless, it should publish state to a database so the server itself doesn't have to track anything. And your cluster of servers should sit behind a load balancer so any of them can die and be reborn without any interruption to receiving traffic from the internet.

At that point you'd want them on a rolling cycle of being killed while maintaining enough nodes in the cluster to handle the workload, and no container should be over a day or two old.

How do I use a load balancer on-prem with non-http UDP traffic and also it matters which backend the internet originating traffic goes to a lot and also any of the several hundred sessions assigned to a backend are extremely long-lived and may take 6+ hours to naturally terminate.

Methanar fucked around with this message at 20:14 on Apr 4, 2019

Judge Schnoopy
Nov 2, 2005

dont even TRY it, pal

Methanar posted:

How do I use a load balancer on-prem with non-http UDP traffic and also it matters which backend the internet originating traffic goes to a lot.

UDP that requires hitting a unique endpoint as if it was stateful even though it's not stateful?

That's a huge design flaw in whatever you're running and yeah, it's going to greatly limit what you can do with your infrastructure. That doesn't make containers bad, though.

stuxracer
May 4, 2006

I think most people just get defensive about containers the way they do about every other buzzword published in CTO magazine or whatever, to the point that hearing it sets them off. Like I have no issue with “cloud” but every time someone asks why we aren’t doing it I twitch a little.

12 rats tied together
Sep 7, 2006

Methanar posted:

How do I use a load balancer on-prem with non-http UDP traffic and also it matters which backend the internet originating traffic goes to a lot and also any of the several hundred sessions assigned to a backend are extremely long-lived and may take 6+ hours to naturally terminate.

what is a udp "session"?

Methanar
Sep 26, 2013

by the sex ghost

12 rats tied together posted:

what is a udp "session"?

webrtc

abigserve
Sep 13, 2009

this is a better avatar than what I had before

The idea of a single configuration entity is bad imo because as mentioned it becomes monolithic.

The ideal model is one in which each application has the ability to bootstrap itself onto the infrastructure based on the same codebase as the application itself.

What I have moved to at work is small deploy scripts that are kicked off via CD, but the Docker model fits this as well.

Methanar posted:

How do I use a load balancer on-prem with non-http UDP traffic and also it matters which backend the internet originating traffic goes to a lot and also any of the several hundred sessions assigned to a backend are extremely long-lived and may take 6+ hours to naturally terminate.

You want GSLB.

22 Eargesplitten
Oct 10, 2010



Home network question: Is there a better way to set up wake-on-lan than activating whatever setting it is on my computer and then memorizing the MAC address, then running start-computer from my other computer? I'm doing some resource-intensive stuff on my desktop but would like to be able to work on it from my laptop out of the room. Works fine through Chrome Remote Desktop until I stop for a while, the computer falls asleep, and I have to get up and go into the other room to wake it back up.

I'm open to using traditional RDP, it's just that Chrome is super convenient.

Thanks Ants
May 21, 2004

#essereFerrari


Get a cordless mouse and move it into the room you're in, wiggle it when you need to wake the PC.

Digital_Jesus
Feb 10, 2011

Don't turn your PC off or let it sleep?

nullfunction
Jan 24, 2005

Nap Ghost

12 rats tied together posted:

what is a udp "session"?

A miserable little pile of packets

E: missed this but

abigserve posted:

You want GSLB.

No no no no no no. Nooooo

nullfunction fucked around with this message at 23:11 on Apr 4, 2019

abigserve
Sep 13, 2009

this is a better avatar than what I had before

nullfunction posted:

A miserable little pile of packets

E: missed this but


No no no no no no. Nooooo

? Expand?

GSLB is extremely widely deployed and it works very well as long as you aren't looking for highly granular load balancing, or if your clients are extremely geo-diverse (I think this would be Methanar's use case)

Judge Schnoopy
Nov 2, 2005

dont even TRY it, pal

abigserve posted:

The idea of a single configuration entity is bad imo because as mentioned it becomes monolithic.

The ideal model is one in which each application has the ability to bootstrap itself onto the infrastructure based on the same codebase as the application itself.

What I have moved to at work is small deploy scripts that are kicked off via CD, but the Docker model fits this as well.

Docker has this concept of images building on some image that's already there. So if I have a web app that will run a specific config, I start my Dockerfile saying "using lamp.stack.image" (the FROM line) and then tell it how to download and install my web app configs.

Somebody else can also say "using lamp.stack.image" and apply their own config, or they can say "using new.app.config" and bootstrap from where I left off.

This provides a ton of flexible forking of images to not only get exactly what you want, but to borrow as much code as possible. It definitely keeps you from getting monolithic images that everybody uses as the 'gold standard', containing poo poo-tons of bloat your project won't ever touch.

Docjowles
Apr 9, 2009

nullfunction posted:

A miserable little pile of packets

:golfclap:

perhaps the same could be said of ALL protocols

nullfunction
Jan 24, 2005

Nap Ghost

abigserve posted:

? Expand?

GSLB is extremely widely deployed and it works very well as long as you aren't looking for highly granular load balancing, or if your clients are extremely geo-diverse (I think this would be Methanar's use case)

There's something a little more subtle that we ran into that caused a lot of problems. Keep in mind that GSLB is really just DNS at its core.

Your users' ISP probably handles resolving their DNS requests, but if their ISP has a geodistributed DNS infrastructure, or relies on a third party to resolve their requests who has DNS servers in many locations, your DNS requests can appear to be coming from all over the place. GSLB sees this and responds with a different IP address -- but this often happens in the middle of your session! Did you know that background DNS requests fire off while you're connected to a website, every few seconds? They nuked chrome://net-internals in more recent versions, but it was extremely plain to see a few versions ago. Sometimes DNS traffic will just start coming out of a server across the country, and if your application can't accommodate that, you're going to have a bad time.

If you have long-running sessions (6 hours definitely counts) your load balancers have to maintain that state in memory. That's a lot of hardware if you have a high-volume service. If your application is completely stateless (unlikely, state is pretty much unavoidable for nontrivial use cases) or you've found a way to share state across datacenters (can be done, but is often incredibly expensive and/or slow) then GSLB might be a solution for you.

I spent a good chunk of 2017 and 2018 making our application work with GSLB without randomly dropping sessions due to cross-DC chatter, and it's still not perfect. It probably won't ever be, but we got it down to about a 0.04% failure rate. "Granular" in our case was US-East vs. US-West. Our application is performance-sensitive and sharing state between DCs isn't possible within our performance constraints at the scale we operate at. We even tried implementing edns-client-subnet extensions without much improvement.

I know a little bit about Methanar's use case from this thread and PMs and I can say pretty confidently that GSLB is not the right tool for that job. GSLB works for some scenarios, but it's far from a silver bullet.
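
The resolver-location part of this can be shown with a toy Python model. Everything in it is an assumption made for illustration: the region names, the documentation-range addresses, and the `gslb_resolve` and `session_survives` helpers are invented, and no real GSLB product works off a two-entry dictionary. It only shows that when the answer depends on where the DNS query appears to come from, a session pinned to the first answer breaks as soon as a later lookup arrives through a resolver in another region.

```python
# Hypothetical mapping from the region a DNS query appears to come from
# to the datacenter address the GSLB hands back. The addresses are from
# the documentation ranges and stand in for real VIPs.
DC_BY_REGION = {
    "us-east": "198.51.100.10",
    "us-west": "203.0.113.10",
}


def gslb_resolve(resolver_region):
    """Answer based on the resolver's apparent region, not the client's."""
    return DC_BY_REGION[resolver_region]


def session_survives(resolver_regions):
    """True if every lookup during the session returns the first answer.

    resolver_regions lists, in order, the region each DNS query for the
    session appeared to come from.
    """
    answers = [gslb_resolve(region) for region in resolver_regions]
    return all(answer == answers[0] for answer in answers)


if __name__ == "__main__":
    print(session_survives(["us-east", "us-east", "us-east"]))
    print(session_survives(["us-east", "us-west", "us-east"]))
```

Whether a client actually re-resolves mid-session depends on the browser, the TTLs, and the resolver in between, so treat this as a picture of the failure mode rather than a claim about how often it happens.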

PBS
Sep 21, 2015

Methanar posted:

lol how do you feel about getting internet traffic into the cluster

I don't think we'll ever be getting direct traffic, there will always be some perimeter. Ultimately customer facing products will be running in the environments we build.

abigserve
Sep 13, 2009

this is a better avatar than what I had before

nullfunction posted:

GSLB sees this and responds with a different IP address -- but this often happens in the middle of your session! Did you know that background DNS requests fire off while you're connected to a website, every few seconds? They nuked chrome://net-internals in more recent versions, but it was extremely plain to see a few versions ago. Sometimes DNS traffic will just start coming out of a server across the country, and if your application can't accommodate that, you're going to have a bad time


I don't think any of that is right and I couldn't replicate it in wireshark...

I'm not saying GSLB is a silver bullet but for long running UDP applications like webrtc that presumably front-load all of the DNS-lookups I think that's a pretty good fit?

Dr. Arbitrary
Mar 15, 2006

Bleak Gremlin

Judge Schnoopy posted:

At that point you'd want them on a rolling cycle of being killed while maintaining enough nodes in the cluster to handle the workload, and no container should be over a day or two old.

Uptime Epic Fail Compilation

12 rats tied together
Sep 7, 2006

I was going to say do DSR with a really intelligent hashing algorithm that you will probably have to implement yourself as an nginx module. Several hundred sessions per backend doesn't sound high enough to make DSR a requirement, though!

what, specifically, is breaking? how is this any different from load balancing a bunch of TCP connections?

evobatman
Jul 30, 2006

it means nothing, but says everything!
Pillbug

Methanar posted:

How do I use a load balancer on-prem with non-http UDP traffic and also it matters which backend the internet originating traffic goes to a lot and also any of the several hundred sessions assigned to a backend are extremely long-lived and may take 6+ hours to naturally terminate.

Very carefully.

mllaneza
Apr 28, 2007

Veteran, Bermuda Triangle Expeditionary Force, 1993-1952




CloFan posted:

:ohdear: don't jinx yourself!

Make them "I survived Crypto Crisis 2019 1.0" and they're evergreen.

22 Eargesplitten posted:

Home network question: Is there a better way to set up wake-on-lan than activating whatever setting it is on my computer and then memorizing the MAC address, then running start-computer from my other computer?

Make the computer do the memorization and save it as a .ps1 file. Just run that whenever you need to wake up the remote system.

Our PortsDB is full of bad data so we can't look up a wall jack by IP address, but by god we can turn a hostname into a MAC address! I have someone else's script to look up MACs from hostname, and I wrote a script to take that output and send WoL packets to a list of hostnames. Handy for late night deployments.
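
For reference, a rough sketch of what such a script does under the hood, written in Python rather than as a .ps1. The magic packet format is standard (six 0xFF bytes followed by the target MAC repeated sixteen times); the function names, the broadcast address, and UDP port 9 are assumptions chosen for the example, and the MAC in the usage line is a placeholder.

```python
import socket


def build_magic_packet(mac):
    """Build a Wake-on-LAN magic packet for a MAC like '00:11:22:33:44:55'."""
    hex_digits = mac.replace(":", "").replace("-", "")
    if len(hex_digits) != 12:
        raise ValueError("expected a 6-byte MAC address, got %r" % mac)
    mac_bytes = bytes.fromhex(hex_digits)  # raises ValueError on non-hex input
    # Six 0xFF bytes, then the MAC repeated sixteen times (102 bytes total).
    return b"\xff" * 6 + mac_bytes * 16


def send_wol(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet on the local network over UDP."""
    packet = build_magic_packet(mac)
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        sock.sendto(packet, (broadcast, port))


if __name__ == "__main__":
    send_wol("00:11:22:33:44:55")  # placeholder MAC, swap in the real one
```

Save the MAC in the script once and nobody has to memorize it. The target machine still needs Wake-on-LAN enabled in its firmware and NIC settings.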

nullfunction
Jan 24, 2005

Nap Ghost

abigserve posted:

I don't think any of that is right and I couldn't replicate it in wireshark...

I'm not saying GSLB is a silver bullet but for long running UDP applications like webrtc that presumably front-load all of the DNS-lookups I think that's a pretty good fit?

I worked on this from about the Chrome 58 to 69 (nice) timeframe, and maybe the behavior has changed, as I can't replicate it now either. :shrug: It could have been tied to TTLs, ours have to be quite low in order to support fast failover to meet our SLAs.

I ended up writing scripts to scrape tens of millions of rows of application audit logs looking for the crosstalk, pulling user-agent strings, tying the IP addresses to an ISP, and then geolocating them in the course of tracking this down. It was the worst on Chromium-based browsers and we also found a strong correlation to satellite internet connections, and for some larger customers that ran their own DNS infrastructure we caught several misconfigurations on their end in the course of investigating all of this. It was A Thing for a long time and I'll admit I'm a little bit apprehensive when GSLB-anything comes up because of it. I know "it's DNS" is kind of the poo poo answer people sometimes give when they have nothing else, but after a deep dive that's... what we found. It was DNS.

Your success with GSLB is also likely tied to your vendor's implementation, ours being not great. We ended up switching to an external DNS provider to handle the traffic direction which allowed us to get our failure rate to where it is today (orders of magnitude better than the first time around).

If your application can tolerate it, GSLB might work for you. There are a lot of applications that can't, and you will really need to know your stack inside and out (especially how state is handled) to evaluate it properly.

hihifellow
Jun 17, 2005

seriously where the fuck did this genre come from

nullfunction posted:

There's something a little more subtle that we ran into that caused a lot of problems. Keep in mind that GSLB is really just DNS at its core.

Your users' ISP probably handles resolving their DNS requests, but if their ISP has a geodistributed DNS infrastructure, or relies on a third party to resolve their requests who has DNS servers in many locations, your DNS requests can appear to be coming from all over the place. GSLB sees this and responds with a different IP address -- but this often happens in the middle of your session! Did you know that background DNS requests fire off while you're connected to a website, every few seconds? They nuked chrome://net-internals in more recent versions, but it was extremely plain to see a few versions ago. Sometimes DNS traffic will just start coming out of a server across the country, and if your application can't accommodate that, you're going to have a bad time.

If you have long-running sessions (6 hours definitely counts) your load balancers have to maintain that state in memory. That's a lot of hardware if you have a high-volume service. If your application is completely stateless (unlikely, state is pretty much unavoidable for nontrivial use cases) or you've found a way to share state across datacenters (can be done, but is often incredibly expensive and/or slow) then GSLB might be a solution for you.

I spent a good chunk of 2017 and 2018 making our application work with GSLB without randomly dropping sessions due to cross-DC chatter, and it's still not perfect. It probably won't ever be, but we got it down to about a 0.04% failure rate. "Granular" in our case was US-East vs. US-West. Our application is performance-sensitive and sharing state between DCs isn't possible within our performance constraints at the scale we operate at. We even tried implementing edns-client-subnet extensions without much improvement.

I know a little bit about Methanar's use case from this thread and PMs and I can say pretty confidently that GSLB is not the right tool for that job. GSLB works for some scenarios, but it's far from a silver bullet.

I work on appliances that do GSLB. There's a reason I (and many other people I know who work on the same appliance) don't use response time based metrics.

Digital_Jesus
Feb 10, 2011


Our core switch was up for 4 years, 17 Weeks before I rebooted it after getting hired.

nullfunction
Jan 24, 2005

Nap Ghost

hihifellow posted:

I work on appliances that do GSLB. There's a reason I (and many other people I know who work on the same appliance) don't use response time based metrics.

Out of curiosity, is there a better solution? Here's our scenario:

We have a pair of datacenters and want to ensure that we balance the load roughly equally between the two, but once a customer establishes a connection to one, they need to stay there for the life of their session or until their environment in that datacenter becomes unhealthy, whichever comes first. Our environments are sized so that we can take a full outage of one without overloading the other as we'd like to get to the point where we can fully shut down one DC for maintenance with minimal customer impact, but Active/Active is a selling point. Right now we're using geolocation to direct traffic but it presents problems when a customer's user base is right on the arbitrary lines we've drawn, so our guys are having to create a bunch of custom rules in our geolocation provider to deal with those. The user base is a mixture of desktop and mobile and the mobile users tend to move around a lot.

It's not possible to share running application state between DCs due to the way our backend works, and when a failure is detected we need people to fail over as quickly as possible, 5 minutes is about the longest we can wait -- our SLAs don't leave a ton of wiggle room.

I didn't design our current solution, my background is in networking but I'm not the one touching the hardware directly. I know we're moving to F5 LBs in the coming months, if that makes a difference.

Feel free to PM if you'd rather. Someday I hope to be able to post more of the gory details as some of it is really interesting, but I like my job too much to do that now.

Sepist
Dec 26, 2005

FUCK BITCHES, ROUTE PACKETS

Gravy Boat 2k
Are source/destination hashes for load balancing an option instead of round robin? Or isn't there something called sticky sessions? My load balancing knowledge is a little ancient.
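
In case it helps anyone following along, source hashing boils down to something like the Python sketch below. The `pick_backend` name and the backend labels are invented for the example, and real load balancers use their own hash functions, so this is only the general idea: the same source address always maps to the same backend as long as the backend list doesn't change.

```python
import hashlib


def pick_backend(source_ip, backends):
    """Map a client source IP onto one backend, deterministically."""
    if not backends:
        raise ValueError("no backends configured")
    digest = hashlib.sha256(source_ip.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an integer and wrap it
    # around the number of backends.
    index = int.from_bytes(digest[:8], "big") % len(backends)
    return backends[index]


if __name__ == "__main__":
    pool = ["dc-east", "dc-west"]  # hypothetical backend names
    print(pick_backend("192.0.2.15", pool))
    print(pick_backend("192.0.2.15", pool))  # same answer as the line above
```

The catch is that the stickiness is tied to the address itself: a client whose source IP changes gets hashed again and may land on a different backend.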

nullfunction
Jan 24, 2005

Nap Ghost

The first thing we tried was sectioning off the IP space and directing traffic based on source, but we found that it disproportionately affected mobile as people would go on and off WiFi and the change in address would cause them to hit the other datacenter, which kicked them because their session wasn't valid in that DC. It also led to a fairly lopsided load pattern, which was undesirable.

Using a cookie to tell which DC they started in was one of my suggestions, but I was told there wasn't a good way to reroute their traffic along our edge to the correct DC if they came in through the wrong one. Probably a vendor-specific limitation, but I don't know the details there.


hihifellow
Jun 17, 2005

seriously where the fuck did this genre come from
I haven't worked with F5s before, but they should be able to load balance by session count with persistence linked to something like source IP. I know they compete with NetScalers, which is what I work on, and I know those can do exactly that.
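
Roughly what "session count with source IP persistence" means, as a Python sketch. This is not F5 or NetScaler configuration; the `StickyLeastSessions` class and the backend names are hypothetical and only model the behavior: a new source IP goes to the healthy backend with the fewest sessions, and then stays there until that backend is marked unhealthy.

```python
class StickyLeastSessions:
    """Toy balancer: least sessions for new clients, sticky by source IP."""

    def __init__(self, backends):
        self.sessions = {b: 0 for b in backends}  # backend -> pinned clients
        self.healthy = set(backends)
        self.persist = {}  # source_ip -> backend

    def mark_unhealthy(self, backend):
        self.healthy.discard(backend)

    def route(self, source_ip):
        pinned = self.persist.get(source_ip)
        if pinned in self.healthy:
            return pinned  # persistence wins while the backend is healthy
        if not self.healthy:
            raise RuntimeError("no healthy backends")
        if pinned is not None:
            self.sessions[pinned] -= 1  # client is leaving a dead backend
        # Sort first so ties are broken by name, which keeps runs repeatable.
        choice = min(sorted(self.healthy), key=lambda b: self.sessions[b])
        self.persist[source_ip] = choice
        self.sessions[choice] += 1
        return choice


if __name__ == "__main__":
    lb = StickyLeastSessions(["dc-east", "dc-west"])
    print(lb.route("192.0.2.1"))
    print(lb.route("192.0.2.2"))
    lb.mark_unhealthy("dc-east")
    print(lb.route("192.0.2.1"))
```

Persistence keyed on source IP has the same weak spot as any address-based scheme: a client that changes address looks like a brand new client.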
