How many quarters after Q1 2016 till Marissa Mayer is unemployed?
1 or fewer
2
4
Her job is guaranteed; what are you even talking about?
Motronic
Nov 6, 2009

Arsenic Lupin posted:

Tell me what a BGP is?

So to follow on with "DNS is how your computer/phone turns facebook.com into an IP address, which is the thing your computer needs to contact them", BGP is the thing that a router(s) at facebook uses to go "hay! Your other router that I'm connected to! I know how to get to <these IP addresses>!" and then that router tells other routers, etc so all the routers on the internet can figure out how to get there.

So basically one level lower on the "how do you find thing on the internet" than DNS.

RFC2324
Jun 7, 2012

http 418

Motronic posted:

So to follow on with "DNS is how your computer/phone turns facebook.com into an IP address, which is the thing your computer needs to contact them", BGP is the thing that a router(s) at facebook uses to go "hay! Your other router that I'm connected to! I know how to get to <these IP addresses>!" and then that router tells other routers, etc so all the routers on the internet can figure out how to get there.

So basically one level lower on the "how do you find thing on the internet" than DNS.

maybe facebook is pre-emptively trying to fix the full bgp table issue

His Divine Shadow
Aug 7, 2000

I'm not a fascist. I'm a priest. Fascists dress up in black and tell people what to do.

Doggles posted:

Maybe :tinfoil: but I wonder if someone on the inside at Facebook took matters into their own hands after last night's 60 Minutes interview.

https://twitter.com/AP/status/1444829939090567168

Makes me feel pretty drat good if some guy with morals can do that much damage to these bags of poo poo.

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

Motronic posted:

So to follow on with "DNS is how your computer/phone turns facebook.com into an IP address, which is the thing your computer needs to contact them", BGP is the thing that a router(s) at facebook uses to go "hay! Your other router that I'm connected to! I know how to get to <these IP addresses>!" and then that router tells other routers, etc so all the routers on the internet can figure out how to get there.

So basically one level lower on the "how do you find thing on the internet" than DNS.

This. DNS is an address book: it tells your router/computer where a domain lives and what IP is associated with it. But DNS only has the entries; it needs routes to get there (like a map). BGP fills in that hole by propagating routes to things, so that once the DNS associates a domain with an IP, your router can find its way there through shared routes that BGP provides as a map to get back to them.

It's a big game of telephone for instructions on how to get from Point A (your computer or network) to Point B (the server's computer or network) and what turns to make along the way.
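
A toy sketch of the address book vs. map split described above, in Python. Everything in it is made up for illustration: the router names, the topology, and the 203.0.113.x addresses (a documentation range, not Facebook's real one). Real BGP carries AS paths and policy; this only shows the "tell your neighbor, who tells their neighbor" idea and what happens when the routes disappear while the DNS entry is still there.

```python
import ipaddress
from collections import deque

# Toy "address book": name -> IP (documentation address, not a real one).
DNS = {"facebook.com": "203.0.113.10"}

# Toy topology: which routers are directly connected (names are hypothetical).
LINKS = {
    "home": ["isp"],
    "isp": ["home", "transit"],
    "transit": ["isp", "fb-edge"],
    "fb-edge": ["transit"],
}


def announce(origin, prefix):
    """Spread 'I can reach prefix' outward from origin, neighbor by neighbor.

    Returns {router: {prefix: [routers on the way to origin]}}.
    """
    paths = {origin: [origin]}
    queue = deque([origin])
    while queue:
        router = queue.popleft()
        for neighbor in LINKS[router]:
            if neighbor not in paths:  # keep the first path this router hears
                paths[neighbor] = [neighbor] + paths[router]
                queue.append(neighbor)
    return {router: {prefix: path} for router, path in paths.items()}


def reach(name, start, tables):
    """Look the name up (DNS), then look for a route to the answer (BGP)."""
    ip = DNS.get(name)
    if ip is None:
        return "DNS failure: no address for " + name
    for prefix, path in tables.get(start, {}).items():
        if ipaddress.ip_address(ip) in ipaddress.ip_network(prefix):
            return " -> ".join(path)
    return "no route to " + ip


tables = announce("fb-edge", "203.0.113.0/24")
print(reach("facebook.com", "home", tables))  # home -> isp -> transit -> fb-edge
print(reach("facebook.com", "home", {}))      # no route to 203.0.113.10
```

The second call is the "routes withdrawn" case: the address book entry still exists, but nobody knows how to get there.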

Inspector Gesicht
Oct 26, 2012

500 Zeus a body.


How long will this take to fix? Not that I want it to, but I have to be realistic.

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

Inspector Gesicht posted:

How long will this take to fix? Not that I want it to, but I have to be realistic.

Depends on what happened? If it was just automated changes, you gotta roll those back and redeploy the correct stuff. That assumes you HAVE everything that got changed documented.

May take an hour, could take days. And we still don't know WHAT happened, which is critical to knowing how they will address it.
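
To illustrate why having every change documented matters for a rollback, here is a minimal Python sketch. The `ChangeJournal` class and the config keys are hypothetical, not anything Facebook is known to use; the point is only that a rollback is mechanical when each change records the value it replaced.

```python
_MISSING = object()  # marks "this key did not exist before the change"


class ChangeJournal:
    """Apply config changes while remembering how to undo each one."""

    def __init__(self, config):
        self.config = dict(config)
        self._undo = []  # stack of (key, previous value)

    def apply(self, key, value):
        # Record the old value first, so the change is documented before it lands.
        self._undo.append((key, self.config.get(key, _MISSING)))
        self.config[key] = value

    def rollback(self):
        """Undo every recorded change, newest first."""
        while self._undo:
            key, previous = self._undo.pop()
            if previous is _MISSING:
                self.config.pop(key, None)
            else:
                self.config[key] = previous


journal = ChangeJournal({"bgp.announce": True})
journal.apply("bgp.announce", False)   # the bad automated change
journal.apply("maintenance.mode", 1)   # a key that did not exist before
journal.rollback()
print(journal.config)  # {'bgp.announce': True}
```

Without the undo stack, you are reconstructing the old state from memory, which is where the "could take days" part comes from.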

Motronic
Nov 6, 2009

Inspector Gesicht posted:

How long will this take to fix? Not that I want it to, but I have to be realistic.

All we can observe right now are symptoms, not causes. I've found zero information on what actually happened. So there's no saying.

This could be config automation gone wrong and they've got a whole mess to sort out to fix it, it could be something broke and "whoops, that wasn't actually redundant/we never tested this and the redundancy blew up when it switched over" or.......the reason I'm hoping for.

Detective No. 27
Jun 7, 2006

Honestly don't see how this qualifies for this thread as it's a tech Christmas miracle come early.

Walh Hara
May 11, 2012
This was posted in another thread:

jaete posted:

Someone somewhere on irc was saying that apparently Facebook managed to lock their sysadmins out of their networks with this, so yeah like keys in the car as was said.

Not sure how accurate all this is but apparently none of their sysops people are physically near a data centre, while the data centre people don't have the DNS knowledge etc required to fix, so it's taking a while

e: dunno about outsourced support but doesn't necessarily need to be outsourced to achieve this, just incompetence is enough :v:

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

Detective No. 27 posted:

Honestly don't see how this qualifies for this thread as it's a tech Christmas miracle come early.

Our tech dream is someone else's tech nightmare. And Facebook sucks and all, but there are a lot of people that depend on FB Messenger for being able to reach people. As gleeful as this makes us, it's someone's nightmare.

Walh Hara posted:

This was posted in another thread:

Yeah, take it with a big grain of salt, but it's not the first time I've had stuff like this happen to clients. FB isn't that special in that regard if it's true.

Dr. Arbitrary
Mar 15, 2006

Bleak Gremlin

Inspector Gesicht posted:

How long will this take to fix? Not that I want it to, but I have to be realistic.

A lot of it is rumor mill stuff, but apparently the fix requires physical access, and the people who have physical access and the people who know how to fix it are not the same people.

Motronic
Nov 6, 2009

Walh Hara posted:

This was posted in another thread:

I'm not saying that's impossible, but it's implausible.

Not to get too far into the weeds, but there are two problems with this: 1.) Almost everyone has a "back door" into their equipment for just this kind of issue. We typically refer to that as an "out of band network", and it could be via a separate internet connection that has nothing to do with the production networks, and/or an old school dialup modem. 2.) You can talk remote hands at a data center through attaching the console port of a router to a laptop and then remote into it. This is not hard.

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

Motronic posted:

I'm not saying that's impossible, but it's implausible.

Not to get too far into the weeds, but there are two problems with this: 1.) Almost everyone has a "back door" into their equipment for just this kind of issue. We typically refer to that as an "out of band network", and it could be via a separate internet connection that has nothing to do with the production networks, and/or an old school dialup modem. 2.) You can talk remote hands at a data center through attaching the console port of a router to a laptop and then remote into it. This is not hard.

Yes, but it's also common for OOB to be poorly maintained, badly architected, and treated as an afterthought. You'd figure FB would have a really good OOB management network, but here we are today.

We'll see for sure. All conjecture of course.

Mr. Fall Down Terror
Jan 24, 2018

by Fluffdaddy
gonna be some serious rethinking about remote work after this embarrassment lmao

Motronic
Nov 6, 2009

Mr. Fall Down Terror posted:

gonna be some serious rethinking about remote work after this embarrassment lmao

Their network engineers almost 100% definitely did not work in the actual data centers with this equipment even in the before times.

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

Mr. Fall Down Terror posted:

gonna be some serious rethinking about remote work after this embarrassment lmao

I don't think so. It's a tech issue, not a worker availability issue. Nobody being in the office would've made this better because it likely took out their office environment as well.

And yeah, what Motronic said: most of the network engineers I know have never set foot in, or even had access to, the datacenters their networking gear lives in.

Arsenic Lupin
Apr 12, 2012

This particularly rapid💨 unintelligible 😖patter💁 isn't generally heard🧏‍♂️, and if it is🤔, it doesn't matter💁.


CommieGIR posted:

BGP fills in that hole by propagating routes to things, so that once the DNS associates a domain with an IP, your router can find its way there through shared routes that BGP provides as a map to get back to them.

Reminds me a bit of USENET bang-style addresses, which were of course done by hand.

This graph is oddly soothing. https://stat.ripe.net/special/bgpla..._fetch.type=bgp

e: The action starts at 15:36.

Arsenic Lupin fucked around with this message at 19:59 on Oct 4, 2021

Arsenic Lupin
Apr 12, 2012

This particularly rapid💨 unintelligible 😖patter💁 isn't generally heard🧏‍♂️, and if it is🤔, it doesn't matter💁.


Motronic posted:

Their network engineers almost 100% definitely did not work in the actual data centers with this equipment even in the before times.

Are there cafeterias in the data centers? I rest my case.

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

Arsenic Lupin posted:

Are there cafeterias in the data centers? I rest my case.

Ours has a lunchroom and a couple office spaces, that's about it.

withak
Jan 15, 2003


Fun Shoe
I thought Facebook kept their server in the closet down the hall from the engineers.

Arsenic Lupin
Apr 12, 2012

This particularly rapid💨 unintelligible 😖patter💁 isn't generally heard🧏‍♂️, and if it is🤔, it doesn't matter💁.


Motronic posted:

Their network engineers almost 100% definitely did not work in the actual data centers with this equipment even in the before times.
Years ago I got to visit a data center at the Goog; it was a very high-security process to get in. Shortly after they shut down that data center, and the program. I suspect that to visit Facebook's, you either have to be a relevant hardware engineer, a data center employee, or a senior executive.

RFC2324
Jun 7, 2012

http 418

CommieGIR posted:

And yeah, what Motronic said: most of the network engineers I know have never set foot in, or even had access to, the datacenters their networking gear lives in.

I was always taught that allowing someone with physical access to the hardware to also have logical access is a security hole. A sysadmin/engineer has the passwords to decrypt the drive that a dc tech has the physical access to walk out with.

At my first job I had a dct approach me to decrypt a drive he had marked decommed but hadn't actually wiped and destroyed, but fortunately his lack of logical access meant it was also not a data drive, just an OS drive.

Motronic
Nov 6, 2009

Arsenic Lupin posted:

Years ago I got to visit a data center at the Goog; it was a very high-security process to get in. Shortly after they shut down that data center, and the program. I suspect that to visit Facebook's, you either have to be a relevant hardware engineer, a data center employee, or a senior executive.

I have never operated DCs on the scale of Facebook or Goog, but my DCs have always had:

- 100%, regularly tested out of band access. Nobody goes in the cage or touches equipment unless it's for repair or a physical change (add/remove circuits). This activity results in a physical audit to ensure it is absolutely accurately reflected in the rack elevation documentation.
- "Visitors" who are not people who are authorized to work on this equipment are reluctantly allowed to view the equipment from outside of the locked cage at the very most "because policies and procedures" (see above point)

It's completely unfathomable to me that anyone should ever need to physically touch equipment to correct a configuration or software issue. And completely inexcusable for something of the scale and (what should be) professionalism of an operation of that size.

But maybe I'm just somebody who learned how to do this properly. I've certainly dragged enough clients and employers into this basic mindset/setup over the years. This isn't even anything special. It's like, bare minimum "best practices" for a well functioning network.
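
For what "regularly tested out of band access" can look like in practice, here is a rough Python sketch. The console server hostnames and the port are invented for the example, and a plain TCP connect is only the crudest possible check (a real test would log in and confirm you actually get a console). The probe is passed in as a parameter so the audit logic can be exercised without touching a network.

```python
import socket

# Hypothetical out-of-band console servers; hostnames and port are made up.
OOB_CONSOLES = [
    ("oob-console-1.example.net", 22),
    ("oob-console-2.example.net", 22),
]


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # covers refused connections, timeouts, and DNS failures
        return False


def audit_oob(consoles, probe=tcp_reachable):
    """Return the consoles that failed the probe; an empty list means all good."""
    return [(host, port) for host, port in consoles if not probe(host, port)]


# Typical use from a scheduled job (left commented out so nothing here
# touches the network on import):
#
#   failures = audit_oob(OOB_CONSOLES)
#   if failures:
#       print("OOB access is broken for:", failures)
```

The important part is running it on a schedule from outside the production network, so you find out the back door is broken before the day you need it.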

RFC2324
Jun 7, 2012

http 418

Motronic posted:

I have never operated DCs on the scale of Facebook or Goog, but my DCs have always had:

- 100%, regularly tested out of band access. Nobody goes in the cage or touches equipment unless it's for repair or a physical change (add/remove circuits). This activity results in a physical audit to ensure it is absolutely accurately reflected in the rack elevation documentation.
- "Visitors" who are not people who are authorized to work on this equipment are reluctantly allowed to view the equipment from outside of the locked cage at the very most "because policies and procedures" (see above point)

It's completely unfathomable to me that anyone should ever need to physically touch equipment to correct a configuration or software issue. And completely inexcusable for something of the scale and (what should be) professionalism of an operation of that size.

But maybe I'm just somebody who learned how to do this properly. I've certainly dragged enough clients and employers into this basic mindset/setup over the years. This isn't even anything special. It's like, bare minimum "best practices" for a well functioning network.

this level of good isn't even standard in managed hosting, where being able to do exactly that is our bread and butter

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

Motronic posted:

I have never operated DCs on the scale of Facebook or Goog, but my DCs have always had:

- 100%, regularly tested out of band access. Nobody goes in the cage or touches equipment unless it's for repair or a physical change (add/remove circuits). This activity results in a physical audit to ensure it is absolutely accurately reflected in the rack elevation documentation.
- "Visitors" who are not people who are authorized to work on this equipment are reluctantly allowed to view the equipment from outside of the locked cage at the very most "because policies and procedures" (see above point)

It's completely unfathomable to me that anyone should ever need to physically touch equipment to correct a configuration or software issue. And completely inexcusable for something of the scale and (what should be) professionalism of an operation of that size.

But maybe I'm just somebody who learned how to do this properly. I've certainly dragged enough clients and employers into this basic mindset/setup over the years. This isn't even anything special. It's like, bare minimum "best practices" for a well functioning network.

You are someone who learned to do it properly. Most people haven't. It's really common for architecture to be an absolute hodgepodge, even at Fortune 50 companies.

Doggles
Apr 22, 2007

https://twitter.com/leahmcelrath/status/1445100877933694976

:vince:

Mister Facetious
Apr 21, 2007

I think I died and woke up in L.A.,
I don't know how I wound up in this place...

:canada:
https://twitter.com/mtsw/status/1445100477717180422?s=20

PhazonLink
Jul 17, 2010
So isn't AWS not only A massive backbone but THE massive backbone of a ton of stuff?

Wonder if they had any interesting calls asking what their defcon 1 plan is.

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

PhazonLink posted:

So isn't AWS not only A massive backbone but THE massive backbone of a ton of stuff?

Wonder if they had any interesting calls asking what their defcon 1 plan is.

AWS has a lot of redundancy built in, but they make it clear: It's your job as the customer to ensure you have multi-region redundancy. Not theirs. That's how they ensure that an outage is your fault, not theirs.

Jasper Tin Neck
Nov 14, 2008


"Scientifically proven, rich and creamy."

Ars Technica seems to have a pretty good rundown of what's going on.

https://arstechnica.com/information-technology/2021/10/facebook-instagram-whatsapp-and-oculus-are-down-heres-what-we-know/


Ars Technica posted:

According to u/ramenporn—who claims to be a Facebook employee and part of the recovery efforts—this is most likely a case of Facebook network engineers pushing a config change that inadvertently locked them out, meaning that the fix must come from data center technicians with local, physical access to the routers in question. The withdrawn routes do not appear to be the result of nor related to any malicious attack on Facebook's infrastructure.

Somebody moved fast and broke things.

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

Jasper Tin Neck posted:

Ars Technica seems to have a pretty good rundown of what's going on.

https://arstechnica.com/information-technology/2021/10/facebook-instagram-whatsapp-and-oculus-are-down-heres-what-we-know/

Somebody moved fast and broke things.

Pretty much what we expected.

Mister Facetious
Apr 21, 2007

I think I died and woke up in L.A.,
I don't know how I wound up in this place...

:canada:

CommieGIR posted:

AWS has a lot of redundancy built in, but they make it clear: It's your job as the customer to ensure you have multi-region redundancy. Not theirs. That's how they ensure that an outage is your fault, not theirs.

So Amazon's defense is literally, "It's your fault for only using us, don't complain when we don't work! ¯\_(ツ)_/¯ "?

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

Mister Facetious posted:

So Amazon's defense is literally, "It's your fault for only using us, don't complain when we don't work! ¯\_(ツ)_/¯ "?

No, not quite. It's your fault if you don't plan redundancy in your infrastructure, not theirs.

Amazon has multi-region redundancy, but you have to opt into it, and you pay for it.

Twerk from Home
Jan 17, 2009

This avatar brought to you by the 'save our dead gay forums' foundation.

Mister Facetious posted:

So Amazon's defense is literally, "It's your fault for only using us, don't complain when we don't work! ¯\_(ツ)_/¯ "?

Amazon actually offers better tools than basically anybody else to set up multi-region resiliency, but as mentioned you have to actually use the tools to build what you want.
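
As a rough picture of "you have to actually build it", here is a Python sketch of client-side failover across regions. This is not an AWS API: `call_with_failover` and the `fetch` callback are hypothetical names, and the region strings are just labels handed to whatever function actually talks to your service.

```python
def call_with_failover(regions, fetch):
    """Try fetch(region) in order; return (region, result) from the first success.

    Raises RuntimeError listing every failure if no region answers.
    """
    errors = {}
    for region in regions:
        try:
            return region, fetch(region)
        except Exception as exc:  # a real client would catch narrower errors
            errors[region] = exc
    raise RuntimeError(f"all regions failed: {errors}")


def fake_fetch(region):
    """Stand-in for a real request; pretends the first region is down."""
    if region == "us-east-1":
        raise ConnectionError("region unavailable")
    return "hello from " + region


print(call_with_failover(["us-east-1", "eu-west-1"], fake_fetch))
# ('eu-west-1', 'hello from eu-west-1')
```

The catch is that the second region only helps if you have already deployed there and are paying for it, which is the opt-in part mentioned above.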

Detective No. 27
Jun 7, 2006

CommieGIR posted:

Our tech dream is someone else's tech nightmare. And Facebook sucks and all, but there are a lot of people that depend on FB Messenger for being able to reach people. As gleeful as this makes us, it's someone's nightmare.

Yeah, take it with a big grain of salt, but it's not the first time I've had stuff like this happen to clients. FB isn't that special in that regard if it's true.

Still a Tech Nightmare, and must be an opportunity for scammers. A stranger just randomly texted my work phone saying "Facebook messenger is down." (We don't use Facebook messenger)

papasyhotcakes
Oct 18, 2008
FB appears to be up again. WA and IG still down.

withak
Jan 15, 2003


Fun Shoe
We regret to inform you that Facebook is back.

Arsenic Lupin
Apr 12, 2012

This particularly rapid💨 unintelligible 😖patter💁 isn't generally heard🧏‍♂️, and if it is🤔, it doesn't matter💁.


Motronic posted:

I have never operated DCs on the scale of Facebook or Goog, but my DCs have always had:

- 100%, regularly tested out of band access. Nobody goes in the cage or touches equipment unless it's for repair or a physical change (add/remove circuits). This activity results in a physical audit to ensure it is absolutely accurately reflected in the rack elevation documentation.

I'm curious about this. What happens when you move early-release hardware from the lab to the data center for further testing? Does this not ever happen?

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

Arsenic Lupin posted:

I'm curious about this. What happens when you move early-release hardware from the lab to the data center for further testing? Does this not ever happen?

Most servers are now either Virtualized (i.e. they exist as a virtual machine, a server that exists as software that has hardware access within another server), Containers (a stripped down Operating System that exists as a process within a server), or serverless (apps that exist only temporarily to process data and then stop existing until spawned again). Nobody is really physically hosting on metal anymore except for highly specialized apps that need bare metal due to performance and mainframes (which are even now doing virtualization).

So no, nobody is physically moving hardware. If someone needs to install a new server/host or new networking gear, someone inside the DC whose job it is to install that receives the gear at the shipping location, unpacks it, racks it, and configures it according to the engineers' instructions. After that it's hands off, other than configuration changes driven remotely by engineers/developers.

Don't want to get too far into it, as there's a lot, but most in house developed software goes through (or is supposed to go through) testing like Quality Assurance and User Acceptance testing before being rolled to production. However, for networking gear, a lot of times there's no such thing, or if they are lucky they'll have a small lab they test config changes in. But for stuff like BGP and DNS changes, there's not a great way to test those changes because by their very nature they involve internet facing changes that are more difficult to replicate. You just have to know what you are doing, and double/triple check your changes and configs before they get deployed.
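
One way to automate part of that double/triple check is a sanity gate that looks at what a change would announce before it goes out. A minimal Python sketch, with the caveat that `check_change`, the prefixes, and the "must stay reachable" address are all invented for the example (documentation ranges again), and that real tooling would parse actual router configs rather than plain lists:

```python
import ipaddress


def check_change(current, proposed, must_keep):
    """Return a list of problems with a proposed set of announced prefixes.

    current / proposed: prefixes announced before and after the change.
    must_keep: addresses (e.g. your DNS servers) that must stay covered.
    """
    current_nets = {ipaddress.ip_network(p) for p in current}
    proposed_nets = {ipaddress.ip_network(p) for p in proposed}
    problems = []
    if current_nets and not proposed_nets:
        problems.append("change withdraws every announced prefix")
    for addr in must_keep:
        ip = ipaddress.ip_address(addr)
        if not any(ip in net for net in proposed_nets):
            problems.append(addr + " would no longer be covered by any announcement")
    return problems


current = ["203.0.113.0/24", "198.51.100.0/24"]
dns_servers = ["203.0.113.53"]

print(check_change(current, current, dns_servers))              # []
print(check_change(current, ["198.51.100.0/24"], dns_servers))  # one problem
print(check_change(current, [], dns_servers))                   # two problems
```

It doesn't replace knowing what you are doing, but it turns "withdraw everything, including the DNS servers" into something a script refuses to push without a human overriding it.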

CommieGIR fucked around with this message at 23:55 on Oct 4, 2021
