Unexpected Raw Anime
Oct 9, 2012

Wibla posted:

So mystery partially solved: A switch is starting to fail, rebooted and somehow triggered an STP root change that made a lot of noise :argh:

Several L3 routers along the way are misconfigured and I guess I know what I'm doing next week.

gently caress spanning tree and gently caress even harder everyone who turns it on without bothering to configure root bridge priorities


as an aside, you should never be using STP in an environment where you don't have intentional redundant paths. It causes more problems than it fixes.
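For anyone wondering why unconfigured root bridge priorities bite: the root is the switch with the lowest bridge ID, which compares priority first and MAC address second. A minimal Python sketch of that election follows. The switch names and MAC addresses are made up for illustration, and 32768 is the 802.1D default priority.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass(frozen=True)
class Bridge:
    name: str      # hypothetical label, for illustration only
    priority: int  # 802.1D default is 32768
    mac: str       # colon-separated hex, e.g. "00:50:00:00:00:10"


def bridge_id(bridge: Bridge) -> tuple[int, int]:
    """Bridge ID compares priority first, then MAC; the lowest value wins."""
    return (bridge.priority, int(bridge.mac.replace(":", ""), 16))


def elect_root(bridges: list[Bridge]) -> Bridge:
    return min(bridges, key=bridge_id)


# Everything left at the default priority: the lowest MAC wins,
# which may well be the oldest box on the network.
defaults = [
    Bridge("core", 32768, "00:50:00:00:00:10"),
    Bridge("old-access", 32768, "00:01:00:00:00:99"),
]
default_root = elect_root(defaults)

# Lowering the priority on the intended root makes the outcome deterministic.
tuned = [Bridge("core", 4096, "00:50:00:00:00:10"), defaults[1]]
tuned_root = elect_root(tuned)

print(default_root.name, tuned_root.name)
```

With defaults everywhere, the election falls through to the MAC tiebreak, so a rebooting access switch can end up as root.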

Thanks Ants
May 21, 2004

#essereFerrari


Don't you need some sort of spanning tree protocol turned on to do stuff like BPDU guard?

Filthy Lucre
Feb 27, 2006
You need STP for the unintentional redundant paths.

Wibla
Feb 16, 2011

Unexpected Raw Anime posted:

gently caress spanning tree and gently caress even harder everyone who turns it on without bothering to configure root bridge priorities


as an aside, you should never be using STP in an environment where you don't have intentional redundant paths. It causes more problems than it fixes.

This part of the network has been 'less extensively tested' after a hasty go-live a few years ago, so we have a lot of smaller stuff to fix yet. Some of this poo poo is surfacing now because the oldest switches are 10-15 years old and have started to fail :downs:

We're doing a bottom-up redesign and consolidation of our industrial networks, and this network is one of them, so we'll see how much effort we put in beyond me learning poo poo on the job while causing (intermittent) planned downtime.

SlowBloke
Aug 14, 2017

Thanks Ants posted:

It's meant to sort that out automatically

iOS/iPadOS 15.6+ only, older builds will stay on basic auth without user intervention.

devmd01
Mar 7, 2006

Elektronik
Supersonik
Blocked basic auth with conditional access over a year ago when they gave the No Really We Mean It This Time deadline, and worked through the handful of exceptions that the logging called out. There is no excuse for any org to have missed the deadline, that’s just rank incompetence.

ConfusedUs
Feb 24, 2004

Bees?
You want fucking bees?
Here you go!
ROLL INITIATIVE!!





I'm trying to do a prioritization exercise. We are trying to go through a list of requests and determine which ones are most important. We defined our goal as X. We defined "Impact" as how the change is likely to improve X. We defined "Risk" as how likely the change is to cause a new problem that reduces Impact. Our goal is to identify those with high Impact and low Risk. That gives us our prioritized list of changes.
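A rough sketch of that ranking in Python, in case it helps pin the definitions down. The 1-5 scales, the "impact minus risk" score, and the request titles are all assumptions for illustration, not something the team agreed on.

```python
from __future__ import annotations
from dataclasses import dataclass


@dataclass
class Request:
    title: str
    impact: int  # assumed scale: 1 (barely helps X) to 5 (helps X a lot)
    risk: int    # assumed scale: 1 (unlikely to cause problems) to 5 (likely)


def prioritize(requests: list[Request]) -> list[Request]:
    """Rank by (impact - risk), highest first; ties go to the higher impact."""
    return sorted(requests, key=lambda r: (-(r.impact - r.risk), -r.impact))


# Hypothetical enhancement requests.
backlog = [
    Request("risky big win", impact=5, risk=4),
    Request("small safe tweak", impact=2, risk=1),
    Request("safe big win", impact=5, risk=1),
]

ranked = prioritize(backlog)
for r in ranked:
    print(f"{r.title}: impact={r.impact} risk={r.risk}")
```

Adding a second goal Y would mean a second impact score and an agreed way to weigh the two, which is exactly the kind of definition change that reshuffles the list.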

I spent all day yesterday going through our (incredibly lovely) ticketing system to get all the enhancement requests that could help us reach X. Now the very same people who agreed to our definitions are not happy with the definitions, and want to do X and Y both.

Usually I don't mind this so much, it's all part of the game, but these days the flip-floppers are coming from within my own team. It's starting to wear me down.

Sickening
Jul 16, 2007

Black summer was the best summer.

devmd01 posted:

Blocked basic auth with conditional access over a year ago when they gave the No Really We Mean It This Time deadline, and worked through the handful of exceptions that the logging called out. There is no excuse for any org to have missed the deadline, that’s just rank incompetence.

Funny enough, certain Azure third-party enterprise applications are going to fail and the error isn't going to be very clear. For example, a helpdesk person was doing some Apple integrations with their MacBook management application and auth was failing over and over again. The details of the auth only said interrupted. No other alerts or error codes existed. I, on a total loving hunch, looked at the log and saw "single factor auth". Knew right away after that.

Azure AD sign on logs should be registering new errors about "unsupported authentication" or some poo poo, not just "interrupted". Because in the case of orgs who have been blocking basic auth through conditional access policies, tracking down this issue was clear. This? Not so much.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
Nothing better than half our GCP artifactory images failing to pull on a Friday. All our repos are federated so we can at least tell folks to pull cross cloud but there’s no rhyme or reason I can tell why some of these images pull successfully and some don’t.

Adding insult to injury the error message is a completely incorrect false message about untrusted certificates which I just know is going to waste a few hours with support :negative:

joebuddah
Jan 30, 2005
Can someone please explain why, when someone accidentally clicks reply all, there is always that one person who has to reply with something like

"Think about how much time has been wasted, replying to these emails"

i am a moron
Nov 12, 2020

"I think if there’s one thing we can all agree on it’s that Penn State and Michigan both suck and are garbage and it’s hilarious Michigan fans are freaking out thinking this is their natty window when they can’t even beat a B12 team in the playoffs lmao"
People are pedantic know it all fucks sometimes

Arquinsiel
Jun 1, 2006

"There is no such thing as society. There are individual men and women, and there are families. And no government can do anything except through people, and people must look to themselves first."

God Bless Margaret Thatcher
God Bless England
RIP My Iron Lady

The Iron Rose posted:

Nothing better than half our GCP artifactory images failing to pull on a Friday. All our repos are federated so we can at least tell folks to pull cross cloud but there’s no rhyme or reason I can tell why some of these images pull successfully and some don’t.

Adding insult to injury the error message is a completely incorrect false message about untrusted certificates which I just know is going to waste a few hours with support :negative:
Have the failing ones somehow managed to end up with a broken chain of trust?

Sickening
Jul 16, 2007

Black summer was the best summer.

joebuddah posted:

Can someone please explain why, when someone accidentally clicks reply all, there is always that one person who has to reply with something like

"Think about how much time has been wasted, replying to these emails"

I have only had power over a person like this once and it was one of the more satisfying days I ever had. They seemed crushed by the fact that I dressed them down for being an rear end in a top hat.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Arquinsiel posted:

Have the failing ones somehow managed to end up with a broken chain of trust?

I’d be surprised since there are good and bad images in the same repo. Some I can pull by SHA, some by tag, others no. The little docker icon identifying an image as an image is missing on all the bad ones so I’m more inclined to think something fucky with replication or the cache.

Thankfully this is exactly why we have multiple repos we can fail back to and pull from.
We ran into this GitHub issue a few weeks back so I think it’s the same faulty error message at hand here: https://github.com/containerd/containerd/issues/6097

Arquinsiel
Jun 1, 2006

"There is no such thing as society. There are individual men and women, and there are families. And no government can do anything except through people, and people must look to themselves first."

God Bless Margaret Thatcher
God Bless England
RIP My Iron Lady
That's a real interesting and fucky error.

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Unexpected Raw Anime posted:

as an aside, you should never be using STP in an environment where you don't have intentional redundant paths. It causes more problems than it fixes.

Is it weird that in thirty years of networking I have never turned on STP?

Actuarial Fables
Jul 29, 2014

Taco Defender

Agrikk posted:

Is it weird that in thirty years of networking I have never turned on STP?

I haven't either, but that's because it's on by default for all the switches I use.

Potato Salad
Oct 23, 2014

nobody cares


Github actions is making me miss having gitlab, gently caress

e: snip

Potato Salad fucked around with this message at 03:06 on Oct 31, 2022

Sickening
Jul 16, 2007

Black summer was the best summer.

Potato Salad posted:

Github actions is making me miss having gitlab, gently caress

e: snip

gitlab is pretty great.

PremiumSupport
Aug 17, 2015

Actuarial Fables posted:

I haven't either, but that's because it's on by default for all the switches I use.

I actually had to turn it off on a couple new replacement switches I obtained in the last year. It was causing a rather lengthy delay in users getting an IP address when first connecting to the network.

Filthy Lucre
Feb 27, 2006

PremiumSupport posted:

I actually had to turn it off on a couple new replacement switches I obtained in the last year. It was causing a rather lengthy delay in users getting an IP address when first connecting to the network.

Enable portfast on the user facing ports instead. It allows the switch to skip the Listening/Learning steps and go straight to Forwarding.
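To put numbers on that delay, here is a small Python model of the classic 802.1D port state walk. It assumes the default forward delay of 15 seconds and treats a portfast port as forwarding immediately, so it is a simplification and not what any particular switch OS does internally.

```python
from __future__ import annotations

FORWARD_DELAY_S = 15  # 802.1D default; spent once in listening, once in learning


def states_after_link_up(portfast: bool) -> list[str]:
    """States a port walks through after link-up in this simplified model."""
    if portfast:
        return ["forwarding"]
    return ["listening", "learning", "forwarding"]


def seconds_until_forwarding(portfast: bool, forward_delay: int = FORWARD_DELAY_S) -> int:
    """Time before user traffic (DHCP included) can pass through the port."""
    waiting_states = [s for s in states_after_link_up(portfast) if s != "forwarding"]
    return len(waiting_states) * forward_delay


print("without portfast:", seconds_until_forwarding(False), "s")
print("with portfast:", seconds_until_forwarding(True), "s")
```

That half minute at default timers is roughly the window in which a freshly connected client's first DHCP attempts go nowhere.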

PremiumSupport
Aug 17, 2015

Filthy Lucre posted:

Enable portfast on the user facing ports instead. It allows the switch to skip the Listening/Learning steps and go straight to Forwarding.

It was quicker and easier to turn off STP. We're not a complicated network and nobody but me dares touch a network cable anyway so STP is not needed.

Edit: they're also all user facing ports except the single one being used for uplink.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Arquinsiel posted:

That's a real interesting and fucky error.

The issue ended up being missing manifest.json files, which happened because we had to delete the underlying storage for one of our clouds a while back before we realized we could orphan services using PVCs. Federated repos only update new files by default though, so images that hadn’t been updated in time didn’t get replacement manifests but did get the image itself apparently. Regardless, running a full sync fixed it nicely.

bull3964
Nov 18, 2000

DO YOU HEAR THAT? THAT'S THE SOUND OF ME PATTING MYSELF ON THE BACK.


PremiumSupport posted:

It was quicker and easier to turn off STP. We're not a complicated network and nobody but me dares touch a network cable anyway so STP is not needed.

Edit: they're also all user facing ports except the single one being used for uplink.

You say this...

In my younger years I worked in an office with unmanaged (and thus, no STP) switches. Some end user managed to accidentally bridge their wireless adapter with the wired one on their notebook and subsequently brought down the entire network when they docked their computer.

Internet Explorer
Jun 1, 2005





Yeah, not having STP on at endpoint switches is asking for pain, IMO.

Arquinsiel
Jun 1, 2006

"There is no such thing as society. There are individual men and women, and there are families. And no government can do anything except through people, and people must look to themselves first."

God Bless Margaret Thatcher
God Bless England
RIP My Iron Lady

The Iron Rose posted:

The issue ended up being missing manifest.json files, which happened because we had to delete the underlying storage for one of our clouds a while back before we realized we could orphan services using PVCs. Federated repos only update new files by default though, so images that hadn’t been updated in time didn’t get replacement manifests but did get the image itself apparently. Regardless, running a full sync fixed it nicely.
So the manifest.json files were telling things where to check the certs and lacking updates they were pointing to certs that didn't exist?

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Arquinsiel posted:

So the manifest.json files were telling things where to check the certs and lacking updates they were pointing to certs that didn't exist?

Not quite. This is a docker manifest, which has information about layers, size, and the digest. It can also give information about the OS and CPU arch an image was built for.

This file was missing from the repositories we were trying to download from, so containerd failed to get the layers and download the image.

The error returned from artifactory to containerd *should* have been “invalid image”, and if we were using docker as our container runtime it would have been the error we got. However, due to the aforementioned GitHub issue I linked, artifactory instead told containerd it was a cert issue, even though it was nothing of the sort.
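For reference, a minimal Python sketch of what a client reads out of a Docker image manifest (schema version 2). The digests and sizes are placeholders and the manifest is built inline rather than fetched from a registry. The point is only that without this file there is no list of layers to download.

```python
from __future__ import annotations
import json

# Placeholder digests: not real image content.
FAKE_CONFIG = "sha256:" + "0" * 64
FAKE_LAYER_A = "sha256:" + "a" * 64
FAKE_LAYER_B = "sha256:" + "b" * 64

# Shape of a Docker image manifest, schema version 2.
manifest_text = json.dumps({
    "schemaVersion": 2,
    "mediaType": "application/vnd.docker.distribution.manifest.v2+json",
    "config": {
        "mediaType": "application/vnd.docker.container.image.v1+json",
        "size": 1500,
        "digest": FAKE_CONFIG,
    },
    "layers": [
        {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 100, "digest": FAKE_LAYER_A},
        {"mediaType": "application/vnd.docker.image.rootfs.diff.tar.gzip",
         "size": 250, "digest": FAKE_LAYER_B},
    ],
})


def summarize_manifest(raw: str) -> dict:
    """Pull out what a runtime needs before it can fetch anything."""
    manifest = json.loads(raw)
    layers = manifest["layers"]
    return {
        "config_digest": manifest["config"]["digest"],
        "layer_digests": [layer["digest"] for layer in layers],
        "total_layer_bytes": sum(layer["size"] for layer in layers),
    }


summary = summarize_manifest(manifest_text)
print(summary["layer_digests"], summary["total_layer_bytes"])
```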

The Iron Rose fucked around with this message at 20:48 on Nov 1, 2022

KillHour
Oct 28, 2007


Internet Explorer posted:

Yeah, not having STP on at endpoint switches is asking for pain, IMO.

100%. Let's set a scene. A computer and IP phone were just removed from a cube. The tech left the cat5 cable that was connected to the phone dangling from its port. The other end of the cable just happens to be lying next to the port that the computer was plugged into. How long before a well-intentioned yet ignorant worker sees the disconnected cable conspicuously next to a hole it clearly fits in, and decides that must be the reason the printer on the other side of the cubicle wall "isn't working"?

Trick question - it already happened as you were reading this and now your day is hosed

Super-NintendoUser
Jan 16, 2004

COWABUNGERDER COMPADRES
Soiled Meat

KillHour posted:

100%. Let's set a scene. A computer and IP phone were just removed from a cube. The tech left the cat5 cable that was connected to the phone dangling from its port. The other end of the cable just happens to be lying next to the port that the computer was plugged into. How long before a well-intentioned yet ignorant worker sees the disconnected cable conspicuously next to a hole it clearly fits in, and decides that must be the reason the printer on the other side of the cubicle wall "isn't working"?

Trick question - it already happened as you were reading this and now your day is hosed

I'd have to take off both my shoes to use my fingers and toes to count how many times users looped the network using the PC port on a VOIP device.

Internet Explorer
Jun 1, 2005





Yuuuuup. It's also one of those things where you get to run around like an idiot trying to figure out what's going on. Because if you don't have STP enabled on end-user facing ports, you definitely don't have things set up to easily be able to track down looping.

bull3964
Nov 18, 2000

DO YOU HEAR THAT? THAT'S THE SOUND OF ME PATTING MYSELF ON THE BACK.


I know in my case I had to sniff traffic to find the offending MAC address, track down the vendor and only blind luck made it so there were only a handful of those in the office to physically look at. Took me about 45 minutes from onset of symptoms to tracking down what happened and with whom. Not too bad, but it could have been much worse.
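The same steps can be sketched in Python: tally source MACs from a capture, take the loudest one, and look up the first three bytes (the OUI) to get the vendor. The frame list and the vendor table here are invented for illustration. A real lookup would use the IEEE OUI registry.

```python
from __future__ import annotations
from collections import Counter

# Hypothetical OUI-to-vendor table; real data comes from the IEEE registry.
VENDORS = {"AABBCC": "Example Dock Co."}


def oui(mac: str) -> str:
    """Return the first three bytes of a MAC as six uppercase hex digits."""
    digits = "".join(ch for ch in mac if ch.isalnum()).upper()
    if len(digits) != 12:
        raise ValueError(f"not a 48-bit MAC address: {mac!r}")
    int(digits, 16)  # raises ValueError if any character is not hex
    return digits[:6]


def top_talker(source_macs: list[str]) -> tuple[str, int]:
    """Most frequent source MAC in a capture, with its frame count."""
    return Counter(m.lower() for m in source_macs).most_common(1)[0]


# Made-up source addresses pulled from a capture during the storm.
captured = ["aa:bb:cc:11:22:33"] * 5 + ["00:11:22:33:44:55"] * 2

mac, frames = top_talker(captured)
vendor = VENDORS.get(oui(mac), "unknown vendor")
print(mac, frames, vendor)
```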

Polio Vax Scene
Apr 5, 2009



somebody fidgeting in a conference room brought our entire office's network down because they plugged an ethernet cable's two ends into the same port. it really seems like something that should be more idiot proof.

Internet Explorer
Jun 1, 2005





It is, if you have STP on and configured properly.

Rexxed
May 1, 2010

Dis is amazing!
I gotta try dis!

Internet Explorer posted:

It is, if you have STP on and configured properly.

But they said no more grunge rock in the conference rooms.

Wibla
Feb 16, 2011

An ip phone nearly took down one of our critical networks a few years ago. Not a good time :v:

Arquinsiel
Jun 1, 2006

"There is no such thing as society. There are individual men and women, and there are families. And no government can do anything except through people, and people must look to themselves first."

God Bless Margaret Thatcher
God Bless England
RIP My Iron Lady

Polio Vax Scene posted:

somebody fidgeting in a conference room brought our entire office's network down because they plugged an ethernet cable's two ends into the same port. it really seems like something that should be more idiot proof.
Uhm... they got both ends of the cable into the port at once? How?

Sywert of Thieves
Nov 7, 2005

The pirate code is really more of a guideline, than actual rules.

That reminds me of back when I did IT support for a university campus of ~500 students. It was pretty calm usually, except when someone plugged in their router backwards every now and then, and inadvertently created a rogue DHCP server that took the network down. Until we scrambled to find the offending outlet and shut it down, then map it back to a dorm number, and have a stern talking-to with the resident.

The only other thing I really did was forward copyright infringement letters to idiots who used public Torrents, instead of the underground student DC++ warez network.

Proteus Jones
Feb 28, 2013



Arquinsiel posted:

Uhm... they got both ends of the cable into the port at once? How?

I'm figuring it's more along the lines of a plate with two physical ports labeled something like 43A and 43B.

Internet Explorer
Jun 1, 2005





Proteus Jones posted:

I'm figuring it's more along the lines of a plate with two physical ports labeled something like 43A and 43B.

Yeah, this was my assumption as well.

CitizenKain
May 27, 2001

That was Gary Cooper, asshole.

Nap Ghost
Someone doing that with a network cable is how our department took over switching. Person was at the staging line BSing with a tech, saw a loose cable, and plugged it into a jack and then headed out the door to lunch. I was called to see what was going on. I could get to our router, but nothing past it, and I could see the uplink port was going wild. I couldn't get an answer out of anyone if something had changed there. Thankfully I finally got to someone who recalled that cable being plugged in. They unplugged it, and things calmed down. This took our helpdesk down for about 30 minutes.
Later had it happen to a switch at a remote site, I caused it. This location still had the old HP switches and I was onsite to turn up gear and a person was helping me reconnect all the cables. It was a giant mess on the floor, and we didn't notice we plugged in both ends until he noticed a local server there dropped. Thankfully, no one was on that switch yet.

CitizenKain fucked around with this message at 02:09 on Nov 2, 2022
