Thanks Ants
May 21, 2004

#essereFerrari


Lmao how does everything you do rely on one core switch to be configured right


H110Hawk
Dec 28, 2006

Thanks Ants posted:

Lmao how does everything you do rely on one core switch to be configured right

Well you see vss and

CLAM DOWN
Feb 13, 2007
Probation
Can't post for 6 hours!
Sounds like when we used to run a Nortel core lmao

CloFan
Nov 6, 2004

Can you imagine being that network team? I mean, that's happened in our environment once or twice, but it was due to a single point of failure and a $peanuts budget, not because we didn't back up the configs before a major change, lol. I bet they had to dig the old core out of the dumpster at 2am

GreenNight
Feb 19, 2006
Turning the light on the darkest places, you and I know we got to face this now. We got to face this now.

We replaced our core switch this year and I tell you, we had all the configs, logins, whatever we needed saved offline because we knew as soon as we turned it off, we're not accessing any network resources. We're a small company. It's not that loving hard to CYA.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

GreenNight posted:

We replaced our core switch this year and I tell you, we had all the configs, logins, whatever we needed saved offline because we knew as soon as we turned it off, we're not accessing any network resources. We're a small company. It's not that loving hard to CYA.

No poo poo. If it's me, I get them up and running side by side and move services over one by one. More work that weekend, but a whole hell of a lot less stressful when something goes wrong.

GreenNight
Feb 19, 2006
Turning the light on the darkest places, you and I know we got to face this now. We got to face this now.

Yeah we had all the ports configured on the new core before we replaced it. We mapped them all out from the patch panel. We did the legwork so it went smooth.
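
A rough sketch of that kind of pre-cutover legwork, assuming you keep the patch panel map and the staged config for the new core as simple dictionaries. The port names, VLAN numbers, and the `missing_ports` helper are all made up for illustration, not anything from a real switch or tool:

```python
def missing_ports(patch_panel_map, new_core_config):
    """Return patch panel ports whose switch port has no staged config."""
    return sorted(
        panel_port
        for panel_port, switch_port in patch_panel_map.items()
        if switch_port not in new_core_config
    )


# Hypothetical data: patch panel port -> switch port it is cabled to.
patch_panel_map = {
    "PP1-01": "Gi1/0/1",
    "PP1-02": "Gi1/0/2",
    "PP1-03": "Gi1/0/3",
}

# Hypothetical data: switch ports already configured on the new core.
new_core_config = {
    "Gi1/0/1": {"vlan": 10},
    "Gi1/0/2": {"vlan": 20},
}

# Anything listed here still needs a config before the swap.
print(missing_ports(patch_panel_map, new_core_config))
```

An empty list means every mapped port has a config waiting for it on the new core.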

Methanar
Sep 26, 2013

by the sex ghost
Things I never want to do again

  • maintenance/changes on the VPN and management network. Remotely.

Proteus Jones
Feb 28, 2013



adorai posted:

No poo poo. If it's me, I get them up and running side by side and move services over one by one. More work that weekend, but a whole hell of a lot less stressful when something goes wrong.

DING DING DING

This is the loving answer. It also makes rolling back a snap.

Antioch
Apr 18, 2003
It's now been almost 18 hours. There's webmail access now, still nothing customer facing. I haven't heard any updates from the network team in a while, but I'm getting into the bottom third of a bottle of Chianti so I'm not worried.

I've got about 600 alert emails from my F5 pools going down harder than a senior on prom night.

devmd01 posted:

Like...how does that even happen. For something that critical you get an outside vendor to assist and make sure you've crossed i's and t's on your change plan, so you can shift some of the blame if it does go south. Do keep us informed. :allears:


We had one. I don't know who, but I really hope this is the last time they ever hear from us.

MC Fruit Stripe posted:

Putting my manager hat on for a moment, "could you guys hurry up and resolve this so I can fire all of you?"

God I hope so, but I'm not holding my breath.

CloFan posted:

Can you imagine being that network team? I mean, that's happened in our environment once or twice, but it was due to a single point of failure and a $peanuts budget, not because we didn't back up the configs before a major change, lol. I bet they had to dig the old core out of the dumpster at 2am

Here's the cherry. They called off the new switch around 2. Old one is "corrupted" according to the network manager. So they're going ahead with the new one, and there is no fallback.

When I made my first post, I was laughing because gently caress our network team, they're dumb and this proves it.
Now I'm legitimately worried about going to work tomorrow, because there may not BE a network still at 8am when I go in.

Thanks Ants
May 21, 2004

#essereFerrari


:psyduck: I am utterly confused how it's taking that long to rebuild a config. Unless literally nothing is documented and everybody is having to figure out what various interfaces should be addressed as, and what rules they had in place to make things work.

Sickening
Jul 16, 2007

Black summer was the best summer.

Antioch posted:

It's now been almost 18 hours. There's webmail access now, still nothing customer facing. I haven't heard any updates from the network team in a while, but I'm getting into the bottom third of a bottle of Chianti so I'm not worried.

I've got about 600 alert emails from my F5 pools going down harder than a senior on prom night.



We had one. I don't know who, but I really hope this is the last time they ever hear from us.


God I hope so, but I'm not holding my breath.


Here's the cherry. They called off the new switch around 2. Old one is "corrupted" according to the network manager. So they're going ahead with the new one, and there is no fallback.

When I made my first post, I was laughing because gently caress our network team, they're dumb and this proves it.
Now I'm legitimately worried about going to work tomorrow, because there may not BE a network still at 8am when I go in.

If you guys were in the cloud none of this would have happened.

anthonypants
May 6, 2007

by Nyc_Tattoo
Dinosaur Gum

Thanks Ants posted:

:psyduck: I am utterly confused how it's taking that long to rebuild a config. Unless literally nothing is documented and everybody is having to figure out what various interfaces should be addressed as, and what rules they had in place to make things work.
The only way it takes any substantial amount of time is if you have literally zero documentation.

Virigoth
Apr 28, 2009

Corona rules everything around me
C.R.E.A.M. get the virus
In the ICU y'all......



Sickening posted:

If you guys were in the cloud none of this would have happened.

Yeah just Business-Critical System Down page Agrikk and make it his problem already. They love when you wake the whole team and mgmt up with those pages.

GreenNight
Feb 19, 2006
Turning the light on the darkest places, you and I know we got to face this now. We got to face this now.

It would be a loving nightmare to rewrite all our port configs that connect to each port on each ESX host from scratch.

Dr. Arbitrary
Mar 15, 2006

Bleak Gremlin

Antioch posted:

Here's the cherry. They called off the new switch around 2. Old one is "corrupted" according to the network manager.

It probably got hit with an etherblast. No way to plan for one of those and they got hit during an upgrade.

Judge Schnoopy
Nov 2, 2005

dont even TRY it, pal

anthonypants posted:

The only way it takes any substantial amount of time is if you have literally zero documentation.

Which is also, coincidentally, how you gently caress up a migration due to 'unforeseen' scenarios and attempt a roll back at 2 am

Kashuno
Oct 9, 2012

Where the hell is my SWORD?
Grimey Drawer
Every time I question if I am competent at my job or if I’m just faking it until I make it, this thread reassures me that no matter how bad I think I am there is always significantly worse

Proteus Jones
Feb 28, 2013



Dr. Arbitrary posted:

It probably got hit with an etherblast. No way to plan for one of those and they got hit during an upgrade.

And they actually had backups, but they were on Buffalos and some fool filled it to 96% capacity.

adorai
Nov 2, 2002

10/27/04 Never forget
Grimey Drawer

Proteus Jones posted:

And they actually had backups, but they were on Buffalos and some fool filled it to 96% capacity.
It's not a backup if it's never been tested.

Proteus Jones
Feb 28, 2013



adorai posted:

It's not a backup if it's never been tested.

C'mon. Don't ruin the larches call back joke that was started with "etherblast"

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Virigoth posted:

Yeah just Business-Critical System Down page Agrikk and make it his problem already. They love when you wake the whole team and mgmt up with those pages.

Your infra is down during customer originated maintenance? drat, man, I'm really sorry that happened. I mean, wow. I am really sorry. I wish there was something I could do. Really. The shared responsibility model that we talk about literally every day means that I can do best effort to help you, but ultimately the onus is on you guys not to have gone full-retard on this.


In all seriousness: By all means open up that critical sev 5 case. The ten thousand cloud support engineers we have staffed around the world means that you ain't waking up anyone, son. They're already up and kicking rear end.* You'd be surprised the kinds of trouble customers get themselves into (no you wouldn't) and then expect/hope that we can bail them out.






* in most cases

Agrikk fucked around with this message at 04:07 on Sep 25, 2017

Antioch
Apr 18, 2003
Entering 21 hours. Internal services mostly restored. Webmail is up. Internet is slow, no idea what's going on there. The decision has been made to leave customer facing things down until all internal testing is complete. Social Media intern on call this weekend nowhere to be found, poor thing. Presumed quit.

Rumors circulating in back channels of "strike team" of VPs, SVPs, and Directors who have formed a complaint cabal. Expecting resignation of many Network and management staff in AM.

I have a feeling my network dependent project will not be completing this week.

I've run out of wine, on to Vodka - signed by The Mountain himself!

Agrikk
Oct 17, 2003

Take care with that! We have not fully ascertained its function, and the ticking is accelerating.

Antioch posted:

until all internal testing is complete

Seeing as how they did no testing the first time, how long can "testing" really take?

MC Fruit Stripe
Nov 26, 2002

around and around we go
Also, what does a 'strike team' of directors, VPs and SVPs do? How many people do you really need on a technical bridge going "guys where are we at on this?" every 5 minutes?

Sniep
Mar 28, 2004

All I needed was that fatty blunt...



King of Breakfast

MC Fruit Stripe posted:

Also, what does a 'strike team' of directors, VPs and SVPs do? How many people do you really need on a technical bridge going "guys where are we at on this?" every 5 minutes?

I presume they're "just checking in" - it'd be irresponsible for them not to - you know - they have execs

Antioch
Apr 18, 2003
3....2....1....

Happy 24 hour outage!

Woo!

Corsair Pool Boy
Dec 17, 2004
College Slice

Thanks Ants posted:

Lmao how does everything you do rely on one core switch to be configured right

This was my first thought. We have two, and the last round of upgrades happened one at a time, a couple weeks apart, just to make sure the new one was good before touching the other one.

Keep us posted! We all want to know the fallout over the next few weeks!

Thanks Ants
May 21, 2004

#essereFerrari


Antioch posted:

3....2....1....

Happy 24 hour outage!

Woo!



:toot:

Che Delilas
Nov 23, 2009
FREE TIBET WEED

MC Fruit Stripe posted:

Also, what does a 'strike team' of directors, VPs and SVPs do? How many people do you really need on a technical bridge going "guys where are we at on this?" every 5 minutes?

This is the kind of event that makes or breaks someone in a leadership position. By that I mean, either they talk enough and in a commanding enough voice that they can take credit for "leading the team through the crisis" (a.k.a. make noise until the people doing the actual work finish that work, which they would have done anyway and probably faster without all the buzzing in their ears), or they don't and someone blames it on them and they get fired or shamed. Every manager tangentially related to the problem is going to recognize the scenario and want to participate if they don't want to be sacrificed.

I have yet to personally see an interdepartmental pissing match or stubborn individual literally prevent a crisis of this magnitude from being resolved, to the point that a heroic c-level boldly steps in and rights the ship, hollywood-style. I'm sure it happens occasionally.

Vargatron
Apr 19, 2008

MRAZZLE DAZZLE


Antioch posted:

3....2....1....

Happy 24 hour outage!

Woo!



Please keep posting updates, sir.

Also, Social Media Intern saw the ship was taking on water. You know what they say about rats leaving a sinking ship...

Judge Schnoopy
Nov 2, 2005

dont even TRY it, pal

Vargatron posted:

Please keep posting updates, sir.

Also, Social Media Intern saw the ship was taking on water. You know what they say about rats leaving a sinking ship...

Intern is meditating atop a mountain contemplating a way to spin this to customers. They should be back within 3 years.

Vargatron
Apr 19, 2008

MRAZZLE DAZZLE


Media Intern is saying to him/herself "I only get paid 10 bucks an hour for this poo poo, gently caress it".

Sickening
Jul 16, 2007

Black summer was the best summer.

Antioch posted:

Entering 21 hours. Internal services mostly restored. Webmail is up. Internet is slow, no idea what's going on there. The decision has been made to leave customer facing things down until all internal testing is complete. Social Media intern on call this weekend nowhere to be found, poor thing. Presumed quit.

Why is a social media intern doing an on call shift? Am I the only one who thinks this is absurd?

Thanks Ants
May 21, 2004

#essereFerrari


Maybe the social media intern doubles as the abuse sponge for all the customers who can't use the service? Not that having that as the explanation is any less wrong.

H110Hawk
Dec 28, 2006

Sickening posted:

Why is a social media intern doing an on call shift? Am I the only one who thinks this is absurd?

Well at this point they're making $20/hour in double time. That would be enough for me.

Beef Hardcheese
Jan 21, 2003

HOW ABOUT I LASH YOUR SHIT


I'm a bottom-tier IT guy for the place that I work, and sometime in Q2 of next year we're going to be moving locations for a "temporary" 2-3 year period. We're a small department of about a dozen people under a larger organization that's physically scattered across several locations. There are numerous other higher-up IT sysadmins and network people that are going to be working with this, but I'm going to be the one who has to worry about making sure we have enough power strips, plugging in and setting up the workstations, basic on-site printer unjamming and error message Googling stuff. I was wondering if anyone had any advice or insights on this sort of thing, stuff that might be easily overlooked or random "gotcha" stuff. (Apologies in advance if this isn't the best thread for this kind of question, I figured it would be either here or "a ticket came in".)

Thanks Ants
May 21, 2004

#essereFerrari


The thing that I've seen forgotten a ton is when companies move locations and their public IP address changes as they've had to switch providers, anything that uses IP address whitelisting stops working. So check things like printers that might relay through Exchange Online, any 3rd party services that are firewalled to only accept connections from your office etc.

IP whitelisting is a pretty poo poo way of trying to secure something anyway, but that doesn't mean it's not going to cause you problems.

An impending office move is a good time to get stuff like email and telephones out of your office and into the realm of Somebody Else's Problem[TM]. It means people have all the services still up during the move, so any snagging that happens isn't as disruptive as it would be otherwise.
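
For the whitelisting check, here is a minimal sketch of how you might audit it ahead of the move, assuming you can collect each service's allowed source addresses into a dictionary by hand. The service names, the addresses (taken from the documentation-only ranges), and the `rules_hit_by_ip_change` helper are all hypothetical:

```python
import ipaddress


def rules_hit_by_ip_change(allowlists, old_ip):
    """Return the allowlist entries per service that match the old public IP."""
    old = ipaddress.ip_address(old_ip)
    affected = {}
    for service, entries in allowlists.items():
        # strict=False accepts entries like 203.0.113.10/24 with host bits set.
        hits = [e for e in entries if old in ipaddress.ip_network(e, strict=False)]
        if hits:
            affected[service] = hits
    return affected


# Hypothetical inventory: service -> source addresses or CIDR ranges it allows.
allowlists = {
    "smtp-relay": ["203.0.113.10"],
    "vendor-api": ["203.0.113.0/28", "198.51.100.7"],
    "backup-portal": ["198.51.100.0/24"],
}

# The office's current public IP, which goes away with the old provider.
old_ip = "203.0.113.10"

for service, entries in rules_hit_by_ip_change(allowlists, old_ip).items():
    print(service, "needs updating:", entries)
```

Anything it flags is a rule somebody has to get updated with the new address before or right after the cutover. It only covers the services you remembered to list, which is the hard part.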

Sickening
Jul 16, 2007

Black summer was the best summer.

H110Hawk posted:

Well at this point they're making $20/hour in double time. That would be enough for me.

I think relying on an intern for a critical job role is both idiotic and unethical.


Beef Hardcheese
Jan 21, 2003

HOW ABOUT I LASH YOUR SHIT


We're working with the ISP and phone company to ensure that we keep the same phone number (we're kind of a call center setup), and since we use VOIP for everything I'm thinking that our IP addresses and everything else should carry over? I'll keep the whitelisting stuff in mind, though. I've been here for a year, but there (of course) wasn't any documentation on the current setup, exacerbated by the fact that there wasn't any on-site IT before me. Any time something needed to be done, someone from the main office had to come down (which could be an hour each way depending on traffic), and I've done my best to avoid touching anything that isn't broken. Our email migrated to Gmail last year so that's already out of our hands.
