Nuclearmonkee
Jun 10, 2009


Collateral Damage posted:

What are the enterprise-ready alternatives to ESXi anyway?

We're eyeballing Google Compute Engine as we already have a decent amount of Cloud Run / GKE apps and what we have on vmware is mostly third party Windows software that can't be easily containerized, but GCE always felt like "using the cloud wrong".

It kinda is, but there is a lot of trash legacy stuff that can't be cleanly containerized without taking the entire Windows VM and shoving it in a container. GCE will work.

For on-prem, you can run legacy VMs in k8s with KubeVirt too. If you have a requirement for 24/7 vendor support in an org that fears being more self-supporting, OpenShift, Portworx, and some flavor of microsegmentation like Cisco ACI (ew) will work to provide a highly available, backed-up, and secure environment.

We're probably going to pilot Rancher/RKE2 and Harvester with Arista+Palo Alto MSS in front of it. I figure if the reason for getting away from VMware is to avoid paying their ransom, going to IBM with OpenShift is a questionable call.
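
To make the KubeVirt route concrete, here's a rough sketch of what wrapping one of those legacy Windows boxes in a Kubernetes object looks like, using the Python kubernetes client against the kubevirt.io/v1 CRDs. The namespace, PVC name, and sizing are all hypothetical, and it assumes the disk image has already been imported (e.g. via CDI).
code:
from kubernetes import client, config

config.load_kube_config()  # or load_incluster_config() when running inside the cluster

# Hypothetical VirtualMachine for a legacy Windows app; the PVC "legacy-app-disk"
# is assumed to already hold the imported disk image.
vm = {
    "apiVersion": "kubevirt.io/v1",
    "kind": "VirtualMachine",
    "metadata": {"name": "legacy-win-app", "namespace": "legacy"},
    "spec": {
        "running": True,
        "template": {
            "spec": {
                "domain": {
                    "cpu": {"cores": 4},
                    "resources": {"requests": {"memory": "8Gi"}},
                    "devices": {"disks": [{"name": "os", "disk": {"bus": "virtio"}}]},
                },
                "volumes": [{"name": "os",
                             "persistentVolumeClaim": {"claimName": "legacy-app-disk"}}],
            }
        },
    },
}

client.CustomObjectsApi().create_namespaced_custom_object(
    group="kubevirt.io", version="v1", namespace="legacy",
    plural="virtualmachines", body=vm)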


Nuclearmonkee
Jun 10, 2009


tokin opposition posted:

I would jump at a water filtration, hydroponics, or power plant IT job, although I probably don't have the clearance (and Post too much) to get the last one.

Power plant IT is kinda terrible. It can be interesting and all, but it's very intense and rule-bound while also being attached to old lovely hardware.

For clarity, all of the power plant IT I've had to do was around co-generation power facilities attached to manufacturing complexes. Pure "we're a power company," not "we make X and also electricity," seems less stressful but more burdened by bureaucracy. Sometimes I do wish we were bound by some of those rules, but cogens have different rule sets, so we're allowed to do dumber things.

Nuclearmonkee
Jun 10, 2009


Antigravitas posted:

As far as I can tell, all the cool stuff in power is in renewable automation and control. So less big plants, and more automation in virtual plants. Interesting, but the job ads I get in that space look intimidating as hell.

That sounds much cooler than "hey our DS0 that does metering from the substation is down again. Plz fix. No we will not change to a fiber service that is scheduled for 2026."

Nuclearmonkee
Jun 10, 2009


Prescription Combs posted:

They're weird. I'd rather work on Checkpoint or Palo Altos

E: Also Juniper

There are some cool things you can do with Fortinets (binding policy to BGP ASN comes immediately to mind) and other neat stuff that I have missed, but yeah overall I like our Palos better.

Nuclearmonkee
Jun 10, 2009


Haha, guess who Broadcom is selling VMware EUC to.

https://blogs.vmware.com/euc/2024/02/broadcoms-euc-division-embraces-its-future-as-a-standalone-business.html

If you guessed the guys who invented the leveraged buyout with a long trail of destroyed companies in their wake, you'd be right! :suicide:

Nuclearmonkee
Jun 10, 2009


Thanks Ants posted:

It really feels like there's a moment for Microsoft and Citrix to step up and bother to compete but yeah, can't see it.

Citrix already got sold to PE in 2022 and has seen nasty price increases. MS really wants us to try out Azure Stack HCI + AVD and we're probably going to go pilot it. Weeeee

It's not a great time to have a need for on-prem VDI with vGPU.

Nuclearmonkee
Jun 10, 2009


Potato Salad posted:

it's already been spun off internally to stand on its own two feet, which trivially it cannot do

CB will be written off at the sale price as a massive loss-recovery tax benefit, so it's not even like our society is going to be compensated for the murder

If they don't sell it, I assume they'll strip it to the bone then fold it into Broadcom's amazing security product everyone likes, Symantec.

Nuclearmonkee
Jun 10, 2009


tokin opposition posted:

Presented to the whole org again, and you know what I really love? Last-second schedule changes because a senior manager felt like it, which meant I presented a half hour before I was scheduled to. What's the point of an agenda if nobody follows it >.<

Look at mr fancy over here, with an "agenda"

Nuclearmonkee
Jun 10, 2009


Internet Explorer posted:

If it makes you feel better, I am too. I wish I was better at not being burnt out, but I'm pretty sure I've somehow been burnt out for a decade, and yet somehow this week feels particularly bad. It used to be that I could just focus on work and that was its own kind of burnout, but these days I just have such a long list of boring poo poo to do at work and such a long list of boring poo poo to do at home, and I just want to say gently caress it all, smoke weed, and play video games.

I only get the burnout feeling when I'm stuck in idiot politics/meetings land or am dealing with the results of other people's terrible decisions. I still enjoy solving complex technology puzzles and fitting things together so they do cool stuff. I still feel a sense of accomplishment every time we start up a new factory and it starts spitting 2x4s out the end to go make houses. That part is still fun to me 20 years in, somehow, even with all the bullshit around it.

Nuclearmonkee
Jun 10, 2009


Silly Newbie posted:

If you actually Know How Networking Works you've got a solid gold skill set.

This is the truth if you work in infrastructure. Network and security knowledge is universally applicable in the enterprise if you're at a senior level. Every SDN microsegmentation solution and cloud tenant setup requires at least CCNA-level knowledge of how poo poo works to avoid royally screwing it up. If you are actually in the guts of the stuff, more advanced knowledge of other networky things like routing (every loving thing is peering over BGP these days) is invaluable, particularly when it breaks or some piece isn't working right.

Nuclearmonkee
Jun 10, 2009


The Iron Rose posted:

Ehhhh, honestly 80% of it is literally just knowing how network routes work. The cloud providers abstract almost everything else interesting away from you. Even the BGP peering for site-to-site stuff isn't that complex, if only because all the hyperscalers have very clear, very simple setup steps, with very few knobs to turn.

Weirdly, for such a simple concept (have a CIDR and a next hop), routing somehow turns off so many people's brains.

The other 20% of it is peering VPCs or using private endpoints/service lattice/private services. Azure easily has the best UX for that; GCP's in particular is bizarrely confusing.

When you've got some on-prem private cloud and multi-cloud setups, it can get pretty hairy trying to tie the networking together. Though to be fair, the worst of that is all the self-managed stuff, since you have to get the underlay and overlay networking going yourself.

If you're only managing cloud infrastructure, then that's Microsoft's or Google's or Amazon's problem.
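
For anyone whose brain does turn off at routing: the whole concept fits in a few lines of Python. A toy longest-prefix-match lookup using just the standard library (the routes and next hops below are made up):
code:
import ipaddress

# Hypothetical routing table: (CIDR, next hop)
routes = [
    ("0.0.0.0/0", "203.0.113.1"),    # default route
    ("10.0.0.0/8", "10.255.0.1"),    # corporate supernet
    ("10.42.0.0/16", "10.42.0.1"),   # one specific site
]

def next_hop(dst):
    """Longest-prefix match: the most specific CIDR containing dst wins."""
    dst = ipaddress.ip_address(dst)
    matches = [(ipaddress.ip_network(cidr), nh) for cidr, nh in routes
               if dst in ipaddress.ip_network(cidr)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]

print(next_hop("10.42.7.9"))   # 10.42.0.1 (the /16 beats the /8 and the default)
print(next_hop("192.0.2.5"))   # 203.0.113.1 (falls through to the default route)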

Nuclearmonkee
Jun 10, 2009


Internet Explorer posted:

I got a job once because I said 9000 MTU was for chumps.

Why is it for chumps? I think I'm a chump :ohdear:

Nuclearmonkee
Jun 10, 2009


Potato Salad posted:

I made that same argument about 7 years ago where I work, after wasting a massive amount of salary time on a problem caused by someone's completely-understandable whoopsie oopsie on mtu config

Internally we just have MTU 9k set inside the datacenter, with defaults on the WAN and campus. Since it's all controlled via templating, I don't even think about it. However, if you don't have good control, I could see how that complexity would get away from you.
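
As a sketch of what "controlled via templating" means in practice, here's a hypothetical Jinja2 snippet that stamps jumbo frames onto datacenter interfaces and leaves everything else at the default (interface names, roles, and the 9000-byte value are placeholders for whatever your standard is):
code:
from jinja2 import Template

# Hypothetical per-interface template: jumbo frames for datacenter roles, defaults elsewhere
INTERFACE_TEMPLATE = Template("""\
interface {{ name }}
   description {{ descr }}
   mtu {{ 9000 if role == "datacenter" else 1500 }}
""")

print(INTERFACE_TEMPLATE.render(name="Ethernet1", descr="leaf uplink", role="datacenter"))
print(INTERFACE_TEMPLATE.render(name="Ethernet48", descr="campus access", role="campus"))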

Nuclearmonkee
Jun 10, 2009


Cimber posted:

BGP or DNS. Gotta be

If it was actually a "server crash," it's going to be something hilarious, like some kind of ancient legacy system that everything else references for critical information, where the whole thing falls over when it dies.

Nuclearmonkee
Jun 10, 2009


CitizenKain posted:

That is pretty much how we've handled the last 3 ticket platforms here. They don't want to dedicate enough people to it, or hire experienced people for it. So we get Yet Another Half-Assed system that we'll use for 8 years.

This is the part that kills these deployments.

If you are putting in a platform like ServiceNow, you need to prep the groundwork before you ever pull that trigger. If you aren't going to resource it appropriately and integrate/automate everything, don't loving buy it. Go buy some lovely small/mid-sized business ticketing system that does the basics.

I also work at a place where we wasted huge amounts of $ and didn't resource the team. I just don't hook my automation into the drat thing, and it's basically a manual virtual paperwork engine for the unfortunates who live in that world. If it ever gets resourced, we can automate just about anything, but jesus christ, that's the entire value prop for buying something like that: self-service automated everything with an audit trail.
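
For what it's worth, the "hook my automation into it" part is genuinely small once the platform is resourced. A rough sketch against the standard ServiceNow Table API, with a hypothetical instance and service account:
code:
import requests

# Placeholder instance + creds; a real deployment should use a proper service account/OAuth.
INSTANCE = "https://example.service-now.com"
AUTH = ("automation_user", "changeme")

def open_incident(short_description, description):
    """Create an incident via the Table API so the automated work is tracked and audited."""
    resp = requests.post(
        f"{INSTANCE}/api/now/table/incident",
        auth=AUTH,
        headers={"Accept": "application/json", "Content-Type": "application/json"},
        json={"short_description": short_description, "description": description},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()["result"]["number"]

print(open_incident("Switch port err-disabled", "Auto-opened by monitoring"))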

Nuclearmonkee
Jun 10, 2009


The Iron Rose posted:

Forgot to reply to this. It costs $150 a month to run a dadsv5 node in AKS with 4 vCPUs and 16GB of memory 24/7. Remember that these jobs are just throwing a few KB of YAML over an HTTPS API, so you can realistically run a few dozen of them at once on one node. It takes ~1-3 minutes for a node to spin up and become ready for new pods, another 10-15s for the job to get picked up, and depending on autoscaler settings 10-30 minutes for a node to spin down. Let's assume a 2 minute delay for each job. Finally, let's assume each developer is paid $100/hr (which roughly translates to 192k/yr). Let's assume that the on-demand node runs on average 8 hours per day, but not a straight 9-5 since you have a team working in multiple time zones. Let's also assume that 50% of jobs run on the on-demand node do not suffer the delay because the node is still running.

1. Monthly Cost of 24/7 Node: $150
2. Monthly Cost of On-Demand Node: $50
3. Developer Cost per Minute: $100/hr = $100/60 per minute = 1.6667 (3.3333 for 2 minutes)
4. Cost of Delay per Job for 50% of Jobs: 2 minutes * Developer cost per minute * 50%

Let J be the number of jobs run per month. The total monthly cost for on-demand would then be the sum of its base cost plus the cost due to delays, but only for half of the jobs run. We want to find the value of J where the total monthly cost of running the node 24/7 equals the total monthly cost of running it on-demand with the delays considered. Our equation becomes:

150 = 50 + (J/2) * 3.333

Simplifying:

100 = J/2 * 3.333
100 = 1.6667J
100/1.6667 = J
J ≈ 60

In other words, if you’re running more than 60 jobs a month, it becomes more cost effective to run the node 24/7. Obviously you can optimize further (for example, shutting down during hours where you have low usage), but that takes time and maintenance.

Let’s use a more generous figure and say that 80% of jobs don’t suffer the delay.

150 = 50 + (J/5) * 3.333
100 = 0.6666J
J ≈ 150

finally, let’s assume we have a 90% non-delay rate (which is very generous).

150 = 50 + (J/10) * 3.333
100 = 0.3333J
J ≈ 300

My company of ~100 devs ran 2250 jobs yesterday*. We'd need a 98.67% efficiency rate for on-demand nodes to be cost effective. Obviously there are multiple nodes involved, but the ratios remain the same, and that's before considering savings plans and reserved compute, which will bring down the cost of the 24/7 node further.

It is absolutely more cost effective for us to run enough nodes constantly to ensure that most jobs don't need to wait on the autoscaler, rather than spend expensive dev time optimizing the comparatively small compute costs. That's also before considering how the perception of delay impacts your company's adoption of CI/CD pipelines (which in turn bring significant reliability, auditing, and security improvements). It's rarely worth optimizing for compute here, unless you're running an LLM and compute costs are 10x salary costs. 9 times out of 10 for most SaaS firms, salaries are 10x the cost of compute and you should be optimizing for the former.

Obviously not every one of those jobs will fit on the same node. We've got a job for training an AI model that consumes >64 GB of memory, runs ~5 times a day, and takes an hour. You betcha we spin up a dedicated node on demand for that job! But in general, it's rare that spending the time to fine-tune the compute is worth the cost in your time and in dev time.


* edit: did the math for this. Let E be the fraction of jobs that don't hit the delay.

150 = 50 + (2250 * (1 - E)) * 3.333
100 = 2250 * (1 - E) * 3.333
1 - E = 100/(2250 * 3.333)
1 - E ≈ 0.0133
E ≈ 0.9867

In other words, it takes above a 98.67% non-delay rate for on-demand to be worthwhile for us.

In general, if more than 30 jobs per month are delayed by 2 minutes, on-demand is no longer worth it. This holds no matter how many nodes you run, so long as the proportions are the same.

This is a really good breakdown, and I'm stealing and adapting it for similar mindless stupidity I see over here.
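
Here's that break-even math as a quick Python sketch, so it's easy to re-run with our own numbers (the defaults are just the figures from the quoted post, all of which are assumptions):
code:
# Break-even point for a 24/7 CI node vs. an on-demand/autoscaled one.
ALWAYS_ON_MONTHLY = 150.0        # 24/7 node, $/month
ON_DEMAND_MONTHLY = 50.0         # node only up ~8h/day, $/month
DELAY_MINUTES = 2.0              # autoscaler spin-up wait per delayed job
DEV_RATE_PER_MIN = 100.0 / 60.0  # $100/hr developer

def break_even_jobs(delayed_fraction):
    """Jobs per month at which the 24/7 node becomes the cheaper option overall."""
    delay_cost_per_job = DELAY_MINUTES * DEV_RATE_PER_MIN * delayed_fraction
    return (ALWAYS_ON_MONTHLY - ON_DEMAND_MONTHLY) / delay_cost_per_job

for frac in (0.5, 0.2, 0.1):
    print(f"{int((1 - frac) * 100)}% of jobs undelayed -> break-even at "
          f"~{break_even_jobs(frac):.0f} jobs/month")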

Nuclearmonkee
Jun 10, 2009


The Iron Rose posted:

be sure you and your devs aren't spending more time on optimizing/waiting than you're saving in compute!

This is the key imo. Folks can get tunnel vision and will waste a shitton of time and energy to save on costs for something which is not even a rounding error in the big picture. If you want to save on costs, absolutely go save on costs. The average large org will have wastage measured in millions due to duplication and madness. Yes, it's harder to fix those, and it's probably not your devs who need to do it, so stop bothering them about trimming a thousand bucks off the next AWS bill while you drop a cool 3 million on another enterprise reporting tool that does the same thing as your other 20 million worth of enterprise reporting tools.

If you're running some giant GPU-backed instance somewhere doing nothing all day, yeah, fix that, sure. But otherwise, don't you have better things to worry about?

No I'm not bitter. Ok maybe a little.

Nuclearmonkee fucked around with this message at 00:40 on Mar 28, 2024

Nuclearmonkee
Jun 10, 2009


FISHMANPET posted:

Honestly, it was so surreal to be told by my manager that I was being let go because there isn't a need for my skills, and then everybody that hears about it and reaches out to me is like "this is so confusing, you're so valuable and helpful and crucial to so much of what we're working on."

Like, speaking of politics: IT leadership initiated a disastrous reorganization that did not take into account the structure or needs of the overall organization. But even on the imagined needs, they couldn't figure out how to deliver. On their own terms they couldn't deliver success, and any attempt to improve on how we delivered their vision was met with resistance.

Honestly, getting fired is one of the greatest things that's ever happened to me. I found a new job where I got a huge pay raise and my contributions are highly valued. It's such a complete 180 for me, I get whiplash sometimes from all the praise I get.

It's incredibly rear end backwards. If you're changing a role definition and taking highly skilled people and sticking them in there, and you identify a gap in knowledge, maybe something that didn't even matter until now (like SDLC trivia), uhh, maybe just make everybody go do a certification instead of being an rear end in a top hat?

Unless it’s just an excuse to reduce headcount in which case assholes will be assholes.

Nuclearmonkee
Jun 10, 2009


Cimber posted:

I have to wonder, all these companies that brag that they release to prod 200+ times a day, what the gently caress are they releasing so much, and how the hell does CI/CD ensure that the code changes are not causing minor fuckups that might not be noticeable for a few hours/days/weeks until suddenly a major outage happens and they have to roll back to code a month old?

If your automated tests are good enough, then most of those don’t make it to prod. There’s always a hole somewhere though and there’s some risk in making changes no matter how you do it.

The more intricate tests are usually put there because of some event or another.

And 200 changes a day is just some manager looking at how many commits they average per day in their repo, not the content of those commits. Minor code fixes, "added comment," and other crap will account for most of those. No one makes that many major application changes in one day; it's just little things with some microservice or another.

Nuclearmonkee fucked around with this message at 14:20 on Apr 1, 2024

Nuclearmonkee
Jun 10, 2009



This will somehow involve a net increase to their pricing structure, and they will just blame the EU for it.

Nuclearmonkee
Jun 10, 2009


Thanks Ants posted:

The sight of this horrific dock should induce PTSD



Perfectly designed at perfect angles. Users will never need to adjust it, which is good because they can't. :smug:

Nuclearmonkee
Jun 10, 2009


Thanks Ants posted:

It's cool when people enable things like BPDU guard but don't set a timer for turning the port back on, so you end up with a 48-port switch where people assume 30 of the ports are dead.
code:
! trip err-disable for every detectable cause (bpduguard, link-flap, etc.)
errdisable detect cause all
! auto-recover every cause instead of leaving the port down until someone shut/no shuts it
errdisable recovery cause all
! retry the port after 30 seconds
errdisable recovery interval 30
In every switch, always. I like letting people fix their own poo poo when they do something foolish without having to escalate it to get a guy to shut/no shut the port.

Nuclearmonkee
Jun 10, 2009


GreenNight posted:

A lot of times cheaper than having it on a support contract.

We buy 2960X by the pallet. They're about 300 dollars each lol. MTBF is still higher on 2960X refurbs than it is on Cat 9200s or 9300s.

Works perfectly fine for random campus access junk that's not super critical, like if I need 20 cameras hooked up or some random field devices that don't stop the primary process. For the critical stuff, everything new gets Arista for MSS and VXLAN/BGP EVPN. Yes, even at the campus level (manufacturing control systems). Layer 2 is the enemy and VXLAN is the light.

Nuclearmonkee fucked around with this message at 19:55 on Apr 15, 2024

Nuclearmonkee
Jun 10, 2009


guppy posted:

Any CLI is confusing if you aren't used to it, but Cisco's is decent and not all that confusing. It gets annoying when they have separate platforms (IOS, IOS-XE, IOS-XR, NX-OS...), which tend to be similar in syntax but different in important ways, and I despise their documentation. But generally it is perfectly fine, and nearly everyone who does networking knows their way around it because it's such a standard.

Cisco's lack of integration of their acquisitions is a real problem. "Tiny fiefdoms" is exactly right; working with some of that stuff is just nightmarish. Did you know there's at least one company whose primary product is a thing to make administering Call Manager less of a pain in the rear end?

If you haven't looked at it before, Arista EOS is the same on every piece of hardware, super similar to Cisco CLI, and very happily/easily integrates with your IaC management platform of choice.

If I connect into a datacenter switch with hundreds of logical ports, the syntax and commands are identical to the 12-port PoE guy we threw into a dirty cabinet. All that's different is the hardware capacity and feature capability, which is limited on lower-tier hardware simply due to the lack of CPU/memory or whatever.

It's just better.
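
As a sketch of the IaC-friendliness: EOS exposes everything over eAPI (JSON-RPC over HTTPS), so even plain requests is enough to pull structured state off any box, from the big datacenter switch to the 12-port guy in the dirty cabinet. The hostname and credentials below are placeholders, and eAPI has to be enabled on the switch first ("management api http-commands"):
code:
import requests

# Placeholder switch + creds; verify=False is only for lab boxes with self-signed certs.
EAPI_URL = "https://switch01.example.com/command-api"
AUTH = ("admin", "changeme")

payload = {
    "jsonrpc": "2.0",
    "method": "runCmds",
    "params": {"version": 1, "cmds": ["show version"], "format": "json"},
    "id": "1",
}

resp = requests.post(EAPI_URL, json=payload, auth=AUTH, verify=False, timeout=10)
print(resp.json()["result"][0]["modelName"])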

Nuclearmonkee
Jun 10, 2009


mllaneza posted:

I did that by screwing up an iptables rules update. While remote. On a Saturday.

Sometimes in interviews I talk about my experience in terms of mistakes. "Unix? Let's see, I've done an rm -rf * at the root level of a machine, etc., etc." That should show that you have hands-on experience in the real world, and you're not just parroting exam material.

A lot of times people are hesitant to answer "what was your worst mistake that impacted production?", so I have to tell them about the time I took down a courthouse in the middle of the day to get a real answer.

I like it as a question because good answers get down to the real question, which is "and what did you learn from that awful experience?", and you can talk about stupid technology that you have to build controls around to protect it from falling over too easily.

If they have no answer or a fake one, then I know they're either full of poo poo or they don't actually do anything. Everybody has at least one mistake, though maybe the impact wasn't severe if you've only worked at places with excellent control and deployment practices.

Nuclearmonkee
Jun 10, 2009


CommieGIR posted:

I dropped a production database during an outage during my first ever mainframe job; thankfully my boss was a kind soul and I got to learn how to restore from backups.

Thought my career was over right then and there

Only crappy tyrants would fire someone for making an honest mistake that can be chalked up to:

1) lack of training
2) lack of documentation
3) lack of process control
4) piece of poo poo computer have bug

If you go outside of a process and cause an outage out of negligence when you knew better, that's when the serious discussions are had.

Nuclearmonkee
Jun 10, 2009


Potato Salad posted:

If the acquisition is so marginal that they can't even afford recabling contractors, why the gently caress are they doing the acquisition in the first place?

This is my first thought.

If someone gave me that schedule during an acquisition, I would tell them, "No, it's impossible with these resources. We either bring in contract labor or other additional resources to get some of this done, or it will be a failure. If you don't believe me, fire me and hire someone else."

And that's with us zero-touch deploying every switch we put on the network these days.

If any kind of due diligence was undertaken, they should've known this was required before anyone ever signed on the bottom line, or it should have been a budgeted contingency line item if it was acquired sight unseen without discovery. Anything else is mismanagement from whoever is doing the acquisition.

Nuclearmonkee
Jun 10, 2009


BaseballPCHiker posted:

It feels like once you've been in IT long enough you know enough to figure anything out given enough time.

Except for regex. gently caress that poo poo.

Yep. RegExr is your friend.
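
And if you do have to write one by hand, re.VERBOSE and named groups at least keep it readable. A small sketch against a made-up err-disable log line:
code:
import re

# Hypothetical syslog line; VERBOSE mode plus named groups keep the pattern legible
LOG = "2024-04-15 19:55:02 sw-access-12 Gi1/0/7 err-disabled (bpduguard)"

PATTERN = re.compile(r"""
    (?P<date>\d{4}-\d{2}-\d{2})\s+
    (?P<time>\d{2}:\d{2}:\d{2})\s+
    (?P<host>\S+)\s+
    (?P<port>\S+)\s+
    err-disabled\s+\((?P<cause>[^)]+)\)
""", re.VERBOSE)

m = PATTERN.search(LOG)
print(m.group("host"), m.group("port"), m.group("cause"))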

Nuclearmonkee
Jun 10, 2009


Cenodoxus posted:

In every large company on earth, there's an invisible magic line on the org chart that acts as an impenetrable barrier to any form of consequences. Most decision-making authority resides well above it.

I hate how true this is.

Nuclearmonkee
Jun 10, 2009


Wibla posted:

We do. Invaluable tool for troubleshooting and network/subnet planning.

Same, and we also have it integrated into the automation system for onboarding devices.

User-managed passwords are fake security. I just consider whatever it is to be insecure if that's the only factor used for authentication. If it's for some system where I care about the password, then it's gonna be a gigantic unique string of trash that gets pasted in from a vault, which I also do for my own personal stuff. The only password I know is the one to get into the password vault, which is of course two-factor.
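
For reference, the "gigantic unique string of trash" is basically a one-liner with Python's secrets module (the length and alphabet are just what I'd pick; adjust for whatever the target system tolerates):
code:
import secrets
import string

# ~200 bits of entropy at length 32 over this alphabet; paste it into the vault and forget it
alphabet = string.ascii_letters + string.digits + string.punctuation
password = "".join(secrets.choice(alphabet) for _ in range(32))
print(password)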

Nuclearmonkee
Jun 10, 2009


Thanks Ants posted:

I feel like advancing through your career is mainly shifting what "diving in" looks like - in the early days it might be trying to martyr yourself but after a decade or so it will be "here are the three companies we need to do a good job of this, who is approving this budget?".

Or at least always keeping an eye towards "ok, now that we understand how this works, can we configure this pile of poo poo to work and be maintainable in a way that we can document it and hand it to ops/users?" And if we don't have the knowledge/experience, do we need to hire outside help? The key is to not build half-baked poo poo and throw it in prod without thinking.

Being a martyr sucks real bad unless you like 3am phone calls.

Nuclearmonkee
Jun 10, 2009


Internet Explorer posted:

Agreed for sure. I feel like a major difference between a more junior engineer and a more senior engineer is learning when to say "no, this is a bad idea, let's talk about alternative approaches" instead of just rolling up your sleeves.

I was out on PTO for a few days and am having one of those post-time-off meltdowns. I hate this poo poo, throw it all in the trash.

If you still like technology, it means you haven't worked in IT long enough.


Nuclearmonkee
Jun 10, 2009


vanity slug posted:

i love suffering

This is prime IT worker material right here.
