GreenNight
Feb 19, 2006
Turning the light on the darkest places, you and I know we got to face this now. We got to face this now.

I had to explain to someone that yes the new Teams has problems and no I can’t fix them. Go ahead and open a ticket with Microsoft because I’m not.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:
i just spent four and a half hours in a teams call to deploy a single third party helm chart with 8 lines of values to an aks gov cluster (i.e. we can't execute helm apply from a gitlab job).

this is just the most ridiculously complicated and overengineered CI/CD system that I've ever had the misfortune to lay my eyes upon. it's not even the gov part that's loving me here! this might be the worst way to build a pipeline I've ever seen. We've got kustomize, and flux, and crossplane, and helm jinja templating all running at the same time. every commit requires me to look at pipelines in three repositories. the CI system requires a new k8s node to be spun up for every job because of "cost control" (the 5 minutes this takes is literally more expensive than running a node 24/7).

I even had the engineer who built it all on the call with me to guide me through it :negative:


I'm about at the point where i just want to say gently caress it and have every deployment push a new copy of the repo to an azure storage bucket and trigger an event-driven azure function to run terraform/helm diff/apply.

Dirt Road Junglist
Oct 8, 2010

We will be cruel
And through our cruelty
They will know who we are
When u need an automation 2 run ur automation...that sounds hellish.

Antigravitas
Dec 8, 2019

The salvation for the farmers:
Stop doing CI/CD

The Iron Rose posted:

We've got kustomize, and flux, and crossplane, and helm jinja templating all running at the same time. every commit requires me to look at pipelines in three repositories. the CI system requires a new k8s node to be spun up for every job because of "cost control" (the 5 minutes this takes is literally more expensive than running a node 24/7).

LOOK at what programmers have been demanding your Respect for all this time!

The Iron Rose posted:

I'm about at the point where i just want to say gently caress it and have every deployment push a new copy of the repo to an azure storage bucket and trigger an event-driven azure function to run terraform/helm diff/apply.

Statements dreamt up by the utterly deranged.

They have played your for an absolute fool.

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug
The fun part about CI/CD is the fact that all these devs argue that containers and all this stuff makes it easier to dev, and I'm wondering what they are smoking and how I get some?

Hell, microservices are one big nightmare where a monolith app looks simple in comparison.

Cimber
Feb 3, 2014
CI/CD is a great buzzword for managers, but to implement it properly where it gives you actual benefits requires a lot of buy in from a lot of teams, and if one team isn't enthusiastic about it the whole house of cards falls down.

Now, why does k8 spooling up for CI/CD cost so much money? The entire point of K8 is to be able to create and destroy infrastructure on demand to SAVE money.

Ihmemies
Oct 6, 2012

CLAM DOWN posted:

wtf, Teams screenshare has always worked flawlessly for me, what have you/your org done

Teams screen sharing crashes silently if you have an HDR monitor and want to use its capabilities. I have to manually disable HDR every time I share my screen.

johnny park
Sep 15, 2009

Antigravitas posted:

Stop doing CI/CD

LOOK at what programmers have been demanding your Respect for all this time!

Statements dreamt up by the utterly deranged.

They have played your for an absolute fool.

:emptyquote:

The Fool
Oct 16, 2003


Cimber posted:

CI/CD is a great buzzword for managers, but to implement it properly where it gives you actual benefits requires a lot of buy in from a lot of teams, and if one team isn't enthusiastic about it the whole house of cards falls down.
I don't like this take but I lack the wherewithal to refute it

quote:

Now, why does k8 spooling up for CI/CD cost so much money? The entire point of K8 is to be able to create and destroy infrastructure on demand to SAVE money.

the vast majority of teams using k8s today are misusing it and their needs would be far better served by a different service or technology

The Fool
Oct 16, 2003


Antigravitas posted:

Stop doing CI/CD

LOOK at what programmers have been demanding your Respect for all this time!

Statements dreamt up by the utterly deranged.

They have played your for an absolute fool.

only issue I have with this is that most application developers hate CI/CD too

The Fool
Oct 16, 2003


otherwise
:perfect:

CommieGIR
Aug 22, 2006

The blue glow is a feature, not a bug


Pillbug

The Fool posted:

the vast majority of teams using k8s today are misusing it and their needs would be far better served by a different service or technology

Pretty much.

K8s: "Check out this cool tech stack, it's for specific use cases!"

Devs: "Let's use it for EVERYTHING."

Cimber
Feb 3, 2014

CommieGIR posted:

Pretty much.

K8s: "Check out this cool tech stack, it's for specific use cases!"

Devs: "Let's use it for EVERYTHING."

I need webservers quickly because it's Amazon Prime day! K8 to the rescue!
I need more auth servers quickly because it's Amazon Prime day? K8 to the rescue!
I need more database space quickly because it's Amazon Prime day? DON'T EVEN loving THINK ABOUT USING K8.

Wizard of the Deep
Sep 25, 2005

Another productive workday
CI/CD Pipelines are the worst software deployment method except for all those other forms that have been tried from time to time.

Zorak of Michigan
Jun 10, 2006

The Iron Rose posted:

i just spent four and a half hours in a teams call to deploy a single third party helm chart with 8 lines of values to an aks gov cluster (i.e. we can't execute helm apply from a gitlab job).

this is just the most ridiculously complicated and overengineered CI/CD system that I've ever had the misfortune to lay my eyes upon. it's not even the gov part that's loving me here! this might be the worst way to build a pipeline I've ever seen. We've got kustomize, and flux, and crossplane, and helm jinja templating all running at the same time. every commit requires me to look at pipelines in three repositories. the CI system requires a new k8s node to be spun up for every job because of "cost control" (the 5 minutes this takes is literally more expensive than running a node 24/7).

I even had the engineer who built it all on the call with me to guide me through it :negative:


I'm about at the point where i just want to say gently caress it and have every deployment push a new copy of the repo to an azure storage bucket and trigger an event-driven azure function to run terraform/helm diff/apply.

There's a Scrubs episode where JD meets the most calm, collected, competent doctor he's ever worked with. Every time a patient has a new symptom or complication, he thinks for a second, then says, "No problem, we'll just xxxx." Everyone thinks he's amazing. Then finally he gets a patient who just keeps getting worse and worse and none of the solutions the doctor pitches stops his condition's progression. When the patient finally dies, the doctor has a breakdown and quits medicine. What you described sounds like the same guy took up working on CI/CD pipelines. "Helm isn't doing what we want? No problem, we'll use jinja templating."

bull3964
Nov 18, 2000

DO YOU HEAR THAT? THAT'S THE SOUND OF ME PATTING MYSELF ON THE BACK.


Yeah, when we retired our on prem docker swarm cluster and moved the workloads to Azure, so many devs were requesting AKS. "You have a front end web container and an API container that you've run at a static number of instances for the past 3 years. You're going to an App Service. You don't need AKS."

Sepist
Dec 26, 2005

FUCK BITCHES, ROUTE PACKETS

Gravy Boat 2k
I like kubernetes

IUG
Jul 14, 2007


CommieGIR posted:

Pretty much.

K8s: "Check out this cool tech stack, it's for specific use cases!"

Devs: "Let's use it for EVERYTHING."

Don’t dox my boss.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Zorak of Michigan posted:

There's a Scrubs episode where JD meets the most calm, collected, competent doctor he's ever worked with. Every time a patient has a new symptom or complication, he thinks for a second, then says, "No problem, we'll just xxxx." Everyone thinks he's amazing. Then finally he gets a patient who just keeps getting worse and worse and none of the solutions the doctor pitches stops his condition's progression. When the patient finally dies, the doctor has a breakdown and quits medicine. What you described sounds like the same guy took up working on CI/CD pipelines. "Helm isn't doing what we want? No problem, we'll use jinja templating."
This was like the second episode too, if anyone is wondering about the universality of this experience

Cimber posted:

Now, why does k8 spooling up for CI/CD cost so much money? The entire point of K8 is to be able to create and destroy infrastructure on demand to SAVE money.
Fargate allergy

The entire point of K8s is that it simplifies internal platforms, by making it so all your platform has to do is build up a declarative config instead of orchestrating changes across a pile of disparate systems. You're not injecting dozens of people's code or requirements into the deployment system anymore because everything is handled by controllers that are deployed as actual services, and you can delegate ownership of each controller to whoever is supposed to own that slice of functionality. Every control loop is something you can monitor and operationalize, and if a deployment is interrupted partway through because of an intermediate service malfunction, fixing the service will result in the deployment getting completed. Failovers and partial retries are automatically built into every single step of your release process.

Then, without fail, every single company that's new to Kubernetes tries to build an internal platform that rips all this poo poo out and tries to reinvent loving Ansible. We don't want to learn about these async control loops; let's build in a ton of waiters for everything. It's too complicated having all these little controllers around; let's build a big deployment pipeline on Jenkins that assembles everything out of fragments of our bespoke DSL. Voila! I've built loving garbage! My thing sucks! Kubernetes is too hard!
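The control-loop idea above can be sketched in a few lines. This is a toy model, not Kubernetes code: `Cluster`, `add_replica`, and `reconcile` are hypothetical names made up for illustration, and only scaling up is modeled. The point it tries to show is that a loop which keeps comparing desired state to actual state will finish an interrupted rollout by itself once the broken dependency recovers.

```python
from dataclasses import dataclass


@dataclass
class Cluster:
    """Toy stand-in for real infrastructure; not a Kubernetes API."""
    replicas: int = 0
    healthy: bool = True

    def add_replica(self) -> None:
        # Simulates a step that depends on some intermediate service.
        if not self.healthy:
            raise RuntimeError("backing service unavailable")
        self.replicas += 1


def reconcile(desired_replicas: int, cluster: Cluster) -> bool:
    """One pass of the loop: move actual state one step toward desired.

    Returns True once the cluster has at least the desired replicas.
    A failed step is swallowed on purpose: the next pass retries from
    whatever state was actually reached.
    """
    if cluster.replicas >= desired_replicas:
        return True
    try:
        cluster.add_replica()
    except RuntimeError:
        pass  # no waiter, no rollback; just try again on the next pass
    return cluster.replicas >= desired_replicas


cluster = Cluster()
reconcile(3, cluster)            # first replica comes up
cluster.healthy = False          # intermediate service breaks mid-rollout
for _ in range(5):
    reconcile(3, cluster)        # passes keep running, nothing progresses
stalled_at = cluster.replicas
cluster.healthy = True           # someone fixes the service
while not reconcile(3, cluster):
    pass                         # rollout completes with no manual restart
print(stalled_at, cluster.replicas)  # 1 3
```

Nothing in the caller tracks which step failed; the declared target (3 replicas) is the only input, which is roughly what "build up a declarative config" buys you.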

Vulture Culture fucked around with this message at 16:30 on Mar 27, 2024

Internet Explorer
Jun 1, 2005





Vulture Culture posted:

Voila! I've built loving garbage!

look I'm really trying, okay

Cimber
Feb 3, 2014
Configuration as code and infrastructure as code are good things.

The Fool
Oct 16, 2003


Vulture Culture posted:

let's build a big deployment pipeline on Jenkins that assembles everything out of fragments of our bespoke DSL. Voila! I've built loving garbage! My thing sucks!

replace jenkins with ado pipelines and you've described our legacy environments to a t

Diqnol
May 10, 2010

Vulture Culture posted:

The entire point of K8s is that it simplifies internal platforms, by making it so all your platform has to do is build up a declarative config instead of orchestrating changes across a pile of disparate systems. You're not injecting dozens of people's code or requirements into the deployment system anymore because everything is handled by controllers that are deployed as actual services, and you can delegate ownership of each controller to whoever is supposed to own that slice of functionality. Every control loop is something you can monitor and operationalize, and if a deployment is interrupted partway through because of an intermediate service malfunction, fixing the service will result in the deployment getting completed. Failovers and partial retries are automatically built into every single step of your release process.

Then, without fail, every single company that's new to Kubernetes tries to build an internal platform that rips all this poo poo out and tries to reinvent loving Ansible. We don't want to learn about these async control loops; let's build in a ton of waiters for everything. It's too complicated having all these little controllers around; let's build a big deployment pipeline on Jenkins that assembles everything out of fragments of our bespoke DSL. Voila! I've built loving garbage! My thing sucks! Kubernetes is too hard!

lmao this post owns

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Vulture Culture posted:

This was like the second episode too, if anyone is wondering about the universality of this experience

Fargate allergy

The entire point of K8s is that it simplifies internal platforms, by making it so all your platform has to do is build up a declarative config instead of orchestrating changes across a pile of disparate systems. You're not injecting dozens of people's code or requirements into the deployment system anymore because everything is handled by controllers that are deployed as actual services, and you can delegate ownership of each controller to whoever is supposed to own that slice of functionality. Every control loop is something you can monitor and operationalize, and if a deployment is interrupted partway through because of an intermediate service malfunction, fixing the service will result in the deployment getting completed. Failovers and partial retries are automatically built into every single step of your release process.

Then, without fail, every single company that's new to Kubernetes tries to build an internal platform that rips all this poo poo out and tries to reinvent loving Ansible. We don't want to learn about these async control loops; let's build in a ton of waiters for everything. It's too complicated having all these little controllers around; let's build a big deployment pipeline on Jenkins that assembles everything out of fragments of our bespoke DSL. Voila! I've built loving garbage! My thing sucks! Kubernetes is too hard!

this is very well said, and perfectly describes the CI system that I worked with yesterday! The CI pipeline we use for our other services (and the ~40 clusters they deploy to in various combinations) is less than a dozen lines of code, handles AKS/EKS/GKE, and shockingly it’s way better!

Also I like k8s a lot but it makes little sense if you’re not building a dozen or more services orchestrated among multiple teams, where you really benefit from a shared lingua franca and the accompanying tooling for build/release/o11y. It really pays dividends in multicloud environments too, which describes most companies nowadays. Startups under 100 people should almost always use one of the various PaaS offerings (fargate, app engine, etc) which are much simpler to manage and usually more cost effective in raw compute terms.

The Iron Rose fucked around with this message at 18:10 on Mar 27, 2024

Kibner
Oct 21, 2008

Acguy Supremacy

The Fool posted:

Yeah, I've started making a point of calling out (in public) when other engineers do something good. Even for senior and above. So often people only hear about the negative things that happen, need to balance that out a bit.

My small company (~30 people) has a 30 minute company meeting every other week and some time at the end is reserved for a Kudos Corner where people are encouraged to speak up about the good things their co-workers are doing. It does feel good.

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Cimber posted:


Now, why does k8 spooling up for CI/CD cost so much money? The entire point of K8 is to be able to create and destroy infrastructure on demand to SAVE money.

Forgot to reply to this. It costs $150 a month to run a dadsv5 node in AKS with 4 vCPUs and 16GB of memory 24/7. Remember that these are nodes that are just throwing a few kb of yaml over an HTTPS API, so you can realistically run a few dozen jobs at once on this. It takes ~1-3 minutes for a node to spin up and become ready for new pods, another 10-15s for the job to get picked up, and depending on autoscaler settings 10-30 minutes for a node to spin down. Let’s assume a 2 minute delay for each job. Finally, let’s assume each developer is paid $100/hr (which roughly translates to 192k/yr). Let’s assume that the on demand node runs on average 8 hours per day, but not a straight 9-5 since you have a team working in multiple time zones. Let’s also assume that 50% of jobs run on the on-demand node do not suffer the delay because the node is still running.

1. Monthly Cost of 24/7 Node: $150
2. Monthly Cost of On-Demand Node: $50
3. Developer Cost per Minute: $100/hr = $100/60 per minute = 1.6667 (3.3333 for 2 minutes)
4. Cost of Delay per Job for 50% of Jobs: 2 minutes * Developer cost per minute * 50%
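For anyone who wants to plug in their own numbers, here is a minimal sketch of where those inputs come from. Every figure is the post's stated assumption; the function names are made up for illustration, and the $50 on-demand figure is assumed here to be the 24/7 price scaled by 8 of 24 hours.

```python
# All figures are assumptions taken from the post, not measured data.
ALWAYS_ON_MONTHLY = 150.0     # USD per month for a node running 24/7
ON_DEMAND_HOURS_PER_DAY = 8   # average hours per day the on-demand node is up
DEV_RATE_PER_HOUR = 100.0     # USD per developer hour
DELAY_MINUTES = 2             # assumed wait when a node has to spin up


def on_demand_monthly(always_on: float, hours_per_day: float) -> float:
    """Scale the 24/7 price by the fraction of the day the node runs."""
    return always_on * hours_per_day / 24


def delay_cost_per_job(rate_per_hour: float, delay_minutes: float) -> float:
    """Developer time burned while one job waits for a node."""
    return rate_per_hour / 60 * delay_minutes


print(on_demand_monthly(ALWAYS_ON_MONTHLY, ON_DEMAND_HOURS_PER_DAY))   # 50.0
print(round(delay_cost_per_job(DEV_RATE_PER_HOUR, DELAY_MINUTES), 4))  # 3.3333
```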

Let J be the number of jobs run per month. The total monthly cost for on-demand would then be the sum of its base cost plus the cost due to delays, but only for half of the jobs run. We want to find the value of J where the total monthly cost of running the node 24/7 equals the total monthly cost of running it on-demand with the delays considered. Our equation becomes:

150 = 50 + (J/2) * 3.333

Simplifying:

100 = J/2 * 3.333
100 = 1.6667J
100/1.6667 = J
J ≈ 60

In other words, if you’re running more than 60 jobs a month, it becomes more cost effective to run the node 24/7. Obviously you can optimize further (for example, shutting down during hours where you have low usage), but that takes time and maintenance.

Let’s use a more generous figure and say that 80% of jobs don’t suffer the delay.

150 = 50 + (J/5) * 3.333
100 = 0.6666J
J ≈ 150

finally, let’s assume we have a 90% non-delay rate (which is very generous).

150 = 50 + (J/10) * 3.333
100 = 0.3333J
J ≈ 300
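The three scenarios above are the same equation with a different delayed fraction, so they can be checked with one small function. This is a sketch under the post's assumptions ($150 always-on, $50 on-demand, 2 minutes at $100/hr per delayed job); `break_even_jobs` is a hypothetical helper name.

```python
def break_even_jobs(non_delayed_rate: float,
                    always_on: float = 150.0,
                    on_demand: float = 50.0,
                    delay_cost: float = 100.0 / 60 * 2) -> float:
    """Jobs per month at which on-demand plus delay cost equals always-on.

    Solves: always_on = on_demand + J * (1 - non_delayed_rate) * delay_cost
    """
    delayed_fraction = 1 - non_delayed_rate
    return (always_on - on_demand) / (delayed_fraction * delay_cost)


for rate in (0.5, 0.8, 0.9):
    # Rounded because floating point gives e.g. 150.00000000000003.
    print(f"{rate:.0%} non-delayed -> J = {round(break_even_jobs(rate))}")
```

Rounded, this gives 60, 150, and 300 jobs per month, matching the figures worked out by hand.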

my company of ~100 devs ran 2250 jobs yesterday*. We’d need a 98.67% efficiency rate for on demand nodes to be cost effective. Obviously there’s multiple nodes involved but the ratios remain the same, and that’s before considering savings plans and reserved compute which will bring down the cost of the 24/7 node further.

It is absolutely more cost effective for us to run enough nodes constantly to ensure that most jobs don’t need to wait on the autoscaler, rather than spend expensive dev time optimizing the comparatively small compute costs. that’s also before considering how the perceptions of delay impact your company’s adoption of CI/CD pipelines (which in turn bring significant reliability, auditing, and security improvements). It’s rarely worth optimizing for compute here, unless you’re running an LLM and compute costs are 10x salary costs. 9 times out of 10 for most SaaS firms, salaries are 10x the cost of compute and you should be optimizing for the former.

Obviously not every one of those jobs will fit on the same node - we’ve got a job for training an AI model that consumes >64 GB of memory, runs ~5 times a day, and takes an hour. You betcha we spin up a dedicated node on demand for that job! But in general, it’s rare that spending the time to fine tune the compute is worth the cost in your time and in dev time.


* edit: did the math for this. Let E be the efficiency rate.

150 = 50 + (2250 * (1 - E)) * 3.334
100 = 2250 * (1 - E) * 3.334
E = 1 - 100/(2250*3.334)
E = 1 - 0.013330667199893
E = 0.9867

In other words, it takes above 98.67% efficiency for it to be worthwhile for us.

In general, if > 30 jobs per month are delayed by 2 minutes, it becomes no longer worth it. This holds no matter how many nodes you run so long as the proportions are the same.
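The same relationship can be turned around to ask what non-delay rate a given job count needs. A minimal sketch, again using only the post's assumptions; the fraction of jobs that hit the delay is 1 - E, and `required_non_delay_rate` is a hypothetical name. Note that it feeds the 2250 figure into the monthly equation exactly as the post does, even though that count was for a single day, so the real bar would be higher still.

```python
def required_non_delay_rate(jobs: float,
                            always_on: float = 150.0,
                            on_demand: float = 50.0,
                            delay_cost: float = 100.0 / 60 * 2) -> float:
    """Minimum share of jobs that must skip the delay for on-demand to win.

    The monthly saving from on-demand (always_on - on_demand) pays for a
    fixed number of delayed jobs; everything beyond that must not be delayed.
    """
    max_delayed_jobs = (always_on - on_demand) / delay_cost  # about 30
    return 1 - max_delayed_jobs / jobs


print(round((150.0 - 50.0) / (100.0 / 60 * 2)))   # 30
print(round(required_non_delay_rate(2250), 4))    # 0.9867
```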

The Iron Rose fucked around with this message at 19:36 on Mar 27, 2024

Handsome Ralph
Sep 3, 2004

Oh boy, posting!
That's where I'm a Viking!


So I'm new to the IT field after spending 12 years in academia/editorial type work. It's been a huge breath of fresh air, and I love it for the most part. Definitely some downsides but not nearly as many as there were in my old field. I've been in my current/first IT role for about 11 months now. But I'm in a weird position, and not really sure what to do next. Just venting/seeking guidance at this point.

I'm currently a desktop support tech. I do a lot of everything (imaging, break/fix, AD stuff, software support, account creation, basic networking, etc) which is nice, but it's been extremely slow as of late (like 2-4 tickets a day slow). I got my CCNA late last year because I want to pivot into networking. My boss is extremely supportive of it. After I got my cert, he got me into our former corporate structures netops meetings so I could at least observe and potentially start helping with basic updates and what not. That lasted all of about 2.5 months.

Problem is, we underwent a corporate acquisition that was completed at the end of last year. My division and its IT department went untouched (being stupidly profitable will do that for you, I guess), but most of my original corporate overlords' IT department was purged at the beginning of this month, including most of our networking and sysadmin teams. So no more netops meetings, and we only have one network engineer now who isn't local, and is pretty busy just keeping everything running all by himself. My manager has requested at least twice now that I be given access to some monitoring systems as well as the logins for some of our on prem appliances so I can, if need be, hop in and do basic updates, etc. But nothing has happened in the 3 weeks since we've repeatedly asked. We've chalked it up to the remaining sysadmins that have domain over this stuff basically being under the gun and being too busy dealing with other stuff to get around to it, but it's still pretty annoying.

Anyways, I'm not sure really what to do next. I've been keeping myself busy at work by learning Python but part of me feels bad because I already feel like the knowledge I got from studying for the CCNA has started to atrophy since I'm not using it on a regular basis. I'm debating if I should start doing some Azure certs when I finish up this Python course. Though I worry it'll just sit there unused like my CCNA, though my manager has said multiple times he wants to start giving me responsibilities handling some of our Azure assets. There's part of me that feels like I should just shut the gently caress up and count my blessings considering I make decent money for entry level IT, my work life balance is pretty solid, I have ample downtime to study/upskill, and the current state of the IT job market isn't so hot. The other part of me thinks I should wait for my one year anniversary to hit soon, and then start applying around if nothing really changes in the meantime.

chocolateTHUNDER
Jul 19, 2008

GIVE ME ALL YOUR FREE AGENTS

ALL OF THEM

Handsome Ralph posted:

So I'm new to the IT field after spending 12 years in academia/editorial type work. It's been a huge breath of fresh air, and I love it for the most part. Definitely some downsides but not nearly as many as there were in my old field. I've been in my current/first IT role for about 11 months now. But I'm in a weird position, and not really sure what to do next. Just venting/seeking guidance at this point.

I'm currently a desktop support tech. I do a lot of everything (imaging, break/fix, AD stuff, software support, account creation, basic networking, etc) which is nice, but it's been extremely slow as of late (like 2-4 tickets a day slow). I got my CCNA late last year because I want to pivot into networking. My boss is extremely supportive of it. After I got my cert, he got me into our former corporate structures netops meetings so I could at least observe and potentially start helping with basic updates and what not. That lasted all of about 2.5 months.

Problem is, we underwent a corporate acquisition that was completed at the end of last year. My division and its IT department went untouched (being stupidly profitable will do that for you, I guess), but most of my original corporate overlords' IT department was purged at the beginning of this month, including most of our networking and sysadmin teams. So no more netops meetings, and we only have one network engineer now who isn't local, and is pretty busy just keeping everything running all by himself. My manager has requested at least twice now that I be given access to some monitoring systems as well as the logins for some of our on prem appliances so I can, if need be, hop in and do basic updates, etc. But nothing has happened in the 3 weeks since we've repeatedly asked. We've chalked it up to the remaining sysadmins that have domain over this stuff basically being under the gun and being too busy dealing with other stuff to get around to it, but it's still pretty annoying.

Anyways, I'm not sure really what to do next. I've been keeping myself busy at work by learning Python but part of me feels bad because I already feel like the knowledge I got from studying for the CCNA has started to atrophy since I'm not using it on a regular basis. I'm debating if I should start doing some Azure certs when I finish up this Python course. Though I worry it'll just sit there unused like my CCNA, though my manager has said multiple times he wants to start giving me responsibilities handling some of our Azure assets. There's part of me that feels like I should just shut the gently caress up and count my blessings considering I make decent money for entry level IT, my work life balance is pretty solid, I have ample downtime to study/upskill, and the current state of the IT job market isn't so hot. The other part of me thinks I should wait for my one year anniversary to hit soon, and then start applying around if nothing really changes in the meantime.

It's your first IT job, and you're about to be there a year. If this is how things still are in a month or two, it's time to polish up the resume and look for another place. It sounds like with the skills you have, you would be a great hire.

Onward and upward!

Zorak of Michigan
Jun 10, 2006

Handsome Ralph posted:

Anyways, I'm not sure really what to do next. I've been keeping myself busy at work by learning Python but part of me feels bad because I already feel like the knowledge I got from studying for the CCNA has started to atrophy since I'm not using it on a regular basis. I'm debating if I should start doing some Azure certs when I finish up this Python course. Though I worry it'll just sit there unused like my CCNA, though my manager has said multiple times he wants to start giving me responsibilities handling some of our Azure assets. There's part of me that feels like I should just shut the gently caress up and count my blessings considering I make decent money for entry level IT, my work life balance is pretty solid, I have ample downtime to study/upskill, and the current state of the IT job market isn't so hot. The other part of me thinks I should wait for my one year anniversary to hit soon, and then start applying around if nothing really changes in the meantime.

This is the best time to look for work, because you want it but don't need it. Put resumes out, take interviews (without telling anyone in your current job that you're doing so) and see what develops. If you don't get offers, or you get offers you don't want, no worries, you have a job you can live with already! If you get an offer for something better, congratulations, you move up in the world. The only downside is that you have to put the work into applying and interviewing.

MF_James
May 8, 2008
I CANNOT HANDLE BEING CALLED OUT ON MY DUMBASS OPINIONS ABOUT ANTI-VIRUS AND SECURITY. I REALLY LIKE TO THINK THAT I KNOW THINGS HERE

INSTEAD I AM GOING TO WHINE ABOUT IT IN OTHER THREADS SO MY OPINION CAN FEEL VALIDATED IN AN ECHO CHAMBER I LIKE

Handsome Ralph posted:

So I'm new to the IT field after spending 12 years in academia/editorial type work. It's been a huge breath of fresh air, and I love it for the most part. Definitely some downsides but not nearly as many as there were in my old field. I've been in my current/first IT role for about 11 months now. But I'm in a weird position, and not really sure what to do next. Just venting/seeking guidance at this point.

I'm currently a desktop support tech. I do a lot of everything (imaging, break/fix, AD stuff, software support, account creation, basic networking, etc) which is nice, but it's been extremely slow as of late (like 2-4 tickets a day slow). I got my CCNA late last year because I want to pivot into networking. My boss is extremely supportive of it. After I got my cert, he got me into our former corporate structures netops meetings so I could at least observe and potentially start helping with basic updates and what not. That lasted all of about 2.5 months.

Problem is, we underwent a corporate acquisition that was completed at the end of last year. My division and its IT department went untouched (being stupidly profitable will do that for you, I guess), but most of my original corporate overlords' IT department was purged at the beginning of this month, including most of our networking and sysadmin teams. So no more netops meetings, and we only have one network engineer now who isn't local, and is pretty busy just keeping everything running all by himself. My manager has requested at least twice now that I be given access to some monitoring systems as well as the logins for some of our on prem appliances so I can, if need be, hop in and do basic updates, etc. But nothing has happened in the 3 weeks since we've repeatedly asked. We've chalked it up to the remaining sysadmins that have domain over this stuff basically being under the gun and being too busy dealing with other stuff to get around to it, but it's still pretty annoying.

Anyways, I'm not sure really what to do next. I've been keeping myself busy at work by learning Python but part of me feels bad because I already feel like the knowledge I got from studying for the CCNA has started to atrophy since I'm not using it on a regular basis. I'm debating if I should start doing some Azure certs when I finish up this Python course. Though I worry it'll just sit there unused like my CCNA, though my manager has said multiple times he wants to start giving me responsibilities handling some of our Azure assets. There's part of me that feels like I should just shut the gently caress up and count my blessings considering I make decent money for entry level IT, my work life balance is pretty solid, I have ample downtime to study/upskill, and the current state of the IT job market isn't so hot. The other part of me thinks I should wait for my one year anniversary to hit soon, and then start applying around if nothing really changes in the meantime.

Study for more stuff, keep pushing to get some of the poo poo work from the teams above (network or systems), update your resume; if nothing happens over the next month or two, start spraying that resume around.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Handsome Ralph posted:

So I'm new to the IT field after spending 12 years in academia/editorial type work. It's been a huge breath of fresh air, and I love it for the most part. Definitely some downsides but not nearly as many as there were in my old field. I've been in my current/first IT role for about 11 months now. But I'm in a weird position, and not really sure what to do next. Just venting/seeking guidance at this point.

I'm currently a desktop support tech. I do a lot of everything (imaging, break/fix, AD stuff, software support, account creation, basic networking, etc) which is nice, but it's been extremely slow as of late (like 2-4 tickets a day slow). I got my CCNA late last year because I want to pivot into networking. My boss is extremely supportive of it. After I got my cert, he got me into our former corporate structure's netops meetings so I could at least observe and potentially start helping with basic updates and whatnot. That lasted all of about 2.5 months.

Problem is, we underwent a corporate acquisition that was completed at the end of last year. My division and its IT department went untouched (being stupidly profitable will do that for you, I guess), but most of my original corporate overlords' IT department was purged at the beginning of this month, including most of our networking and sysadmin teams. So no more netops meetings, and we only have one network engineer now, who isn't local and is pretty busy just keeping everything running all by himself. My manager has requested at least twice now that I be given access to some monitoring systems as well as the logins for some of our on-prem appliances so I can, if need be, hop in and do basic updates, etc. But nothing has happened in the 3 weeks we've been asking. We've chalked it up to the remaining sysadmins that have domain over this stuff basically being under the gun and being too busy dealing with other stuff to get around to it, but it's still pretty annoying.

Anyways, I'm not really sure what to do next. I've been keeping myself busy at work by learning Python, but part of me feels bad because I already feel like the knowledge I got from studying for the CCNA has started to atrophy since I'm not using it on a regular basis. I'm debating if I should start doing some Azure certs when I finish up this Python course. I worry they'll just sit there unused like my CCNA, though my manager has said multiple times he wants to start giving me responsibilities handling some of our Azure assets. There's part of me that feels like I should just shut the gently caress up and count my blessings considering I make decent money for entry level IT, my work life balance is pretty solid, I have ample downtime to study/upskill, and the current state of the IT job market isn't so hot. The other part of me thinks I should wait for my one year anniversary to hit soon, and then start applying around if nothing really changes in the meantime.
IT is collaborative work. There's a limit to how much you can upskill when you're upskilling in a personal vacuum. If you have really great empathy skills, you can invent enough imaginary friends to build for that you're still constructing something that looks relatively real-world. But as you've observed, one of the principal challenges of the job is creating technology environments that are resilient to acquisitions, mergers, and reorgs. There's really only one way to do that. Weigh that against your other benefits and drawbacks.

Nuclearmonkee
Jun 10, 2009


The Iron Rose posted:

Forgot to reply to this. It costs $150 a month to run a dadsv5 node in AKS with 4 vCPUs and 16 GB of memory 24/7. Remember that these are nodes that are just throwing a few KB of YAML over an HTTPS API, so you can realistically run a few dozen jobs at once on one. It takes ~1-3 minutes for a node to spin up and become ready for new pods, another 10-15s for the job to get picked up, and depending on autoscaler settings 10-30 minutes for a node to spin down. Let’s assume a 2 minute delay for each job. Finally, let’s assume each developer is paid $100/hr (which roughly translates to 192k/yr). Let’s assume that the on-demand node runs on average 8 hours per day, but not a straight 9-5 since you have a team working in multiple time zones. Let’s also assume that 50% of jobs run on the on-demand node do not suffer the delay because the node is still running.

1. Monthly Cost of 24/7 Node: $150
2. Monthly Cost of On-Demand Node: $50
3. Developer Cost per Minute: $100/hr = $100/60 per minute = 1.6667 (3.3333 for 2 minutes)
4. Cost of Delay per Job for 50% of Jobs: 2 minutes * Developer cost per minute * 50%

Let J be the number of jobs run per month. The total monthly cost for on-demand would then be the sum of its base cost plus the cost due to delays, but only for half of the jobs run. We want to find the value of J where the total monthly cost of running the node 24/7 equals the total monthly cost of running it on-demand with the delays considered. Our equation becomes:

150 = 50 + (J/2) * 3.333

Simplifying:

100 = J/2 * 3.333
100 = 1.6667J
100/1.6667 = J
J ≈ 60

In other words, if you’re running more than 60 jobs a month, it becomes more cost effective to run the node 24/7. Obviously you can optimize further (for example, shutting down during hours where you have low usage), but that takes time and maintenance.

Let’s use a more generous figure and say that 80% of jobs don’t suffer the delay.

150 = 50 + (J/5) * 3.333
100 = 0.6666J
J ≈ 150

finally, let’s assume we have a 90% non-delay rate (which is very generous).

150 = 50 + (J/10) * 3.333
100 = 0.3333J
J ≈ 300
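
If you want to poke at the assumptions, here's a rough Python sketch of the break-even math above. The function and parameter names are made up for illustration; the defaults are just the numbers from this post ($150 vs $50 a month, $100/hr, 2 minute delay), so swap in your own.

```python
def break_even_jobs(delayed_fraction,
                    always_on_cost=150.0,
                    on_demand_cost=50.0,
                    dev_rate_per_hour=100.0,
                    delay_minutes=2.0):
    """Jobs per month at which on-demand (plus dev time lost to delays)
    costs the same as running the node 24/7.

    delayed_fraction is the share of jobs that wait on a node spin-up,
    e.g. 0.5 if half the jobs land on a cold node.
    """
    # Dev time burned by one delayed job (3.33 at the default figures).
    delay_cost_per_job = dev_rate_per_hour / 60.0 * delay_minutes
    # Compute cost saved by not running the node 24/7.
    savings = always_on_cost - on_demand_cost
    return savings / (delayed_fraction * delay_cost_per_job)


if __name__ == "__main__":
    for delayed in (0.5, 0.2, 0.1):
        jobs = break_even_jobs(delayed)
        print(f"{delayed:.0%} of jobs delayed -> break-even at ~{jobs:.0f} jobs/month")
```

With the defaults this comes out to roughly 60, 150 and 300 jobs a month, matching the three cases worked by hand.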

my company of ~100 devs ran 2250 jobs yesterday*. We’d need a 98.67% efficiency rate for on demand nodes to be cost effective. Obviously there’s multiple nodes involved but the ratios remain the same, and that’s before considering savings plans and reserved compute which will bring down the cost of the 24/7 node further.

It is absolutely more cost effective for us to run enough nodes constantly to ensure that most jobs don’t need to wait on the autoscaler, rather than spend expensive dev time optimizing the comparatively small compute costs. that’s also before considering how the perceptions of delay impact your company’s adoption of CI/CD pipelines (which in turn bring significant reliability, auditing, and security improvements). It’s rarely worth optimizing for compute here, unless you’re running an LLM and compute costs are 10x salary costs. 9 times out of 10 for most SaaS firms, salaries are 10x the cost of compute and you should be optimizing for the former.

Obviously not every one of those jobs will fit on the same node - we’ve got a job for training an AI model that consumes >64 GB of memory, runs ~5 times a day, and takes an hour. You betcha we spin up a dedicated node on demand for that job! But in general, it’s rare that spending the time to fine tune the compute is worth the cost in your time and in dev time.


* edit: did the math for this. Let E be the efficiency rate.

150 = 50 + (2250 * (1 - E)) * 3.334
100 = 2250 * (1 - E) * 3.334
E = 1 - 100/(2250*3.334)
E = 1- 0.013330667199893
E = 0.9867

In other words, it takes above 98.67% efficacy for it to be worthwhile for us.

In general, if > 30 jobs per month are delayed by 2 minutes, it becomes no longer worth it. This holds no matter how many nodes you run so long as the proportions are the same.
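
Same idea in Python for the efficiency figure, as a sketch only. The function name and parameters are hypothetical, and it treats the 2250 figure as the monthly job count J, the same way the equation above does.

```python
def required_efficiency(jobs_per_month,
                        always_on_cost=150.0,
                        on_demand_cost=50.0,
                        delay_cost_per_job=3.334):
    """Minimum share of jobs that must NOT hit a spin-up delay for the
    on-demand node to stay cheaper than the 24/7 node."""
    # How many delayed jobs the compute savings can pay for (about 30 here).
    max_delayed_jobs = (always_on_cost - on_demand_cost) / delay_cost_per_job
    # Everything beyond that has to land on a warm node.
    return max(0.0, 1.0 - max_delayed_jobs / jobs_per_month)


if __name__ == "__main__":
    print(f"{required_efficiency(2250):.2%}")
```

At 2250 jobs this gives about 98.67%, and the intermediate `max_delayed_jobs` value is the "~30 delayed jobs a month" threshold.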

This is a really good breakdown and I’m stealing and adapting this for similar mindless stupidity I see over here

Handsome Ralph
Sep 3, 2004

Oh boy, posting!
That's where I'm a Viking!


Appreciate the feedback guys, looks like my gut was right. Thanks all :)

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Nuclearmonkee posted:

This is a really good breakdown and I’m stealing and adapting this for similar mindless stupidity I see over here
I agree with the big sentiment, but disagree that auto-scaling is an inherently riskier approach. If you have even moderate swings in usage between peak and non-peak times, it's neither time-consuming nor difficult to host warm compute capacity to ensure that some but not too much capacity remains free to take new jobs. We use this approach everywhere, but especially on our heavier-weight batch environments (remote IDEs and exploratory data tools).

The typical way of implementing this is to use placeholder pod replicas with some resource requests provided, and give them a negative priority. That causes this to happen:

  • A new pod is scheduled that exceeds the free contiguous resources on the cluster
  • Instead of the node autoscaler spinning up a new node, the placeholder pod is preempted and terminated
  • The new pod is scheduled immediately using the resources freed from terminating the placeholder
  • The placeholder, as part of a deployment/replica set, is recreated so the cluster runs the right number of replicas
  • The creation of the placeholder pod, not the real workload, triggers the autoscaler to add a warm node
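
To make the steps above concrete, here's a minimal Python sketch that prints the two manifests involved: a negative-value PriorityClass and a Deployment of pause containers that reserve capacity. The object names, replica count, and request sizes are all made up for illustration, and the priority value is an assumption (it just needs to be lower than your real workloads while still being something your autoscaler will scale up for).

```python
import json


def placeholder_manifests(replicas=2, cpu="1", memory="2Gi"):
    """Build a PriorityClass and a placeholder Deployment as plain dicts."""
    priority_class = {
        "apiVersion": "scheduling.k8s.io/v1",
        "kind": "PriorityClass",
        "metadata": {"name": "placeholder-low"},  # hypothetical name
        "value": -1,  # assumed: below the default priority of 0
        "globalDefault": False,
        "description": "Placeholder pods that real workloads may preempt.",
    }
    labels = {"app": "capacity-placeholder"}  # hypothetical label
    deployment = {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": "capacity-placeholder"},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "priorityClassName": priority_class["metadata"]["name"],
                    "containers": [{
                        "name": "pause",
                        # The pause image does nothing; the requests below
                        # are what actually hold the capacity.
                        "image": "registry.k8s.io/pause:3.9",
                        "resources": {
                            "requests": {"cpu": cpu, "memory": memory},
                        },
                    }],
                },
            },
        },
    }
    return [priority_class, deployment]


if __name__ == "__main__":
    bundle = {"apiVersion": "v1", "kind": "List",
              "items": placeholder_manifests()}
    print(json.dumps(bundle, indent=2))
```

The JSON output can be piped to something like `kubectl apply -f -`. Size the requests to roughly match the jobs you want to land instantly, since a preempted placeholder only frees as much as it asked for.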

You can scale the number of placeholder replicas according to some other workload, or as a percentage of total cluster/node group capacity, by using something like a horizontal pod autoscaler to manage the replica count.

On top of this, you can significantly reduce spinup times by
  • Using an alternative autoscaler like Karpenter that uses pod scheduling requests rather than pod scheduling failures to drive spin-up of new capacity
  • Using a lightweight node OS like Bottlerocket, which can drop E2E instance-requested to node-ready time from 2 minutes down to 40 seconds

Vulture Culture fucked around with this message at 20:16 on Mar 27, 2024

The Iron Rose
May 12, 2012

:minnie: Cat Army :minnie:

Vulture Culture posted:

I wouldn’t say that autoscaling is a *riskier* approach - and we absolutely use autoscaling, especially as many jobs (e.g. docker builds, code compilation, gosec scans, etc) need dramatically more resources and often result in a scaling event. The math above is also making a bunch of assumptions, and you should really treat any class of jobs that needs enough extra resources to force a node spin-up as its own separate case. For example, our AI/ML jobs run less than 30x a day, and need their own node, so we do autoscale those and eat the delay.

At the end of the day it all depends on the workloads you’re running, the frequency of those workloads, and their resource demands. Still, though it can be worthwhile to optimize for compute costs in many situations, be sure you and your devs aren’t spending more time on optimizing/waiting than you’re saving in compute!

The Chad Jihad
Feb 24, 2007


We need a new app push built. I get sent the file. It appears to be the entire installed application's folder, and a .config, in a zip file. I ask, "Is there an installer? This appears to be a zip, is the intention to just extract it and copy over the config?" they reply "Yes these are the raw files, you will add these to a new folder in Program Files and create links for the user to launch it."

S...surely this isn't best practice? Am I the one out of the loop? Jank-rear end install script, jank-rear end detection method, jank-rear end uninstall script?

Antigravitas
Dec 8, 2019

Die Rettung fuer die Landwirte:
Making an MSI isn't that hard. I had to author a bunch when I still had to admin Windows.

WiX isn't pretty, and making shortcuts that actually get removed after uninstall is dark magic, but you can actually build something that behaves like it was made by people who know what they are doing…

guppy
Sep 21, 2004

sting like a byob

Handsome Ralph posted:

I look for people like you and advocate for them internally, because people who want to learn and are willing to put in the effort are surprisingly rare. Often there's the opposite problem, where there are people who deserve promotion and not enough spots, but you'd be surprised how often people just want to plug away and never learn anything new.

You now have both real-world networking experience and a CCNA, which is a terrific resume for someone who wants to get into networking. Now, I'm not saying it'll be easy -- there are always lots of people who want to do networking, and it's a tough market right now -- but you have the ideal foundation to make the move. If they can make that happen for you internally, that's great, but it doesn't really sound like the new parent company is interested -- and, frankly, it doesn't really sound like the network team there is appropriately resourced anymore -- and if that's the case, yeah, at about a year you can start applying for your next role.

New Coke
Nov 28, 2009

WILL AMOUNT TO NOTHING IN LIFE.
From the helpdesk side this morning, a fun ticket: "Can anyone tell me who put <higher up at my company>'s credit card information in his Teams contact info, and who is able to see it?"

LochNessMonster
Feb 3, 2005

I need about three fitty


Heard someone say “I don’t understand linux and don’t want to learn it either. I just want to work with docker and k8s”.

:psyduck:
