NihilCredo
Jun 6, 2011

Suppress anger in every possible way: that one thing will defame you more than many virtues will commend you.

whats for dinner posted:

My employer has decided to go all in on SAFe and I have spent 6 days in the last 3 weeks in interminable training. With 3 days of PI planning happening next week that is all coming from an executive-groomed backlog. Furiously applying for jobs elsewhere.

I went and skimmed the SAFe website and let's just say the unabomber manifesto has never sounded so attractive.

LochNessMonster
Feb 3, 2005

I need about three fitty


Did the training myself a while back because my employer paid for it and lots of customers are "using/implementing" it as their way of working. It's complete garbage: it combines all the buzzwords into something that gives management control over teams' backlogs, which makes it the exact opposite of Agile. Commonly referred to as Stupid Agile For Enterprises. PI events are complete garbage cram sessions which try to put people on the spot for "giving commitment to the PI goals," which can be held over their heads later to force them into delivering garbage.

The exam is web-based and not proctored. The exam questions and answers are literally one Google search away. Bad companies will like it on your resume. If you're into consulting it'll be a good HR check.

Collateral Damage
Jun 13, 2009

NihilCredo posted:

I went and skimmed the SAFe website and let's just say the unabomber manifesto has never sounded so attractive.
I did too and Weird Al's "Mission Statement" started playing in my head.

vanity slug
Jul 20, 2010

We've been doing SAFe for the past year. To me, the experience is 50 people being stuffed into a small meeting room without proper chairs or any ventilation for two days and being told to lose all autonomy.

Warbird
May 23, 2012

America's Favorite Dumbass

The theory is nice and makes sense, but execution is often lacking and I’ve seen punches nearly get thrown multiple times at PIs. It all really comes down to how good the refreshments are. Thankfully moving to consulting means that they just care that I know what it is and they don’t want to pay for me to sit in a room and argue for a week once a quarter.

whats for dinner
Sep 25, 2006

IT TURN OUT METAL FOR DINNER!

Jeoh posted:

We've been doing SAFe for the past year. To me, the experience is 50 people being stuffed into a small meeting room without proper chairs or any ventilation for two days and being told to lose all autonomy.

From the training, and from what my boss has been telling me is going to happen, this is basically exactly what I've been expecting.

Warbird posted:

The theory is nice and makes sense, but execution is often lacking and I've seen punches nearly get thrown multiple times at PIs. It all really comes down to how good the refreshments are. Thankfully moving to consulting means that they just care that I know what it is and they don't want to pay for me to sit in a room and argue for a week once a quarter.

Unfortunately our PI planning is going to be done entirely across zoom meetings which is going to be an enormous clusterfuck and they've stretched it out to 3 days "to account for that." Most of the good stuff that I saw in the training is stuff that you could really do without having to buy into the rest of SAFe, I think. Stuff like the agile release train and scrum of scrums is just going to make the dysfunction at my company worse than it already is: standups are already status updates where people wait to talk about their impediments, and the weekly leadership meetings are where most managers are waiting to bring up their team's impediments. SAFe practically formalises that and it's gonna be a nightmare to get anything into or out of the dev teams, now. We already suck at communicating as it is!

Warbird
May 23, 2012

America's Favorite Dumbass

Ha, ours were 4 days minimum, but that was a few hundred people crammed into a ballroom. I couldn’t imagine the unholy clusterfuck of doing it remotely.

Gangsta Lean
Dec 3, 2001

Calm, relaxed...what could be more fulfilling?
SAFe loving owns. Previous job flew or bought Amtrak for engineers, engineering managers, product managers, executives, *scrum masters*, etc. from remote offices all across the country, even Hawaii, to a city in the northeastern US for 2 fun-filled days of sitting in a room together discussing tickets. Huge breakfast, huge lunch like 3 hours later, then after the afternoon session was over they left you to find your own way back to a hotel in the middle of an isolated office park with nothing within walking distance except more office buildings. All the non-US-based contractors (there were a ton, mostly in eastern Europe) were not invited; they dialed in via Zoom just like they did every work day.

Executive management floated around and sat in random sessions for a few minutes throughout the day, interjecting opinions that everyone promptly ignored when they left. At the end of the second day everyone gathered to hear each team commit their soul to some unrealistic goal, lots of head nodding all around, but failing to meet a commitment never resulted in any consequences that I ever saw.

The best part about SAFe is, when management tells you to do something counter to SAFe (it happens a lot), the company buy-in is so hard that all you have to do is point out how contradictory what they’re asking is to the SAFe philosophy. It’s an excuse for following the process instead of doing the work that actually needs to be done. The product is now the process, not the software you’re actually delivering to customers.

Mr Shiny Pants
Nov 12, 2012

Gangsta Lean posted:

SAFe loving owns. Previous job flew or bought Amtrak for engineers, engineering managers, product managers, executives, *scrum masters*, etc. from remote offices all across the country, even Hawaii, to a city in the northeastern US for 2 fun-filled days of sitting in a room together discussing tickets. Huge breakfast, huge lunch like 3 hours later, then after the afternoon session was over they left you to find your own way back to a hotel in the middle of an isolated office park with nothing within walking distance except more office buildings. All the non-US-based contractors (there were a ton, mostly in eastern Europe) were not invited; they dialed in via Zoom just like they did every work day.

Executive management floated around and sat in random sessions for a few minutes throughout the day, interjecting opinions that everyone promptly ignored when they left. At the end of the second day everyone gathered to hear each team commit their soul to some unrealistic goal, lots of head nodding all around, but failing to meet a commitment never resulted in any consequences that I ever saw.

The best part about SAFe is, when management tells you to do something counter to SAFe (it happens a lot), the company buy-in is so hard that all you have to do is point out how contradictory what they’re asking is to the SAFe philosophy. It’s an excuse for following the process instead of doing the work that actually needs to be done. The product is now the process, not the software you’re actually delivering to customers.

This, the rituals become the product instead of actually doing work. This has been bugging me a lot, talking about stuff but not actually doing anything.

whats for dinner
Sep 25, 2006

IT TURN OUT METAL FOR DINNER!

Mr Shiny Pants posted:

This, the rituals become the product instead of actually doing work. This has been bugging me a lot, talking about stuff but not actually doing anything.

Yeah, this is almost definitely what's going to happen. We had the first day of PI planning today and we've already had some major security stuff get pushed off in favour of slapping together investor demos :yotj:

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
Demo-driven development makes complete sense if your organization is more worried about staying afloat than anything else. But I'd also argue that if you have more than 5 people as a company and you're still spending an incredible amount of engineering resources chasing demos and investors, your company is probably on its way out, or you need new leadership that can find better pastures for your company's vision.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
We did 3 days of PI Planning (our first time) last week over Zoom. We're infrastructure, so the overall "product" is poorly defined. One of the biggest actual theoretical advantages of SAFe is coordinating dependencies between teams, but nearly all our features are independent. I don't think sprints really work for infrastructure, where there are generally more external dependencies that can't be managed and a lot of the work (even the planned work) is much more reactive and dependent on getting feedback from customers. We've also all been scrambled up out of our traditional domain-based teams (linux, windows server, database) into a bunch of generic jack-of-all-trades teams with a little bit of everything.

Love Safe!

whats for dinner
Sep 25, 2006

IT TURN OUT METAL FOR DINNER!

FISHMANPET posted:

We did 3 days of PI Planning (our first time) last week over Zoom. We're infrastructure, so the overall "product" is poorly defined. One of the biggest actual theoretical advantages of SAFe is coordinating dependencies between teams, but nearly all our features are independent. I don't think sprints really work for infrastructure, where there are generally more external dependencies that can't be managed and a lot of the work (even the planned work) is much more reactive and dependent on getting feedback from customers. We've also all been scrambled up out of our traditional domain-based teams (linux, windows server, database) into a bunch of generic jack-of-all-trades teams with a little bit of everything.

Love Safe!

Yeah, we're a generic "DevOps" team which basically means L3 tech support, infrastructure and operations for a mish-mash of Java development teams and Node.js/React teams. So we're partway through day 2 and our board is totally loaded up with dependency cards. But I've got some interviews next week, at least!

LochNessMonster
Feb 3, 2005

I need about three fitty


FISHMANPET posted:

We did 3 days of PI Planning (our first time) last week over Zoom. We're infrastructure, so the overall "product" is poorly defined. One of the biggest actual theoretical advantages of SAFe is coordinating dependencies between teams, but nearly all our features are independent. I don't think sprints really work for infrastructure, where there are generally more external dependencies that can't be managed and a lot of the work (even the planned work) is much more reactive and dependent on getting feedback from customers. We've also all been scrambled up out of our traditional domain-based teams (linux, windows server, database) into a bunch of generic jack-of-all-trades teams with a little bit of everything.

Love Safe!

I find Kanban a more practical approach for infra/platform teams. Especially in teams where there are lots of fires to put out, you don't have to modify the sprint goal by removing user stories to make room for bug fixes. And just like you said, there are so many dependencies, fixed time windows, and troubleshooting to be done that it's hard to properly estimate time spent, let alone say you're going to fix something in a specific sprint.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
One of our features is to migrate systems into a central management tool, and one of the user stories is to make any permissions adjustments if system owners want to see some data about their systems in the tool that they can't currently. I have no idea how much work it will take, or when I'll even be able to do it, because we're dependent on system owners actually migrating, and then it depends on what their wants are and how hard they are to implement!

SurgicalOntologist
Jun 17, 2004

I've got questions... we don't have anyone with a DevOps background so I've been forced to learn. Things have been going well for a while with CI testing, CD to a development cluster, temporary deployments for feature branches, etc. Until now we've done everything in GitLab shared runners using docker-in-docker and pushing to GitLab Container Registry, but we are hitting the space limit in some images so I started looking into standing up our own job runner on a GKE cluster.

First question, is there a way to take advantage of caching in dind without pulling whatever image might have the cache? Currently before building we pull the most likely image that could be cached (the one tagged with the branch name, and if it doesn't exist -- i.e., the first run of a new branch -- the image from the target branch). This means that we effectively need twice as much space as the size of the image, in case the whole cache is a miss. Is there a way to check if it will be used and only pull accordingly?
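(For anyone reading along: one way to avoid the blind pull is to probe the registry first. `docker manifest inspect` downloads only the manifest, not the layers, so a miss costs almost nothing; on older Docker CLIs it needs `DOCKER_CLI_EXPERIMENTAL=enabled`. A sketch against GitLab's predefined variables, job/tag names invented:)

```yaml
build:
  stage: build
  script:
    - |
      # Pick exactly one cache candidate, and only pull it if it exists.
      CACHE_IMAGE="$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG"
      if ! docker manifest inspect "$CACHE_IMAGE" >/dev/null 2>&1; then
        CACHE_IMAGE="$CI_REGISTRY_IMAGE:$CI_DEFAULT_BRANCH"
      fi
      docker pull "$CACHE_IMAGE" || true
      docker build --cache-from "$CACHE_IMAGE" \
        -t "$CI_REGISTRY_IMAGE:$CI_COMMIT_REF_SLUG" .
```

(This still pulls the whole candidate image on a cache miss; if disk is the real constraint, BuildKit's registry cache with `BUILDKIT_INLINE_CACHE=1` is worth a look, since it pulls only the layers it actually reuses.)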

Second question, our builds are very slow on GKE, even the docker pull step. (still using dind, and yes overlay2). Is there a way that images can be cached on the node rather than each job having to pull from the registry first? Or maybe now that we're running jobs on GKE we should switch to Google Container Registry; I assume it would be faster.

Third question. Our pipeline for each service on feature branches has been build > test > push. I thought this makes sense because it's nice to have no broken images in the container registry. The tests are run with docker run, which makes sense to run the tests in the context of the image rather than in the host. Now after looking into the weirdness of docker inside k8s, I've come across kaniko. Seems great, but we can't docker run before pushing. And in any case, it seems build > test > push is not a typical workflow. Should we run tests in the host before building? Or have another job that pulls the image then runs the tests? In that case we can't avoid dind. Or, what might make sense is rather than test jobs running inside the gitlab-runner image, they run in their own image; in other words, let k8s manage the container. Maybe this is more of a Gitlab question, but it doesn't seem possible to set it up this way.
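(One pattern that keeps build > test > push semantics without dind: have kaniko push under a throwaway *candidate* tag, run the test job with that image as the job image — so the k8s executor, not docker, manages the container — and only retag after tests pass. A sketch with invented stage/tag names; registry auth setup for kaniko is omitted, and `pytest` stands in for whatever the image's test entrypoint is:)

```yaml
build:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:debug
    entrypoint: [""]
  script:
    # Push under a candidate tag; nothing references it yet.
    - /kaniko/executor --context "$CI_PROJECT_DIR"
      --destination "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA-candidate"

test:
  stage: test
  # The executor pulls the candidate image and runs this job inside it,
  # so the tests run "in the context of the image" with no dind.
  image: $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA-candidate
  script:
    - pytest

promote:
  stage: push
  image: gcr.io/go-containerregistry/crane:debug
  script:
    # Server-side retag: no layers move, and the real tag only ever
    # points at images that passed tests.
    - crane auth login -u "$CI_REGISTRY_USER" -p "$CI_REGISTRY_PASSWORD" "$CI_REGISTRY"
    - crane tag "$CI_REGISTRY_IMAGE:$CI_COMMIT_SHA-candidate" "$CI_COMMIT_REF_SLUG"
```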

Final question. To anticipate "why are your images big?" the reason is because of machine learning models (big binary files) that get built into the image. Would it be better to pull the models at runtime or something?

Sorry for the mess of a post, appreciate any help and/or informative ridicule.

Methanar
Sep 26, 2013

by the sex ghost
How often do the ML models change? Would it be better to not have the models in your image artifacts, but instead pull them out of s3 through a different mechanism? Maybe rsync as part of an init container if they're not present in a local volume you've mounted into the container.

SurgicalOntologist
Jun 17, 2004

Methanar posted:

How often do the ML models change? Would it be better to not have the models in your image artifacts, but instead pull them out of s3 through a different mechanism? Maybe rsync as part of an init container if they're not present in a local volume you've mounted into the container.

They change fairly often. We use dvc, which basically uses a git-tracked file to determine what files to download with dvc pull. So the repo state/commit hash fully determines which models to use.

Init containers could make sense. It's just a question of running dvc pull before building or after deploying. We'd save 750MB (of 4GB) and be under the limit and we could stick with the Gitlab shared runners.
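(The init-container version of that is pretty small — a hypothetical sketch with invented image and volume names; it assumes the pod's image contains dvc plus the repo checkout and that the dvc remote is reachable from the cluster:)

```yaml
spec:
  initContainers:
    - name: fetch-models
      image: registry.example.com/myapp-dvc:latest   # has dvc + repo checkout
      # Pulls exactly the model files the current commit pins;
      # dvc skips anything already present in its cache.
      command: ["dvc", "pull"]
      volumeMounts:
        - name: models
          mountPath: /app/models
  containers:
    - name: app
      image: registry.example.com/myapp:latest       # now ~750MB slimmer
      volumeMounts:
        - name: models
          mountPath: /app/models
  volumes:
    - name: models
      emptyDir: {}
```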

Beamed
Nov 26, 2010

Then you have a responsibility that no man has ever faced. You have your fear which could become reality, and you have Godzilla, which is reality.


JHVH-1 posted:

With ECS at least you can now point environment variables to a secrets manager, and you just have to update the secret there and most likely refresh the service afterwards. It avoids having to continually update the task definition at least.
I don’t know if, when you change it on the fly, something like nodejs polling the environment again would pick it up on a running container. (Now I’m curious to test that sometime)

Otherwise how long that takes probably depends on how long your app spins up and how long before your health checks go healthy. New container needs to spin up, take traffic, old container has connects drain and then removed before everything is complete and stable again.

That only works if you aren't adding new ones, but rather changing existing ones

Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.
Any of you goons working with Apache Kafka via Confluent Cloud? We're looking for the best way to repeatably deploy Kafka alongside our GKE clusters. Confluent's terminology is all over the place, and it seems like they have multiple products, so I'm not sure of the best/most effective way to deploy this stuff.

Hughlander
May 11, 2005

Gyshall posted:

Any of you goons working with Apache Kafka via Confluent Cloud? We're looking for the best way to repeatably deploy Kafka alongside our GKE clusters. Confluent's terminology is all over the place, and it seems like they have multiple products, so I'm not sure of the best/most effective way to deploy this stuff.

Yes, but not something I own. If there are specific questions I can ask the right people.

Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.
What tool are they using to provision? k8s operator? Ansible et al.? Terraform? CLI tools? And does it suck?

Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.
Double posting, I'm also curious how you goons are handling multi-cluster GKE setups, if at all. Right now we're creating a VPC alongside each group of clusters, but I was also looking at sharing a single VPC in a host project and then creating a service project for each group of clusters. Seems like that has its own downsides, and we need to be able to NAT and use VPC endpoints.

Love GCP so far as a lifelong AWS goon, and I'm realizing not all the AWS concepts translate exactly the same.

Hadlock
Nov 9, 2004

We did a project per environment, which generally had one cluster each. I like splitting out resources per project whenever possible; with halfway sane user permissions, it drastically reduces the blast radius of any user, script, or program.

Projects are cheap/free, so use them any time you have a service seam. We used Terraform and a bunch of custom modules to achieve this.
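(The shape of the project-per-environment pattern, as a hypothetical Terraform sketch — module path and variable names invented:)

```terraform
# One GCP project per environment, all stamped from the same custom module,
# so IAM bindings and quotas are scoped per project.
module "env_project" {
  for_each        = toset(["dev", "staging", "prod"])
  source          = "./modules/gcp-project"
  name            = "myapp-${each.key}"
  billing_account = var.billing_account
  # the cluster, network, etc. hang off each.key inside the module
}
```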

freeasinbeer
Mar 26, 2015

by Fluffdaddy
Yeah, AWS has way better permission controls on the individual object end, whereas Google's are mostly project-level.

The inverse of this is that fine-grained IAM controls are way less developed in GCP, and you should use projects more often than not. AWS cross-account federation is a nightmare and I’d really advise you to gut check yourself before you have a bunch of AWS accounts.

Edit: re GKE, it was really designed from the ground up for the edges of GKE clusters to be stuff on the internet, so there are a ton of design choices they made initially toward that end. They really wanted folks to adopt the beyondcorp model.

Customers coming from traditional DCs or AWS freaked out and pushed them towards doing private networking, but that just means they bolted on some stuff over time and it can be disjointed. I would not tie myself up in knots trying to make it look like AWS with your pretty networking diagrams. But if I'm honest, at every place I've ever been, we had a networking team do just that to me.

freeasinbeer fucked around with this message at 14:29 on Oct 12, 2020

Space Duck
Oct 15, 2003
Has anyone used Codefresh? My org is running a POC that we’ve got on hold because of vague horror stories and I’m looking for something a bit more concrete in terms of experience or opinions.

Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.
Thanks for the insight on GKE, goons. We ended up going the project-per-deploy route, which has been working great. :v:

Methanar
Sep 26, 2013

by the sex ghost
Is anybody using non-nginx ingress controllers? Or have had performance problems out of the nginx one?

I'm getting some really awful performance out of nginx running on some dedicated 8-core m5.8xls. nginx just falls over if there's more than like 2500 rps hitting it, with only around 2000 conntrack entries.

I'm using the community nginx controller with basically out of the box configuration. Only real change is an increase to max-body-size. I do have some apps with super long lived connections and I am doing TLS termination, but it still shouldn't be this awful.

It's so bad I'm going to start testing haproxy's ingress controller, but I don't really understand how this could perform so badly.

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
we're using the alb ingress controller and it's quite needs suiting

might not apply for you tho

whats for dinner
Sep 25, 2006

IT TURN OUT METAL FOR DINNER!

My experience has been that long-lived connections really murder nginx's performance. If you're using the default configuration you might want to see if you're being limited by the number of worker processes and worker connections you have configured, seeing as I think the default is intended for a 1-core machine that might be running some other stuff on the same box. We have an autoscaling group of nginx boxes that experience some really significant throughput and the default settings meant that they were oversaturated very very easily. We tuned the config pretty much according to this guide and haven't had any issues since: https://www.digitalocean.com/community/tutorials/how-to-optimize-nginx-configuration. Hopefully the same advice holds true for the ingress controller. In terms of actually having an ingress controller, we use Traefik and I really don't care for it; we're starting to experiment with ALBs
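(For the community ingress controller specifically, those same knobs live in its ConfigMap rather than in nginx.conf — a sketch, with illustrative values rather than recommendations; check the controller docs for the exact keys your version supports:)

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: ingress-nginx-controller
  namespace: ingress-nginx
data:
  worker-processes: "8"              # default is "auto" (one per CPU)
  max-worker-connections: "65536"
  # With lots of long-lived connections, keeping upstream keepalives high
  # stops nginx from constantly re-opening connections to pods.
  upstream-keepalive-connections: "320"
  keep-alive-requests: "10000"
```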

12 rats tied together
Sep 7, 2006

I have had nothing but good experiences with alb/elb both as high performance load balancers in general and also the various integrations with 3rd party schedulers like k8s.

They're fantastic products, really hard to imagine not using them if your cluster is anywhere near AWS. I would even use them in a physical cluster as long as I had a reasonable connection to the site.

JehovahsWetness
Dec 9, 2005

bang that shit retarded

12 rats tied together posted:

I have had nothing but good experiences with alb/elb both as high performance load balancers in general and also the various integrations with 3rd party schedulers like k8s.

They're fantastic products, really hard to imagine not using them if your cluster is anywhere near AWS. I would even use them in a physical cluster as long as I had a reasonable connection to the site.

We use the alb ingress controller, too. Along w/ external-dns and the alb ingress controllers magic ACM cert matching it's pretty smooth.

Methanar
Sep 26, 2013

by the sex ghost
Lots of good words for ALBs here.

Are you guys all just creating a dedicated ALB for each ingress object? Looks like having multiple ingresses behind one ALB is a new 2.0.0 feature that isn't GA yet.

https://github.com/kubernetes-sigs/aws-alb-ingress-controller

Hadlock
Nov 9, 2004

Using an ALB or ELB is the new "nobody ever got fired for buying IBM", it just works. I don't think I've ever spent more than 5 minutes poking around in the settings, other than to confirm in the documentation whether it can/can't do UDP etc.

JehovahsWetness
Dec 9, 2005

bang that shit retarded
ALBs are cheap enough for us that we just do an ALB-per-ingress to save the headache. I worked at a place where we set up multiple services behind a single ALB and it was a pain once it got big and there were more rules than just a bunch of straight host-header matching.

12 rats tied together
Sep 7, 2006

Since it is a fully abstracted software service, there is no reason to bundle multiple ingresses in a single ALB. It's not gonna be cheaper, it won't scale better; the only thing it's really going to do is create a time bomb for the maximum number of path-pattern rules in the ALB API, which will probably fail in a tragically opaque manner inside your k8s cluster.

Methanar
Sep 26, 2013

by the sex ghost

12 rats tied together posted:

Since it is a fully abstracted software service, there is no reason to bundle multiple ingresses in a single ALB. It's not gonna be cheaper, it won't scale better; the only thing it's really going to do is create a time bomb for the maximum number of path-pattern rules in the ALB API, which will probably fail in a tragically opaque manner inside your k8s cluster.

I actually figured it would be cheaper.

$0.0225 per Application Load Balancer-hour (or partial hour)

works out to a base cost of about $16 a month per ALB. × 100 ingress objects = $1,600 a month list.

I guess that's still dirt cheap compared to the stupid $1,050 list a month per m5.8xl I'm paying, of which I need several because nginx is so bad.
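(Sanity-checking that arithmetic, using the prices as quoted in the thread rather than current AWS pricing, and counting only the base hourly charge — LCU/traffic charges come on top:)

```python
# ALB base cost, thread's quoted figures (not current AWS pricing)
ALB_HOURLY = 0.0225          # $ per ALB-hour
HOURS = 24 * 30              # ~one month

per_alb = ALB_HOURLY * HOURS   # ~$16.20/month per ALB
fleet = per_alb * 100          # one ALB per ingress, 100 ingresses: ~$1,620

# vs. the nginx fleet mentioned: several m5.8xls at $1,050/month list each
nginx_boxes = 3 * 1050         # e.g. three boxes: $3,150
```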

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
ALB also charges based upon traffic and rule evaluations, so you can get 4x+ charges racked up on top of cross-AZ bandwidth; that's another consideration. I'm just used to comically overprovisioned systems with 30+ microservices sitting at .02% utilization each, each behind an ALB, all of it repeated with a single RDS also at 1% utilization and multiplied out to 6+ environments. Developer time is more costly than computers, but when your AWS costs are north of $100k/mo, your company is about to go under, and your AWS bill is within the same order of magnitude as your engineering costs, maybe a cost-effective architecture matters more.

Do what makes sense for your setup and needs, but make sure you have a sanity check from time to time before you end up like some of my old customers / employers unable to throw bodies or dollars at a 100% technical problem.

Trapick
Apr 17, 2006

Space Duck posted:

Has anyone used Codefresh? My org is running a POC that we’ve got on hold because of vague horror stories and I’m looking for something a bit more concrete in terms of experience or opinions.
We've been transitioning to it for a while (theoretically) and I'm not a fan - though I can't say if that's because of inherent flaws or because the dude on our end is bad at it (I haven't done any of the pipeline setup so far, just troubleshot downstream issues). As a small example, there doesn't seem to be any good way in the GUI to search for a particular build? Like, we have a dozen microservices all using the same pipeline, so if a few are running at the same time I have to just click through each one to find out which service it's building.


Hed
Mar 31, 2004

Fun Shoe
Would anyone go down the path of kubernetes for an application hosted on their own on-prem hardware?

Right now I have a simple docker-compose that does what I need, but I absolutely need load balancing at each local site for production. I can load balance at a TCP or HTTP level, so something like docker swarm looks interesting but might be a developmental dead end. Kubernetes looks pretty involved, but if it's worth it I'm willing. Also HashiCorp Nomad looks pretty interesting. I'm curious what would lead you to one decision point or another on something that's never going to be in a commercial cloud. Happy to go into my use cases more in whatever dimension would be helpful.
