xzzy
Mar 5, 2009

I've had nothing but good experiences with Prometheus, but like 12 rats suggested I only put numbers into it. We've got 2300 servers each exporting 1700 metrics and it's been bulletproof.

My only real critique is long term metrics storage, people around here love having graphs since the dawn of time but Prometheus is very explicit that it's not intended to fill that role so I can't be too fussy about it. At least they provide methods to export to databases that do perform that job.

xzzy
Mar 5, 2009

We went down the same road with puppet. gently caress yeah, community modules! Never write code again!

Until we realized many modules are more complicated to use than hand-writing config files for the software we're using, and they stop getting development, or don't work right, or get refactored constantly.

So now we've gone back to storing config files in puppet, with limited use of templates when we really need per-server customization.

xzzy
Mar 5, 2009

Are projects like maas and tinkerbell useful in more traditional provisioning environments?

Where I'm at we're typically 5+ years behind the cutting edge, so we're not very cloud aware though the momentum is building. Now that everyone is working from home, provisioning new hardware has gotten really difficult because our workflow is fairly antique.. get some new hardware, someone spends a day configuring IPMI and customizing a ks.cfg, then lets pxeboot do its thing. This hasn't evolved much in almost 20 years even though we've iterated through stuff like cobbler and foreman.

So if there's some better provisioning tools out there that support traditional servers (that is, it's provisioned and performs a single task until the hardware fails), that is pretty interesting to me. There has been talk of moving user analysis jobs into openshift which I'm not sure I'm keen on but it does indicate a more dynamic solution is on the horizon.

xzzy
Mar 5, 2009

That sounds like Fun(tm) to me. But we're so far away from that kind of mindset it'll probably be 10 years before I get to try it. Everyone here is still in the "my server is my server forever" type of thinking.

xzzy
Mar 5, 2009

Find a reputable vendor that does supermicro, specify the requirements, buy a rack of hardware. They're reasonably cheap, work well, and a BMC plus two interfaces are standard. Adding a third nic is easy.

xzzy
Mar 5, 2009

That's where we were three years ago, and went down the road of running our own cluster (we chose okd). It works, but it is a lot of work.. especially if you start opening it up to users.

I haven't yet regretted putting our internal apps into it, everything's been solid.

xzzy
Mar 5, 2009

I can't offer any gitlab alternatives but I can say we've been using gitlab for years and I think it's great.

xzzy
Mar 5, 2009

My solution was to make the Mac support group run some Mac minis for me, and the extent of my involvement is telling Jenkins to ssh to them.

xzzy
Mar 5, 2009

Anyone tried out Loki and have impressions? I've been wanting to, but my department is still content with old school text files and rsyslog so it's hard to build a case to get them to consider something newer.. but if I can set up something super awesome maybe I can drag them into this century.

Assuming the tool is any good that is.

I've been nothing but impressed with Prometheus, the number of metrics it can handle on cheap hardware is pretty amazing. To be fair my install replaced a ganglia setup which had a long history of obliterating hard drives so pretty much anything would seem great. But I haven't been able to break Prometheus yet.

xzzy
Mar 5, 2009

Blinkz0rz posted:

i haven't done much complex stuff in ansible but couldn't you curl the es health endpoint as a blocking operation until it reports ready and then continue to the next task?

That's how I did it. Well, not with ansible, but that was our approach. We set up a job that checked the cluster was green before running any update/restart.
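The gate we used amounted to a loop like the one below. This is a minimal sketch, not our actual job: the `/_cluster/health` URL is Elasticsearch's real health endpoint, but the function names and the injectable `fetch_health` callable are mine, added so the polling logic can be exercised without a live cluster.

```python
import json
import time
import urllib.request


def wait_for_green(fetch_health, timeout=300, interval=10):
    """Poll a cluster-health callable until it reports 'green' or we time out.

    fetch_health is any callable returning the health document as a dict,
    so the blocking logic is testable without a real cluster.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            status = fetch_health().get("status")
        except OSError:
            status = None  # cluster not reachable yet; keep waiting
        if status == "green":
            return True
        time.sleep(interval)
    return False


def es_health(url="http://localhost:9200/_cluster/health"):
    """Fetch Elasticsearch cluster health as a dict."""
    with urllib.request.urlopen(url, timeout=5) as resp:
        return json.load(resp)
```

Then the update/restart step only runs if `wait_for_green(es_health)` comes back true.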

xzzy
Mar 5, 2009

Based on the snippet posted it might not be the best choice for this specific case, but the envinject plugin is pretty great in general for managing environment variables in a jenkins job. Users can enter variables in the gui or specify them in a file to be imported so it generally satisfies the gui types and the shell types.

xzzy
Mar 5, 2009

Hadlock posted:

Docker on windows is pretty much a dead technology, yeah

Isn't docker dying everywhere? Everyone I know is migrating to other runtimes.

Which isn't that many people, I'm not exactly a big mover in the container world, but the people I do work with are pretty down on docker.

xzzy
Mar 5, 2009

I didn't ask if containers are dying, I asked if docker is dying. :v:

The docker daemon is just one of a handful of ways to run an image.

xzzy
Mar 5, 2009

An on site kube cluster is a full time job for at least one person. It constantly needs TLC and the learning curve is pretty steep too.

So hopefully your new VP is prepared to pay for that.

xzzy
Mar 5, 2009

I manage systems with several keys in yaml files. But they're encrypted with a private key that is totally not in a yaml file so it's absolutely okay and this will never backfire.

xzzy
Mar 5, 2009

If your users don't need fast I/O, NFS is the way to go.

Ceph will turn you grey.

xzzy
Mar 5, 2009

Methanar posted:

Jenkins?

More like Jank ins

You can sync your watch to their CVE reports.

xzzy
Mar 5, 2009

That sort of poo poo is inevitable when the spec doesn't require quoting strings. I hate working with json more than I hate working with yaml, but at least json has that going for it.

(though to be fair I hate both of them for a lot of reasons, there just isn't anything better out there and it's hard to compete with their inertia)
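The quoting difference is easy to demonstrate with nothing but the stdlib: JSON flatly rejects an unquoted scalar, while a YAML 1.1 loader would happily read a bare `no` as the boolean False (the infamous "Norway problem" — not shown here since it needs a third-party parser).

```python
import json

# JSON insists on quoted strings, so an unquoted scalar is a hard parse error...
try:
    json.loads('{"country": no}')
    parsed = True
except json.JSONDecodeError:
    parsed = False
assert parsed is False

# ...whereas quoting makes the intent unambiguous.
assert json.loads('{"country": "no"}') == {"country": "no"}
```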

xzzy
Mar 5, 2009

12 rats tied together posted:

it's very normal for syntaxes to have reserved characters

Yeah but yaml is insane

https://stackoverflow.com/questions/3790454/how-do-i-break-a-string-in-yaml-over-multiple-lines/21699210#21699210

Multiline strings are an extreme case and not related to OP at all but it is a good example of yaml being on drugs.

xzzy
Mar 5, 2009

Methanar posted:

Geez, I'm like 0/3 on interviewing people who claim to have extensive experience with prometheus who can't explain to me what the group_left or group_right statements are, or what they do at all.

I don't claim to be a prometheus expert but I do happen to be the one running our instance of it for my group, so I know the most of anyone where I'm at. I couldn't tell you what group_left or group_right is either, but it sounds an awful lot like a join, which means it's evil and should be avoided.
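(For anyone curious, it is basically a join. The many-to-one example from the Prometheus querying docs looks roughly like this — the recording-rule metric names are theirs, not from any setup described here:)

```promql
# many-to-one: each errors series carries an extra `code` label; group_left
# keeps those extra labels while matching against the single requests series
# that shares the remaining label set.
method_code:http_errors:rate5m / ignoring(code) group_left method:http_requests:rate5m
```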

xzzy
Mar 5, 2009

Hadlock posted:

There's not much to manage unless you want to federate and retain terabytes of data and even most of that is a solved problem.

I keep getting nagged to store metrics "forever." What's your favorite solution for that?

Longest I've been able to retain is one year on a single server. Worked great until an untested software update generated some really high cardinality values and brought everything to a screeching halt.

xzzy
Mar 5, 2009

Hadlock posted:

2) Ask them what they want the data for, and if it's actually useful or just a warm fuzzy blanket that they're wishing for. If it's for the analytics team, you might be using the wrong tool for the job. Might want to use some sort of collector to feed data at certain periods into their favorite analytics tool. Get management to support you and say something like "if your team wants N frequency forever, it's going to cost this much, if you want 0.0N/N frequency forever it's going to cost you an additional headcount plus $X budget, sign here and we'll send the req to the VP for approval while you wait here in the meeting". Dollars per month seems to be the only way to explain the complexity of devops tasks to outside groups.

It's mostly from momentum. Our previous tool was ganglia and they really liked having 10 years of CPU stats, they claimed it helped figure out how users have used resources over time and it informed new hardware purchases (we're still an extremely on-prem oriented business). So it's very much write-once-read-maybe.

quote:

Barring that, I dunno, set up a "forever" prometheus server that pings everything once an hour. Once you get to looking at graphs at the three month resolution, that's about what you're sampling anyways. Bill it and the maintenance of it to the department that wants their metrics forever. If that doesn't work, see #1

That's actually clever and easy and I should try that. I keep looking for prometheus remote_write targets that would downsample information, and there are certainly tools that do that.. but they're annoying and add too much poo poo to support. Definitely geared for environments much larger than ours.
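A sketch of what that hourly "forever" scraper's config might look like — job name and target are invented for illustration:

```yaml
# prometheus.yml for a hypothetical long-retention instance; pair it with
# something like --storage.tsdb.retention.time=10y on the command line.
global:
  scrape_interval: 1h
  scrape_timeout: 1m

scrape_configs:
  - job_name: "forever"
    static_configs:
      - targets: ["node01.example.com:9100"]
```

One caveat: with an hourly interval, Prometheus's default ~5 minute staleness window means instant queries come back empty between scrapes, so dashboards would need range queries or a raised --query.lookback-delta.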

quote:

Our data architect demanded that we keep a years worth of his postgres metrics at 1 minute resolution; between that and the core product metrics we ended up at about 1TB/year, I thought that was a reasonable ask.

That sounds about like us, I had prometheus configured for 400 days of retention holding about 400 metrics at 1 minute resolution for each of 2800 servers. It happily lived on a 4TB partition, up until the aforementioned config issue that doubled the metrics without warning.
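Those numbers check out on the back of an envelope, assuming the commonly quoted ~1-2 bytes per compressed Prometheus sample (the helper function here is mine, not anything from our tooling):

```python
def prometheus_bytes(servers, metrics_per_server, interval_s, days,
                     bytes_per_sample=1.5):
    """Rough TSDB size estimate: series * samples/series * bytes/sample.

    bytes_per_sample of ~1-2 is the figure usually cited for Prometheus's
    compressed samples; treat the result as an order-of-magnitude guess.
    """
    series = servers * metrics_per_server
    samples_per_series = days * 24 * 3600 // interval_s
    return series * samples_per_series * bytes_per_sample


# 2800 servers * 400 metrics at 1-minute resolution for 400 days:
size_tb = prometheus_bytes(2800, 400, 60, 400) / 1e12
```

That lands around 1 TB, which squares with a 4 TB partition having plenty of headroom — right up until the cardinality doubles without warning.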

xzzy
Mar 5, 2009

I tried real hard to get up to speed on traefik a couple years ago because I needed a magic proxy to make a bunch of silly web page containers accessible. They promised auto discovery of said backends, but I could never get it to work. It's very possible I am a giant idiot though.

So I went back to a static nginx that I have to configure proxy pass rules for manually.

xzzy
Mar 5, 2009

Our site security scans cipher types too and yells at us if a weak one is found.. this happens every few years and it requires us to purge the offending entry from any web servers under our umbrella.

10+ years under this regime and we haven't had a single user complain that something broke. So I think you'll probably be okay cleaning things up.. but maybe send out a warning just for a little CYA.

Our current config in apache looks like:

SSLProtocol all -TLSv1.1 -TLSv1 -SSLv3
SSLCipherSuite ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-SHA384:ECDHE-RSA-AES256-SHA384:ECDHE-ECDSA-AES128-SHA256:ECDHE-RSA-AES128-SHA256

On the odd chance you find it helpful.
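If you want to double-check from the client side what a server will actually negotiate, Python's stdlib ssl module can do a quick probe — this is a generic sketch, not part of our scanning setup:

```python
import socket
import ssl


def negotiated(host, port=443, timeout=5):
    """Connect and report the TLS version and cipher the server negotiates."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return tls.version(), tls.cipher()[0]


def strict_context():
    """Client context refusing anything older than TLS 1.2, mirroring the
    '-TLSv1.1 -TLSv1 -SSLv3' server policy above."""
    ctx = ssl.create_default_context()
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    return ctx
```

Wrap a socket with `strict_context()` and any endpoint still speaking TLS 1.0/1.1 will fail the handshake, which is a cheap smoke test before the scanner yells at you.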

xzzy
Mar 5, 2009

Hadlock posted:

Only if they really really pushed on this, should this request have gotten to your desk. I would push back on wasting any more time on this task

Edit: your manager sucks at protecting you and your time

This is how security works. No one understands what any of the reports mean, they sit around and let nessus do its thing, wait for it to spit out a scary pdf and roll the turd downhill until someone makes the error go away.

xzzy
Mar 5, 2009

I don't know about best practice, but I've had a stable install for several years with a "master to agent" setup and a home-grown script. It basically just calls ssh, but we use kerberos so there are some extra steps in there to get the logins to work.

We tried having the agents initiate the connection to jenkins early on and it was unstable.. the jar running on the build nodes would crash after a few days.

xzzy
Mar 5, 2009

Our legacy single Jenkins instance has gotten too big for people to manage, so I've been given the job of splitting it into multiple Jenkins instances, with each group of developers getting their own Jenkins. This is the trivial part.

The part that's giving me pause, however, is making the executors available to all our instances and preventing an executor from getting overloaded, because the instances have no idea there are other Jenkinses firing off builds. For our linux builds it's easy, it's all gonna be done in k8s, but we have a bank of OSX systems that I gotta deal with. The best I've found so far is the "node sharing executor" plugin, which looks like it does exactly what I need, but the implementation seems kinda kludgy, like it's making Jenkins do something it's not designed to do.

So I'm curious if anyone in here has had to do something like this and knows of a better plan.

xzzy
Mar 5, 2009

A few months ago my org did a forced update on all managed systems purging oracle java from them because of their idiocy regarding licensing. Now everything is openjdk and feels good man.

Of course we still have a few oracle databases floating around but I can appreciate not wanting to migrate production data to something new.

xzzy
Mar 5, 2009

I'd like to see Jenkins go for five minutes without announcing a CVE.

xzzy
Mar 5, 2009

Anyone want to talk me into / talk me out of traefik? It feels like it's become a pretty common solution for routing requests to services, but I'm really suspicious of it requiring admin privileges to set up a CRD for ingress. Doesn't that give it a lot of freedom to screw with routes (and potentially screw things up)?

I've been using nginx with manually configured reverse proxies to this point but am working on a project where that might not be the best idea anymore (too many proxies). But I really like not needing any elevated access too.

xzzy
Mar 5, 2009

LochNessMonster posted:

What makes you choose Traefik specifically?

Only that I see it suggested a lot, and auto config of reverse proxies sounds quite convenient.

So I'm exploring it as an option but the CRD is a big red flag for me.

xzzy
Mar 5, 2009

Maybe try the sh() function to get any output the command is producing? Something like this:

code:
stage('butts') {
  steps {
    script {
      buf = sh(
        script: '''
          echo butts
        ''',
        returnStdout: true
      ).trim()
      echo "${buf}"
    }
  }
}
If you need stderr too, add a 2>&1 redirection inside the script.

I know absolutely nothing about poetry so can only give general suggestions. :v:

xzzy
Mar 5, 2009

Jenkins is just a victim of its age and userbase. There are so many ways to do anything and so many developers willing to write new plugins that it can be a giant pain to sift through documentation.

Jenkinsfiles did help a lot IMO. But I'll agree their integration with the container universe has been less than ideal.

xzzy
Mar 5, 2009

We've paid for gitlab for years, but are moving to the CE release this year because the price is going up and puts it out of our budget. How owned are we gonna be?

xzzy
Mar 5, 2009

Puppet is out there, struggling to prove it has a place in the cloud universe.

(IMO it's still the best choice for traditional configuration management)

xzzy
Mar 5, 2009

Happiness Commando posted:

I've joked to a buddy several times that there's a pizza place near me that I want to work at for 3-6 months just to learn how they make their pizza.

Mom said her first job was at a KFC because her mom wanted to get the recipe for their 7 herbs and spices. I guess they figured it was made from scratch in each restaurant.

It didn't work.

xzzy
Mar 5, 2009

2023 is gonna be the year of helm and ansible to me. We're a hardcore puppet shop and literally everything ties into it so I've been on cruise control for several years, creating a huge gap in my knowledge.

But puppet is not as good with cloudy stuff which is finally becoming A Thing here so I'm gonna take that opportunity.

xzzy
Mar 5, 2009

I reverse proxy grafana and run SSO on nginx. As an added bonus the intermediate page lets me set up direct links to useful grafana pages in a coherent way, because leaving users to find anything on the grafana front page is begging for stupid tickets.
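The shape of that nginx setup is roughly as below — hostnames, ports, and the auth_request SSO endpoint are all made up for illustration, and auth_request is just one common way to front SSO, not necessarily what's described above:

```nginx
location /grafana/ {
    auth_request /sso/validate;        # hypothetical SSO check endpoint
    proxy_pass http://127.0.0.1:3000/; # grafana listening locally
    proxy_set_header Host $host;
    proxy_set_header X-Forwarded-For $remote_addr;
    # Grafana needs websockets for live panels
    proxy_http_version 1.1;
    proxy_set_header Upgrade $http_upgrade;
    proxy_set_header Connection "upgrade";
}
```

When serving Grafana under a subpath like this, the Grafana side typically needs `root_url` (and `serve_from_sub_path` in newer versions) set to match, or redirects break.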

xzzy
Mar 5, 2009

Docjowles posted:

My eyes absolutely glaze over looking at Java code but I guess they did something right. Because despite all us hipsters out here trying to write in Go or Python or TypeScript or Rust or whatever, an ungodly percentage of the world runs on Java.

The only thing it did right was timing. It got momentum by dazzling c levels with the "write once run anywhere" buzzword in an era when the internet was exploding. Once it got embedded into web browsers our fate was sealed.

Sun had great penetration in colleges so it was easy to infect young brains with it too. Legions of kids graduated thinking Java was the square peg for every round hole.

xzzy
Mar 5, 2009

My suspicion is Gitlab's last price increase caused enough people to jump ship to the community version that they gotta recoup that money somewhere.

I'm at a super low budget place and we were okay with paying but they priced us out a ways back. Obviously I got no data on their customers but if there's a measurable number of orgs like ours maybe Gitlab is feeling a pinch.
