|
I spent a few years writing Chef and I wish it had been Ansible. I don't know how Chef is in 2022 but back in 2014-2017, Chef 11 and 12 were absolutely miserable.
|
# ? Mar 23, 2022 16:07 |
|
Blinkz0rz posted: I spent a few years writing Chef and I wish it had been Ansible. I don't know how Chef is in 2022 but back in 2014-2017, Chef 11 and 12 were absolutely miserable. That will never change
|
# ? Mar 23, 2022 17:33 |
|
New Yorp New Yorp posted:This certainly sounds like a huge abuse of Ansible to me. Blinkz0rz posted:Nowhere does Ansible remotely look like a solution.
|
# ? Mar 23, 2022 17:43 |
|
After having written Puppet, Chef, Ansible, and now Saltstack, it's all just a blur of different kinds of pain, and my life has been mostly made easier with a better infrastructure testing framework. Ditching all of that test code noise for generating a goss YAML file in a few seconds that ultimately asks "is this running and can I reach X" beats it in terms of raw effectiveness. Because it's fast and portable enough that you can use it as a monitoring task, it's a great stop-gap until your team figures out your Real Monitoring solution, too. There's even a Nagios output option, dammit. Goss is easy enough that we had developers submit and maintain the files in their PRs, too.

asap-salafi posted: I'm on a project using ansible for stuff that they could just put directly into their Jenkinsfiles. I don't really get it. My contract ends in a month and I can't wait to leave.

Gyshall posted: The answer is mocking service calls and responses. We see the "local stack" as an anti-pattern because it stops scaling past whatever domains and 10+ microservices/db backends/integrations etc.
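For reference, the kind of goss file being described is just a short YAML document of assertions. A hypothetical example (the service name, port, and endpoint are all invented for illustration):

```yaml
# goss.yaml -- hypothetical checks: "is this running and can I reach X"
service:
  nginx:
    enabled: true
    running: true
port:
  tcp:443:
    listening: true
http:
  https://internal-api.example.com/healthz:
    status: 200
```

Running `goss validate` executes the checks once; the Nagios output mentioned above comes from an output-format flag on the same command, which is what makes it usable as a stop-gap monitoring task.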
|
# ? Mar 23, 2022 17:47 |
|
necrobobsledder posted: Ditching all of that test code noise for generating a goss YAML file in a few seconds that ultimately asks "is this running and can I reach X" beats it in terms of raw effectiveness. necrobobsledder posted: Depending upon what's in the Ansible, but after many years of Jenkins library authoring and pipelines I'm of the opinion that Jenkinsfiles shouldn't do more than run a shell script.
|
# ? Mar 23, 2022 18:02 |
|
12 rats tied together posted: Jenkinsfiles are nonstop problems from the ground up but the biggest issue I take with them is that it's not possible to trigger a Jenkinsfile declarative pipeline reload, say because you have new commits to main, without first running the pipeline? This fundamentally breaks like every ops-adjacent git workflow and makes Jenkins into clown software in my mind. Extremely happy to be wrong here if anyone knows where this functionality is hiding. In my life in the Jenkins saltmine penal colony I don't think there is any way to make Jenkins actually behave according to modern expectations. It takes way too much effort to set up pipelines intuitively without at least 3x more YAML than everything else out there, and when using the newer stuff like declarative pipelines and the YAML options, the second a feature is a bit more complex than whatever trivial-looking construct a CloudBees engineer ritually sacrificed their eyeballs to produce, you have to drop down into the drat Groovy DSL crap all over again.
|
# ? Mar 24, 2022 02:33 |
|
12 rats tied together posted:Jenkinsfiles are nonstop problems from the ground up but the biggest issue I take with them is that it's not possible to trigger a Jenkinsfile declarative pipeline reload, say because you have new commits to main, without first running the pipeline? This fundamentally breaks like every ops-adjacent git workflow and makes Jenkins into clown software in my mind. Extremely happy to be wrong here if anyone knows where this functionality is hiding.
|
# ? Mar 27, 2022 04:53 |
|
Also use shared libraries. Jenkins is bad but most DevOps teams I work with end up compounding the problems. My current gig we use Pipelines as YAML and shared libraries and rarely have to touch anything unless we're adding new stages.
|
# ? Mar 27, 2022 12:40 |
|
Thanks, for reasons beyond my current control we have ~400 stages that should exist, and the problem is usually that they do not exist yet, which seems to be exactly the "unless" scenario, assuming I have read and understood both of your posts. We do use shared libraries and that part is reasonably inoffensive. With any luck I will have deleted Jenkins within Q2, but if that doesn't happen I'll avail myself of the options presented.
|
# ? Mar 28, 2022 02:05 |
|
What kind of stages do you have beyond lint/build/test/deploy? Genuinely curious what other stages folks have out there.
|
# ? Mar 29, 2022 00:04 |
|
Most of our apps run SonarQube as its own stage; otherwise most of our stuff is just variations on those core stages. There's an app I worked with last month that had 2 build stages, 4 test stages, and 8 deploy stages.
|
# ? Mar 29, 2022 00:20 |
|
I think lint/build/test/deploy covers the whole gamut, but each of those can contain multitudes.

"Build" can mean an optimized production build, and/or one built with more debug info, or for alternative architectures, that might be rolled out at the same time. (When we were testing an alternative compiler, we built the app with both our current compiler and the candidate one, and deployed X% of the servers with the candidate one.)

"Test" can mean smoketests ("can I even run the app"), integration tests, performance regression tests (did the app suddenly start consuming more memory compared to the last build? Or is it really slow? Or is the built binary much larger?), and shadow traffic tests (let's duplicate all incoming production requests, sending half to the deployed app and the other half on a one-way trip to our candidate, and see if the candidate's error-rate / mem-usage / other metrics are wildly different).

"Deployment" can mean rolling through a staging server to a production server, or it could be some slow bleed-in where traffic is gradually shifted from one version to the next. Or possibly A/B testing (although those are typically done via realtime app config changes, not by the version deployment system).
|
# ? Mar 29, 2022 00:27 |
|
How did you do your regression testing? We had a project for a while that was meant to do automatic canary analysis of new releases through Spinnaker by interrogating time-series data for the app for increased failure rates or whatever. But it went nowhere because it turns out that's a nightmare project.
|
# ? Mar 29, 2022 01:09 |
|
Yes, it's a nightmare project: "Memory usage went up by $threshold%... so was that intended as part of supporting some new feature? Or is it a blip caused by the fact we're testing it with realtime data? (There's no point in having a fixed set of data to test against, since it goes stale immediately.)" "Is it slowly creeping up over each release in a "boiling the frog" fashion?" "What value of $threshold should be chosen to set off the alarms?" "Wait, did memory go down? Was that due to an optimization, or is it indicative of a problem?" "If it was intended, who is responsible for the consequences (e.g. more server capacity may be needed)? The dev, or the dev's manager, or someone else?" "If it was not intended, which commit caused it?" Not easy to answer when the commit rate is far higher than the deployment rate, and it's not just a matter of bisecting, since build times are huge and re-testing has a "you can never step in the same [data] stream twice" problem. None of the answers here are easy to find.
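The naive automated version of those checks is deceptively easy to write, which is part of the trap. A hypothetical threshold rule (the metric names and the 10% cutoff are invented for illustration):

```python
def canary_verdict(baseline: dict, candidate: dict, threshold_pct: float = 10.0) -> dict:
    """Naively flag any metric that moved more than threshold_pct in
    either direction between a baseline build and a candidate build.

    This encodes none of the context the post above is asking about:
    a drop is treated as suspiciously as a rise, the cutoff is the same
    for every metric, and nothing here knows whether a change was
    intended or who should decide what to do about it.
    """
    verdicts = {}
    for name, base in baseline.items():
        cand = candidate.get(name)
        if cand is None or base == 0:
            # Can't compute a percentage change; punt.
            verdicts[name] = "unknown"
            continue
        delta_pct = (cand - base) / base * 100.0
        verdicts[name] = "alarm" if abs(delta_pct) > threshold_pct else "ok"
    return verdicts

# Memory up 25%, p99 latency essentially unchanged:
print(canary_verdict({"mem_mb": 400, "p99_ms": 120},
                     {"mem_mb": 500, "p99_ms": 121}))
```

Every hard question listed above is about what this function can't know: whether the delta was intentional, whether 10% is the right cutoff for that particular metric, and who owns the decision once the alarm fires.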
|
# ? Mar 29, 2022 01:29 |
|
minato posted: Yes, it's a nightmare project: This is why QA exists: ride them hard for not catching changes in performance, then scapegoat them when prod blows up, then lay them all off when budget cuts happen because they weren't getting results. Also give them no budget and refuse to hire anyone with any infrastructure experience. Why waste engineering cycles on this garbage when a good qa team will do this for you? /sarcasm
|
# ? Mar 29, 2022 01:40 |
|
Hadlock posted: Why waste engineering cycles on this garbage when a good qa team will do this for you This but unironically. Auto canary analysis is a black hole of bullshit, false alarms, and non-generalizable configuration.
|
# ? Mar 29, 2022 02:20 |
|
Well, the glib answer was that simultaneously there was no QA team and our users were our unwitting QA team. This was for a 2x daily-deployed service with many millions of users (think: Netflix) and we could afford to guinea-pig new builds on small numbers of them and watch error rates to see if any canaries started singing.

QA teams are often thorough but at the cost of speed & agility; they need a spec to work from and time to build up test cases, but when your dev team is 250 engineers going as fast as they can to get features and experiments out the door ASAP, a QA team is just going to rapidly fall behind and become friction. Admittedly this wasn't exactly medical software that they were deploying, it was a product where minor localized user outages were deemed acceptable.

And even if we did have a QA dept, it doesn't solve the fundamental problems of: "$metric changed. Was it a *real* change or a blip? If it was real, was it intentional? Did it have a positive or negative effect (given the context of the change)? Was it negative *enough* for us to act on it? How do we decide what that level is? How can we feasibly find who was responsible? And who is responsible for deciding when to accept it or revert it?"

I was told that the on-call was in charge of answering all these questions, and that the quality of these decisions was highly variable depending on the experience of the on-call (in retrospect, OF COURSE it was). I was tasked with fixing this by automating it all instead of using a bunch of hand-wavy "vibes" that the on-calls got from the canary dashboards. I failed; I could only make it easier to extract the data, analyzing it was always going to be an art.
|
# ? Mar 29, 2022 04:56 |
|
minato posted: I failed; I could only make it easier to extract the data, analyzing it was always going to be an art. Almost verbatim my own org's experience, except we gave up pretty quick when we realized the value-add was incredibly nebulous. You'd only ever substitute SREs' hand-wavey judgement for the hand-wavey judgement of thoughtless, contextless automated rules that must be constantly janitored. Microsoft firing their QA team made a lot more sense to me after sitting through the early drafts of the auto canary analysis project. Sometimes it really is best to just throw new versions at the wall and see if it works or not. At sufficient scale it's just not possible to simulate load/testing at the levels you need pre-prod to really be sure of anything. Roll back or forward if you need to. Methanar fucked around with this message at 05:06 on Mar 29, 2022 |
# ? Mar 29, 2022 05:02 |
|
I've got what should be a basic question that I can't figure out. I've got an EC2 node added as a Jenkins agent on my company's master, and I can confirm that I can access this master from the EC2 node through ports 443 and 50000: code:
code:
code:
America Inc. fucked around with this message at 00:23 on Mar 30, 2022 |
# ? Mar 29, 2022 23:56 |
|
quarantinethepast posted: I've got what should be a basic question that I can't figure out. "It's always DNS"* Is that URL resolvable on your EC2 instance? *unless you have a next-gen firewall in between them which will allow connections, but block based on protocol** **unless it's in k8s: then it's RBAC
|
# ? Mar 30, 2022 02:40 |
|
I just did an nslookup on the Jenkins master hostname from my EC2 instance and it is resolvable. code:
I'm starting to think that something's messed up with the secret file. E: actually no, without the secret there is an immediate 403. America Inc. fucked around with this message at 05:09 on Mar 30, 2022 |
# ? Mar 30, 2022 05:03 |
|
quarantinethepast posted:I've got a what should be basic question that I can't figure out. Long shot, but check that the certificate/chain/root that the master is using is actually in the Java key store on the remote worker? I’ve had UNTOLD problems with Java apps that came down to that.
|
# ? Mar 30, 2022 08:08 |
|
teamdest posted: Long shot, but check that the certificate/chain/root that the master is using is actually in the Java key store on the remote worker? I've had UNTOLD problems with Java apps that came down to that. Yes, you probably want to test HTTPS with 'openssl s_client -connect master-2.jenkins.[mycompany].com:50000 -showcerts'.
|
# ? Mar 31, 2022 09:42 |
I commented out one of my containers I don't need to run right now in my docker-compose.yml file and did a "docker stop airsonic-advanced" and "docker rm airsonic-advanced", but it keeps starting back up every time I run docker-compose, now with a different name (repo-name_airsonic-advanced_1). Why does it keep starting back up? It's not listed in the "Recreating ..." output when I'm running docker-compose.
|
|
# ? Apr 2, 2022 23:56 |
|
fletcher posted: I commented out one of my containers I don't need to run right now in my docker-compose.yml file and did a "docker stop airsonic-advanced" and "docker rm airsonic-advanced" but it keeps starting back every time I run docker-compose, but with a different name now (repo-name_airsonic-advanced_1). Why does it keep starting back up? It's not listed in the "Recreating ..." output when I'm running docker-compose. Maybe something else in the compose file requires it as a dependency, so it's auto-resolving to whatever your Docker Hub repository is to populate it correctly.
|
# ? Apr 3, 2022 01:24 |
Love Stole the Day posted:Maybe something else in the compose file requires it as a dependency, so it's auto-resolving to whatever your Dockerhub repository is to populate it correctly. Good thinking, I've confirmed I don't have any references to it in a depends_on somewhere. Still keeps coming back!
|
|
# ? Apr 3, 2022 02:19 |
|
fletcher posted:Good thinking, I've confirmed I don't have any references to it in a depends_on somewhere. Still keeps coming back! Grep for it. Something is calling it
|
# ? Apr 4, 2022 03:54 |
jaegerx posted: Grep for it. Something is calling it. Ahhh, finally found it: docker-compose was being called with --file ecr-images.yml and the reference telling it where to get the image was still present in there, so it was still spinning up the container. Thank you!
|
|
# ? Apr 4, 2022 19:20 |
|
Anyone using https://buildpacks.io/ for templated-building? Is that a Cool Way To Be?
|
# ? Apr 5, 2022 18:52 |
|
Heyo guys, I have a minor ansible/json question for the thread, maybe I can get some help. So I have an automation that currently works, but I foresee an issue and I need to enhance it. Basically I hit an API, get back some info, and then I need to run some more commands using some details in the response of that call. The problem is that the response can return more than one list, and I need to iterate over each list. Here's some pseudo ansible code: code:
code:
code:
more pseudo code: code:
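A hedged sketch of the flow being described, assuming the response is a list of objects that each carry an inner list (the URL, field names, and the inner task are all invented, since the original response format isn't shown):

```yaml
# Sketch only -- endpoint and field names are hypothetical.
- name: Fetch site data from the API
  ansible.builtin.uri:
    url: "https://api.example.com/sites"
    return_content: true
  register: api_response

# subelements pairs each outer item with each item of its inner list,
# so this runs once per (site, device) combination.
- name: Run a command for every device in every site
  ansible.builtin.debug:
    msg: "site {{ item.0.name }} device {{ item.1 }}"
  loop: "{{ api_response.json.sites | subelements('devices') }}"
```

The `subelements` filter is the stock way to flatten a "list of lists" into loopable pairs; whether it fits depends on the actual shape of the API response.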
|
# ? Apr 6, 2022 14:00 |
|
Jerk McJerkface posted:Heyo guys, I have a minor ansible/json question for the thread, maybe I can get some help. Combining your first and last code blocks (not tested): YAML code:
|
# ? Apr 6, 2022 17:17 |
|
edit: removed this section because Quebec Bagnet did a good job explaining it Less broadly and more specifically for your use case, I would recommend thinking about this instead: YAML code:
|
# ? Apr 6, 2022 17:39 |
|
Loops are garbage in ansible. Write a filter plugin.
|
# ? Apr 6, 2022 18:57 |
|
tortilla_chip posted:Loops are garbage in ansible. Write a filter plugin. thanks for the helpful suggestion!
|
# ? Apr 6, 2022 19:36 |
|
Quebec Bagnet posted:good ansible 12 rats tied together posted:
I'll try these
|
# ? Apr 6, 2022 19:38 |
|
They're not entirely wrong, filter plugins are really good and loops are kinda crap, but I think you still have to loop here fundamentally. Filter plugins would be a good choice if you had a really complex data structure, since you can do a lot by chaining "jsonparse" and "set_fact, with_items" etc, but task bloat is a real problem with larger playbooks. To write a filter plugin:
- add a folder to your ansible repository called "plugins", and then a subfolder called "filter"
- add "filter_plugins = ./plugins/filter" to the ansible.cfg at your repo root
- put a python file in here called "filters.py"
- reference this file for an example of a minimally configured plugin file, create something with roughly the same structure, save it

Once you get this scaffolding done, you can write python functions and add them to your FilterModule's "filters" method. You can use these filters anywhere in ansible: playbooks, group_vars, templates, inventories, whatever. You use them in the same way that you would use "from_yaml" or "to_json", but because they can be heavily tailored towards your exact needs they can save you a lot of task code (which is the worst kind of code). Running with -vvv (or maybe -vvvv?) will show you debug info about what files ansible is parsing to load filters (to make sure your ansible.cfg is correct), and you'll see a warning usually if you have a file that can't be read by the plugin loader. Building this scaffolding tends to be high-value because you can use the same pattern to create any type of plugin, and because they're stored in source control, the rest of your team "installs" them automatically. Of particular relevance here is that you could write a custom lookup plugin -- all ansible loops are actually just syntax sugar for invoking lookup plugins -- so you could put the API call to get your site data into a lookup plugin called "companyname_sites", and then your task could be simplified to: YAML code:
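As a concrete version of that scaffolding, a minimal plugins/filter/filters.py might look like this (the example filter, flatten_sites, is made up for illustration; only the FilterModule structure is the required part):

```python
# plugins/filter/filters.py
# Ansible discovers any class named FilterModule in files under the
# configured filter_plugins path and calls .filters() to get a mapping
# of {filter_name: callable}.

def flatten_sites(sites):
    """Hypothetical example filter: collapse a list of site dicts into
    a flat list of (site_name, device) pairs."""
    pairs = []
    for site in sites:
        for device in site.get("devices", []):
            pairs.append((site["name"], device))
    return pairs


class FilterModule(object):
    def filters(self):
        # One entry per filter you want usable from playbooks/templates.
        return {
            "flatten_sites": flatten_sites,
        }
```

After that, something like "{{ api_response.json.sites | flatten_sites }}" works anywhere a built-in filter does.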
|
# ? Apr 6, 2022 19:56 |
|
12 rats tied together posted: They're not entirely wrong filter plugins are really good and loops are kinda crap, but I think you still have to loop here fundamentally. Wanted to break this up for formatting, ansible can "loop" in three ways:

1. "Task looping" -- this is when you set a loop construct on your task. Task loops run once per module invocation, which means if you were targeting a group of 25 servers and you wanted to "for x in 1..5, ping google", you would ping google 125 times. This can be bad if you're trying to have multiple servers interact with a shared thing like a database or a file, or some external API. Often you see people use run_once to control for this.

2. "Controller looping". Everything in ansible boils down to a plugin or a module, most of these things execute on the control host as python, which of course has loops. Usually these loops run async with your main tasks as part of bookkeeping and state tracking, ansible uses a task queue internally which is why you can dynamically alter your playbook as it's running (by adding more tasks, which have their variables recursively resolved, or including code through role dependencies which can be conditional, etc).

3. "Host parallelism". This is where ansible's forks and strategy configs come into play, ansible executes all of your tasks in sequence, but each task executes on every host in parallel. Ansible writes "things that need to happen" to that task queue and then, for each fork you configure, plucks an item off the queue and runs it. With the yaml snippet I posted above, where ansible would read your API data and construct an ephemeral inventory, you create a bunch of "virtual hosts". The playbook is still using connection: local, so it runs from your controller, it just runs in the context of these things you discovered and ran through add_host (of note, you can also include variables with add_host).
Because you're forking a logical thing (each site) into a "host", ansible will finish your job very quickly because it can execute on hosts (even virtual hosts with connection: local) in parallel. Compare this to if you used a task loop, and your API call came back with 2,000 sites, that is still fundamentally a task (not a host), which means it must execute in sequence, once for each site. There are some techniques you can use to mitigate this at a task level, but they kinda suck, and using this virtual host pattern IME works better most of the time.
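A minimal sketch of that virtual-host pattern, with invented play and variable names (not the original snippet):

```yaml
# Sketch: turn API results into an ephemeral inventory, then fan out.
- hosts: localhost
  gather_facts: false
  tasks:
    - name: Register each discovered site as a virtual host
      ansible.builtin.add_host:
        name: "site-{{ item.name }}"
        groups: discovered_sites
        site_data: "{{ item }}"   # extra vars ride along with add_host
      loop: "{{ api_response.json.sites }}"

- hosts: discovered_sites
  connection: local
  gather_facts: false
  tasks:
    - name: Runs once per site, parallelized across forks
      ansible.builtin.debug:
        msg: "working on {{ site_data.name }}"
```

Because each site is now a "host", the second play spreads the work across forks instead of grinding through a sequential task loop.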
|
# ? Apr 6, 2022 20:12 |
|
Those snippets worked perfectly, thanks so much!
|
# ? Apr 6, 2022 22:07 |
|
How do I interview a new manager for my team?
|
# ? Apr 12, 2022 19:26 |
|
Ask them what percentage of their time will be spent on IC work; if that number is higher than 0 it should be an immediate pass.

Ask for seven examples of when they advocated for paying down tech debt and how that turned out, and why.

Hiring a build/infra/devops manager is hard as typically they promote from within, and honestly, those kinds of engineers make terrible managers. Good luck.

We had to hire to replace our old manager. I ultimately, mostly due to attrition, became the unofficial decision maker and picked the least worst of the three I was given to choose from by a very "bro" recruiter who didn't have any understanding of what we did.
|
# ? Apr 13, 2022 03:57 |