Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
I spent a few years writing Chef and I wish it had been Ansible. I don't know how Chef is in 2022, but back in 2014-2017 Chef 11 and 12 were absolutely miserable.

trem_two
Oct 22, 2002

it is better if you keep saying I'm fat, as I will continue to score goals
Fun Shoe

Blinkz0rz posted:

I spent a few years writing Chef and I wish it had been Ansible. I don't know how Chef is in 2022, but back in 2014-2017 Chef 11 and 12 were absolutely miserable.

That will never change

12 rats tied together
Sep 7, 2006

New Yorp New Yorp posted:

This certainly sounds like a huge abuse of Ansible to me.
[...]
The process described with Ansible sounds insanely complex with practically no benefit versus just authoring Helm charts or using Kustomize.
What on earth would lead you to believe that "put some yaml files down somewhere" is an abuse of ansible? I thought I provided a pretty decent summary of why ansible is better than kustomize for this but I'm happy to go into more detail. Helm goes without saying.

Blinkz0rz posted:

Nowhere does Ansible remotely look like a solution.
[...]
Say I have service A and service B and they both need a DB. The code for each service is in different repos. I don't want to run 2 DB containers but instead use a different database within the same postgres container. How do I ensure that the logic of "start the DB if it isn't started or use the existing DB container if it is" is reflected via docker compose? I know, the answer is that docker compose can't do this, but I don't want to run kubernetes locally just to avoid port collisions and duplicate infra for each service when running them on a dev box.
What you want to do is resolve an arbitrary set of dependencies (some set of service code) into running infrastructure (in this case, running locally). That is explicitly, and exactly, an ansible thing.
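
For the shared-postgres case specifically, here's a rough, untested sketch of what that play could look like (module names are from the community.docker and community.postgresql collections, everything else — names, passwords, db names — is made up):

YAML code:
- name: local dev infra for service A and service B
  hosts: localhost
  connection: local
  gather_facts: no
  tasks:
    # idempotent: starts the container if it's missing, leaves it alone if it's already running
    - name: ensure the shared postgres container exists
      community.docker.docker_container:
        name: dev-postgres
        image: postgres:14
        state: started
        published_ports:
          - "5432:5432"
        env:
          POSTGRES_PASSWORD: devpassword

    - name: wait for postgres to accept connections
      wait_for:
        host: 127.0.0.1
        port: 5432

    # one database per service inside the same container
    # (needs psycopg2 on the machine running the play)
    - name: ensure each service has its own database
      community.postgresql.postgresql_db:
        name: "{{ item }}"
        login_host: 127.0.0.1
        login_user: postgres
        login_password: devpassword
      loop:
        - service_a
        - service_b
In practice each service repo would contribute its own database name to that loop through its vars, instead of hardcoding the list.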

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
After having written Puppet, Chef, Ansible, and now Saltstack, it's all just a blur of different kinds of pain, and my life has mostly been made easier by a better infrastructure testing framework. Ditching all of that test code noise for a goss YAML file you can generate in a few seconds that ultimately asks "is this running and can I reach X" beats all of them in terms of raw effectiveness. Because it's fast and portable enough that you can use it as a monitoring task, it's a great stop-gap until your team figures out your Real Monitoring solution, too. There's even a Nagios output option, dammit. Goss is easy enough that we had developers submit and maintain the checks in their PRs, too.
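
For anyone who hasn't seen goss, the "is this running and can I reach X" file is literally just something like this (service, port, and endpoint here are made up):

YAML code:
# goss.yaml -- sketch
service:
  nginx:
    enabled: true
    running: true
port:
  tcp:8080:
    listening: true
http:
  http://localhost:8080/healthz:
    status: 200
    timeout: 5000
goss validate checks it once; goss serve turns the same file into an HTTP health endpoint you can point monitoring at, and if I remember right the nagios output is just a format flag on validate.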

asap-salafi posted:

I'm on a project using ansible for stuff that they could just put directly into their Jenkinsfiles. I don't really get it. My contract ends in a month and I can't wait to leave.
It depends on what's in the Ansible, but after many years of Jenkins library authoring and pipelines I'm of the opinion that Jenkinsfiles that do more than run a shell script in a container and/or launch another Jenkins job are antipatterns that are tough to maintain over time (the fewer plugins the better, really). Using Jenkins as an orchestrator or a cheap UI webform for internal tasks makes little sense when there's better dedicated software for that, like Rundeck (which is admittedly not as well supported, because this whole devops space is cursed with sunk costs in bad, old technology until the heat death of the universe).
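
To be concrete about what "no more than a shell script in a container" looks like, something roughly like this is the most Jenkinsfile I ever want to maintain (image and script path are made up):

Groovy code:
// Jenkinsfile -- sketch; the actual logic lives in ./ci/run-tests.sh, not in Groovy
pipeline {
    agent { docker { image 'python:3.11' } }
    stages {
        stage('Build and test') {
            steps {
                sh './ci/run-tests.sh'
            }
        }
    }
}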

Gyshall posted:

The answer is mocking service calls and responses. We see the "local stack" as an antipattern because it stops scaling once you're past a few domains and 10+ microservices/db backends/integrations, etc.
After being burned enough times by minio's differences from s3, such as differences in consistency model, I think it's fine for some pretty basic unit tests, but in the end few things beat simply deploying, recovering from and mitigating whatever errors you can, and rolling back changes as fast as inhumanly possible, because in a complex enough system we won't be able to account for everything in tests. I've found the "basic unit tests" become increasingly meaningless as a codebase matures, though, so being more clever about what to test matters more than factors like which framework or level of tests you use.

12 rats tied together
Sep 7, 2006

necrobobsledder posted:

Ditching all of that test code noise for generating a goss YAML file in a few seconds that ultimately ask "is this running and can I reach X" beats it in terms of raw effectiveness.
Agreed with everything, but I would not be living up to my thread gimmick if I didn't point out that goss is essentially a shell alias for "ansible-playbook --check --diff", but with fewer features.
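
i.e., roughly the same question asked two ways (file names made up):

code:
# ansible's built-in dry run / drift report
ansible-playbook healthcheck.yml --check --diff

# the goss equivalent
goss -g healthcheck.yaml validate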

necrobobsledder posted:

It depends on what's in the Ansible, but after many years of Jenkins library authoring and pipelines I'm of the opinion that Jenkinsfiles that do more than run a shell script [...]
Jenkinsfiles are nonstop problems from the ground up but the biggest issue I take with them is that it's not possible to trigger a Jenkinsfile declarative pipeline reload, say because you have new commits to main, without first running the pipeline? This fundamentally breaks like every ops-adjacent git workflow and makes Jenkins into clown software in my mind. Extremely happy to be wrong here if anyone knows where this functionality is hiding.

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost

12 rats tied together posted:

Jenkinsfiles are nonstop problems from the ground up but the biggest issue I take with them is that it's not possible to trigger a Jenkinsfile declarative pipeline reload, say because you have new commits to main, without first running the pipeline? This fundamentally breaks like every ops-adjacent git workflow and makes Jenkins into clown software in my mind. Extremely happy to be wrong here if anyone knows where this functionality is hiding.
I'm pretty sure my suggestion doesn't fix what you're trying to avoid, but "Suppress Automatic SCM Triggering" is handy for multi-branch pipelines for this reason at least. I have my doubts that even the new hotness, Jenkins X, fixes this, because the problem boils down to Jenkins being unable to determine what stages a pipeline executes without first executing the pipeline. Everything in Jenkins translates back into the internal XML representation of the job object graph like it's a multi-pass compiler, but there's no way to materialize the pipeline without executing it, because the nodes themselves sometimes spawn new stages and steps, which probably makes pre-processing stages intractable for all but the most trivial pipelines.

In my life in the Jenkins saltmine penal colony I haven't found any way to make Jenkins actually behave according to modern expectations. It takes way too much effort to set up pipelines intuitively without at least 3x more YAML than everything else out there, and even with the newer stuff like declarative pipelines and the YAML options, the second a feature is a bit more complex than whatever trivial-looking construct a CloudBees engineer ritually sacrificed their eyeballs to produce, you have to drop down into the drat Groovy DSL crap all over again.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

12 rats tied together posted:

Jenkinsfiles are nonstop problems from the ground up but the biggest issue I take with them is that it's not possible to trigger a Jenkinsfile declarative pipeline reload, say because you have new commits to main, without first running the pipeline? This fundamentally breaks like every ops-adjacent git workflow and makes Jenkins into clown software in my mind. Extremely happy to be wrong here if anyone knows where this functionality is hiding.
Use the Jenkinsfile for configuration of actual pipelines/steps, don't put job configuration stuff into the Jenkinsfile. Use Job DSL for managing jobs. Groovy is a bit awkward to work with, but you can make it do whatever you want and it's possible to test Job DSL code very well with Gradle. I have a bunch that automatically generate templated jobs for anything in a certain GitLab organization, for example.
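
As a sketch of the kind of thing I mean (not my actual code, names and URL made up; in practice the repo list would come from the GitLab API rather than a hardcoded list): a seed job running Job DSL like this stamps out a pipeline job per repo, and the Jenkinsfile in each repo only describes the stages.

Groovy code:
// Job DSL seed script -- hypothetical
def repos = ['billing-service', 'search-service']

repos.each { repo ->
    pipelineJob("${repo}-pipeline") {
        definition {
            cpsScm {
                scm {
                    git {
                        remote {
                            url("https://gitlab.example.com/my-team/${repo}.git")
                        }
                        branch('main')
                    }
                }
                scriptPath('Jenkinsfile')
            }
        }
    }
}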

Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.
Also use shared libraries. Jenkins is bad, but most DevOps teams I work with end up compounding the problems. At my current gig we use Pipelines as YAML and shared libraries and rarely have to touch anything unless we're adding new stages.
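
For anyone unfamiliar, the generic shared-library pattern (not necessarily our exact setup, step name and defaults here are made up) is a vars/ step in a library repo:

Groovy code:
// vars/standardPipeline.groovy in the shared library -- hypothetical step
def call(Map config = [:]) {
    node {
        stage('Build') {
            sh(config.buildCmd ?: 'make build')
        }
        stage('Test') {
            sh(config.testCmd ?: 'make test')
        }
        stage('Deploy') {
            sh(config.deployCmd ?: 'make deploy')
        }
    }
}
Each repo's Jenkinsfile then shrinks to @Library('our-shared-lib') _ plus a standardPipeline(buildCmd: './build.sh') call.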

12 rats tied together
Sep 7, 2006

Thanks. For reasons beyond my current control we have ~400 stages that should exist, and the problem is usually that they do not exist yet, which seems to be exactly the "unless" scenario, assuming I have read and understood both of your posts.

We do use shared libraries and that part is reasonably inoffensive. With any luck I will have deleted jenkins within q2 but if that doesn't happen I'll avail myself of the options presented.

Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.
What kind of stages do you have beyond lint/build/test/deploy? Genuinely curious what other stages folks have out there.

The Fool
Oct 16, 2003


most of our apps run sonarqube as its own stage

otherwise most of our stuff is just variations on those core stages

there's an app I worked with last month that had 2 build stages and 4 test stages and 8 deploy stages

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
I think lint/build/test/deploy covers the whole gamut, but each of those can contain multitudes.

"build" can mean an optimized production build, and/or one built with more debug info, or for alternative architectures, that might be rolled out at the same time. (When we were testing an alternative compiler, we built it on both our current compiler and the candidate one, and deployed X% of the servers with the candidate one)

"test" can mean smoketests ("can I even run the app"), integration tests, performance regression tests (did the app suddenly start consuming more memory compared to the last build? Or it's really slow? Or the build binary is much larger?), and shadow traffic tests (let's duplicate all incoming production requests, sending half to the deployed app and the other half on a one-way trip to our candidate, and see if the candidate's error-rate / mem-usage / other metrics are wildly different).

"deployment" can mean rolling through a staging server to a production server, or it could be some slow bleed in where traffic is gradually shifted from one version to the next. Or possibly A/B testing (although those are typically done via realtime app config changes, not by the version deployment system).

Methanar
Sep 26, 2013

by the sex ghost
How did you do your regression testing?

We had a project for a while that was meant to do automatic canary analysis of new releases through spinnaker by interrogating the app's time-series data for increased failure rates or whatever. But it went nowhere because it turns out that's a nightmare project.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Yes, it's a nightmare project:

"Memory usage went up by $threshold%... so was that intended as part of supporting some new feature? Or is it a blip caused the fact we're testing it with realtime data? (There's no point in having a fixed set of data to test against, since it goes stale immediately)."

"Is it slowly creeping up over each release in a "boiling the frog" fashion?"
"What value of $threshold should be chosen to set off the alarms?"

"Wait, did memory go down? Was that due to an optimization, or is it indicative of a problem?"

"If it was intended, who is responsible for the consequences (e.g. more server capacity may be needed)? The dev, or the dev's manager, or someone else?"

"If it was not intended, which commit caused it?" Not easy to answer when the commit rate is far higher than the deployment rate, and it's not just a matter of bisecting since build times are huge and re-testing has a "you can never step in the same [data] stream twice" problem.


None of the answers here are easy to find.

Hadlock
Nov 9, 2004

minato posted:

Yes, it's a nightmare project:

None of the answers here are easy to find.

This is why QA exists: ride them hard for not catching changes in performance, then scapegoat them when prod blows up, then lay them all off when budget cuts happen because they weren't getting results. Also give them no budget and refuse to hire anyone with any infrastructure experience.

Why waste engineering cycles on this garbage when a good qa team will do this for you :psyduck:

/sarcasm ...?

Methanar
Sep 26, 2013

by the sex ghost

Hadlock posted:

Why waste engineering cycles on this garbage when a good qa team will do this for you :psyduck:

This but unironically.

Auto canary analysis is a black hole of bullshit, false alarms, and non-generalizable configuration.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Well, the glib answer was that simultaneously there was no QA team and our users were our unwitting QA team. This was for a 2x daily-deployed service with many millions of users (think: Netflix) and we could afford to guinea-pig new builds on small numbers of them and watch error rates to see if any canaries started singing.

QA teams are often thorough but at the cost of speed & agility; they need a spec to work from and time to build up test cases, but when your dev team is 250 engineers going as fast as they can to get features and experiments out the door ASAP, a QA team is just going to rapidly fall behind and become friction. Admittedly this wasn't exactly medical software that they were deploying, it was a product where minor localized user outages were deemed acceptable.

And even if we did have a QA dept, it doesn't solve the fundamental problems of: "$metric changed. Was it a *real* change or a blip? If it was real, was it intentional? Did it have a positive or negative effect (given the context of the change)? Was it negative *enough* for us to act on it? How do we decide what that level is? How we can feasibly find who was responsible? And who is responsible for deciding when to accept it or revert it?"

I was told that the on-call was in charge of answering all these questions, and that the quality of these decisions was highly variable depending on the experience of the on-call. (in retrospect, OF COURSE it was). I was tasked with fixing this by automating it all instead of using a bunch of hand-wavy "vibes" that the on-calls got from the canary dashboards. I failed; I could only make it easier to extract the data, analyzing it was always going to be an art.

Methanar
Sep 26, 2013

by the sex ghost

minato posted:

I failed; I could only make it easier to extract the data, analyzing it was always going to be an art.

Almost verbatim my own org's experience, except we gave up pretty quickly when we realized the value-add was incredibly nebulous. You'd only ever be swapping an SRE's hand-wavy judgement for the hand-wavy judgement of thoughtless, contextless automated rules that must be constantly janitored.

Microsoft firing their QA team made a lot more sense to me after sitting through the early drafts of the auto canary analysis project. Sometimes it really is best to just throw new versions at the wall and see if they work or not. At sufficient scale it's just not possible to simulate load/testing pre-prod at the levels you need to really be sure of anything.

Rollback or forward if you need to.

Methanar fucked around with this message at 05:06 on Mar 29, 2022

America Inc.
Nov 22, 2013

I plan to live forever, of course, but barring that I'd settle for a couple thousand years. Even 500 would be pretty nice.
I've got what should be a basic question that I can't figure out.

I've got an EC2 node added as a Jenkins agent on my company's master, and I can confirm that I can reach the master from the EC2 node on ports 443 and 50000:
code:
[ec2-user@ip-my-ip ~]$ telnet master-2.jenkins.[mycompany].com 50000
...
Connected to master-2.jenkins.[mycompany].com.
Escape character is '^]'
However, when I try to start Jenkins like so:

code:
java -jar agent.jar -jnlpUrl https://master-2.jenkins.[mycompany].com/computer/[agent-name]/jenkins-agent.jnlp -secret @secret_file -workDir "/home/ec2-user/jenkins"
I get the error:
code:
java.io.IOException: https://master-2.jenkins.[mycompany].com/ provided port:50000 is not reachable
        at org.jenkinsci.remoting.engine.JnlpAgentEndpointResolver.resolve(JnlpAgentEndpointResolver.java:311)
        at hudson.remoting.Engine.innerRun(Engine.java:724)
        at hudson.remoting.Engine.run(Engine.java:540)
How can it be that port 50000 is open on the master but JNLP still considers it unreachable? This seems like a total contradiction, I know.

America Inc. fucked around with this message at 00:23 on Mar 30, 2022

Junkiebev
Jan 18, 2002


Feel the progress.

quarantinethepast posted:

I've got what should be a basic question that I can't figure out.

I've got an EC2 node added as a Jenkins agent on my company's master, and I can confirm that I can reach the master from the EC2 node on ports 443 and 50000:
[...]
How can it be that port 50000 is open on the master but JNLP still considers it unreachable? This seems like a total contradiction, I know.

“It’s always DNS”*

Is that url resolvable on your ec2 instance?

*unless you have a nextGen firewall in between them which will allow connections, but block based on protocol**

**unless it’s in k8s: then it’s rbac

America Inc.
Nov 22, 2013

I plan to live forever, of course, but barring that I'd settle for a couple thousand years. Even 500 would be pretty nice.
I just did an nslookup on the Jenkins master hostname from my EC2 instance and it is resolvable.
code:

$ nslookup
> master-2.jenkins.[my company].com
Server: xxx.xxx.xxx
Address: xxx.xxx.xxx

Non-authoritative answer:
...

I'm starting to think that something's messed up with the secret file.

E: actually no, without the secret there is an immediate 403.

America Inc. fucked around with this message at 05:09 on Mar 30, 2022

teamdest
Jul 1, 2007

quarantinethepast posted:

I've got what should be a basic question that I can't figure out.

I've got an EC2 node added as a Jenkins agent on my company's master, and I can confirm that I can reach the master from the EC2 node on ports 443 and 50000:
[...]
How can it be that port 50000 is open on the master but JNLP still considers it unreachable? This seems like a total contradiction, I know.

Long shot, but check that the certificate/chain/root that the master is using is actually in the Java key store on the remote worker? I’ve had UNTOLD problems with Java apps that came down to that.

Saukkis
May 16, 2003

Unless I'm on the inside curve pointing straight at oncoming traffic the high beams stay on and I laugh at your puny protest flashes.
I am Most Important Man. Most Important Man in the World.

teamdest posted:

Long shot, but check that the certificate/chain/root that the master is using is actually in the Java key store on the remote worker? I’ve had UNTOLD problems with Java apps that came down to that.

Yes, you probably want to test https with 'openssl s_client -connect master-2.jenkins.[mycompany].com:50000 -showcerts'.

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb
I commented out one of my containers I don't need to run right now in my docker-compose.yml file and did a {{docker stop airsonic-advanced}} and {{docker rm airsonic-advanced}}, but it keeps starting back up every time I run docker-compose, now with a different name (repo-name_airsonic-advanced_1). Why does it keep starting back up? It's not listed in the "Recreating ..." output when I run docker-compose.

Love Stole the Day
Nov 4, 2012
Please give me free quality professional advice so I can be a baby about it and insult you

fletcher posted:

I commented out one of my containers I don't need to run right now in my docker-compose.yml file and did a {{docker stop airsonic-advanced}} and {{docker rm airsonic-advanced}}, but it keeps starting back up every time I run docker-compose, now with a different name (repo-name_airsonic-advanced_1). Why does it keep starting back up? It's not listed in the "Recreating ..." output when I run docker-compose.

Maybe something else in the compose file requires it as a dependency, so it's auto-resolving to whatever your Dockerhub repository is to populate it correctly. :shrug:

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

Love Stole the Day posted:

Maybe something else in the compose file requires it as a dependency, so it's auto-resolving to whatever your Dockerhub repository is to populate it correctly. :shrug:

Good thinking, I've confirmed I don't have any references to it in a depends_on somewhere. Still keeps coming back!

jaegerx
Sep 10, 2012

Maybe this post will get me on your ignore list!


fletcher posted:

Good thinking, I've confirmed I don't have any references to it in a depends_on somewhere. Still keeps coming back!

Grep for it. Something is calling it

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

jaegerx posted:

Grep for it. Something is calling it

Ahhh, finally found it: docker-compose was being called with --file ecr-images.yml, and the reference telling it where to get the image for that container was still present in there, so it was still spinning up the container. Thank you!

Junkiebev
Jan 18, 2002


Feel the progress.

Anyone using https://buildpacks.io/ for templated-building? Is that a Cool Way To Be?

Super-NintendoUser
Jan 16, 2004

COWABUNGERDER COMPADRES
Soiled Meat
Heyo guys, I have a minor ansible/json question for the thread, maybe I can get some help.

So I have an automation that currently works, but I foresee an issue and I need to enhance it. Basically I hit an API, get back some info, and then I need to run some more commands using some details in the response of that call. The problem is that the response can contain more than one entry, and I need to iterate over each one. Here's some pseudo ansible code:

code:
- Name: query api
  uri: 
    url: https://api/whatever
  register: api_output
the response looks like this

code:
[
  { "something": "info",
    "somethingslse": "data",
    "site": "name_of_site"
  },
  { "something": "info2",
    "somethingslse": "data2",
    "site": "name_of_site2"
  }
]
In the test domain, there is only one entry in the response, so I can do this:

code:
- set_fact:
    sitename: "{{ api_output.json.result[0].site }}"

- name: follow up action
  uri:
     url: "https://api/query/{{ sitename }}"

and it works fine, but in the live environment there is more than one entry, so I need to revise it to iterate over all of the entries, and I'm having trouble writing that in ansible. I guess I could run the command, count the length of the array and then rerun the loop that many times? But I'm just not sure how to manage that either.

more pseudo code:
code:
- set_fact
	sitename: "{{ api_output.json.result[0].site }}'

- set_fact:
      number_of_sites: "{{ checksite_output.json.result |length }}"

- set_facts:
	site_{{ index }}: api_output.json.result[0].site
  loop:
	until loop_index=number of sites

- name: follow up action
  uri:
     url: "https://api/query/{{ site_{{index}} }}"
  loop:
	until loop_index= number of sites
I'm just struggling with the iterating; I can probably skip setting the fact for the sitename and just use the length variable directly, maybe? Figured I'd throw this out here before I spend an hour googling ansible loops.

Quebec Bagnet
Apr 28, 2009

mess with the honk
you get the bonk
Lipstick Apathy

Jerk McJerkface posted:

Heyo guys, I have a minor ansible/json question for the thread, maybe I can get some help.
[...]

Combining your first and last code blocks (not tested):

YAML code:
- name: query api
  uri: 
    url: https://api/whatever
  register: api_output

- name: follow up action
  uri:
    url: "https://api/query/{{ item.site }}"
  loop: "{{ api_output.json.result }}"
When you use any loop construct to iterate over a collection, Ansible automatically assigns the current element the name item within that context. You can control that name with loop_control, which becomes mandatory if you need to nest loops (which, by the way, can only be accomplished by using include_tasks). The Ansible docs on loops have a lot of good examples of how to use them.
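
For example (untested, file name made up), renaming the loop variable with loop_control, which you'd need if per_site.yml itself contained another loop:

YAML code:
- name: follow up action per site
  include_tasks: per_site.yml
  loop: "{{ api_output.json.result }}"
  loop_control:
    loop_var: site_entry   # inside per_site.yml, use {{ site_entry.site }} instead of {{ item }}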

12 rats tied together
Sep 7, 2006

edit: removed this section because Quebec Bagnet did a good job explaining it

Less broadly and more specifically for your use case, I would recommend thinking about this instead:
YAML code:
---
- name: first play to discover and set some hosts
  hosts: localhost
  connection: local
  gather_facts: no
  vars:
    # lets pretend i hit an API and got this data instead of hardcoding it
    api_response: [{
        "something": "info",
        "somethingslse": "data",
        "site": "name_of_site"
      },
      { "something": "info2",
        "somethingslse": "data2",
        "site": "name_of_site2"
      }]

  tasks:
    - name: configure dynamic inventory
      add_host:
        name: "{{ item.site }}"
        groups:
          - someapp
      with_items: "{{ api_response }}"

- name: second play to run tests on each host
  hosts: someapp
  connection: local
  gather_facts: no
  tasks:
    - name: run a curl, or whatever, on each site member
      debug:
        msg: "https://{{ inventory_hostname }}"
You can use add_host plus with_items to configure a temporary inventory for whatever environment you point the playbook at. This would let you use more of ansible's "host level features" and be quite a bit faster in large environments.

tortilla_chip
Jun 13, 2007

k-partite
Loops are garbage in ansible. Write a filter plugin.

Super-NintendoUser
Jan 16, 2004

COWABUNGERDER COMPADRES
Soiled Meat

tortilla_chip posted:

Loops are garbage in ansible. Write a filter plugin.

thanks for the helpful suggestion!

Super-NintendoUser
Jan 16, 2004

COWABUNGERDER COMPADRES
Soiled Meat

Quebec Bagnet posted:

good ansible


I'll try these

12 rats tied together
Sep 7, 2006

They're not entirely wrong :shobon: filter plugins are really good and loops are kinda crap, but I think you still have to loop here fundamentally. Filter plugins would be a good choice if you had a really complex data structure, since you can do a lot by chaining "jsonparse" and "set_fact, with_items" etc, but task bloat is a real problem with larger playbooks.

To write a filter plugin,

- add a folder to your ansible repository called "plugins", and then a subfolder called "filter"
- add "filter_plugins = ./plugins/filter" to the ansible.cfg at your repo root
- put a python file in here called "filters.py"
- reference an existing plugin for an example of a minimally configured plugin file (rough sketch below), create something with roughly the same structure, save it

Once you get this scaffolding done, you can write python functions and add them to your FilterModule's "filters" method. You can use these filters anywhere in ansible: playbooks, group_vars, templates, inventories, whatever. You use them in the same way that you would use "from_yaml" or "to_json", but because they can be heavily tailored towards your exact needs they can save you a lot of task code (which is the worst kind of code).
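
A minimally configured plugins/filter/filters.py is roughly this (the function name and behavior are just made up for this thread's site-list example):

Python code:
# plugins/filter/filters.py -- sketch
def site_urls(api_result, scheme="https"):
    """Turn the API's list of site dicts into a list of URLs."""
    return ["%s://%s" % (scheme, entry["site"]) for entry in api_result]


class FilterModule(object):
    """Ansible picks up any class named FilterModule in a filter plugin file."""

    def filters(self):
        # jinja filter name -> python callable
        return {"site_urls": site_urls}
Then anywhere jinja is evaluated you can write "{{ api_output.json.result | site_urls }}".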

Running with -vvv (or maybe -vvvv?) will show you debug info about what files ansible is parsing to load filters (to make sure your ansible.cfg is correct), and you'll see a warning usually if you have a file that can't be read by the plugin loader.

Building this scaffolding tends to be high-value because you can use the same pattern to create any type of plugin, and because they're stored in source control, the rest of your team "installs" them automatically. Of particular relevance here is that you could write a custom lookup plugin -- all ansible loops are actually just syntax sugar for invoking lookup plugins -- so you could put the API call to get your site data into a lookup plugin called "companyname_sites", and then your task could be simplified to:

YAML code:
- name: run a curl on each site, or whatever
  debug:
    msg: "https://{{ item }}"
  with_companyname_sites: "{{ environment_name }}"
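
The lookup plugin itself would live in plugins/lookup/companyname_sites.py and look something like this (the API call is stubbed out here; the whole thing is a sketch, not tested):

Python code:
# plugins/lookup/companyname_sites.py -- sketch
from ansible.plugins.lookup import LookupBase


# stand-in for the real HTTP call to the company API
def _fetch_sites(environment_name):
    return [{"site": "name_of_site"}, {"site": "name_of_site2"}]


class LookupModule(LookupBase):
    def run(self, terms, variables=None, **kwargs):
        # terms is whatever you pass in, e.g. with_companyname_sites: "{{ environment_name }}"
        results = []
        for environment_name in terms:
            results.extend(entry["site"] for entry in _fetch_sites(environment_name))
        return results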

12 rats tied together
Sep 7, 2006

12 rats tied together posted:

They're not entirely wrong :shobon: filter plugins are really good and loops are kinda crap, but I think you still have to loop here fundamentally.

Wanted to break this up for formatting. Ansible can "loop" in three ways --

1- "Task looping" -- this is when you set a loop construct on your task. Task loops run once per module invocation, which means if you were targeting a group of 25 servers and you wanted to "for x in 1..5, ping google", you would ping google 125 times. This can be bad if you're trying to have multiple servers interact with a shared thing like a database or a file, or some external API. Often you see people use run_once to control for this.

2- "Controller looping". Everything in ansible boils down to a plugin or a module, most of these things execute on the control host as python, which of course has loops. Usually these loops run async with your main tasks as part of bookkeeping and state tracking, ansible uses a task queue internally which is why you can dynamically alter your playbook as its running (by adding more tasks, which have their variables recursively resolved, or including code through role dependencies which can be conditional, etc).

3- "Host parallelism". This is where ansible's forks and strategy configs come into play, ansible executes all of your tasks in sequence, but each task executes on every host in parallel. Ansible writes "things that need to happen" to that task queue and then, for each fork you configure, plucks an item off the queue and runs it.


With the yaml snippet I posted above, where ansible would read your API data and construct an ephemeral inventory, you create a bunch of "virtual hosts". The playbook is still using connection: local, so it runs from your controller; it just runs in the context of these things you discovered and ran through add_host (of note, you can also include variables with add_host). Because you're forking a logical thing (each site) into a "host", ansible will finish your job very quickly, because it can execute on hosts (even virtual hosts with connection: local) in parallel.

Compare this to using a task loop: if your API call came back with 2,000 sites, that is still fundamentally a task (not a host), which means it must execute in sequence, once for each site. There are some techniques you can use to mitigate this at a task level, but they kinda suck, and IME using this virtual host pattern works better most of the time.

Super-NintendoUser
Jan 16, 2004

COWABUNGERDER COMPADRES
Soiled Meat
Those snippets worked perfectly, thanks so much!

Methanar
Sep 26, 2013

by the sex ghost
How do I interview a new manager for my team?

Hadlock
Nov 9, 2004

Ask them what percentage of their time will be spent on IC work; if that number is higher than 0, it should be an immediate pass.

Ask for seven examples of when they advocated for paying down tech debt and how that turned out, and why

Hiring a build/infra/devops manager is hard because teams typically promote from within, and honestly, those kinds of engineers make terrible managers.

Good luck

We had to hire to replace our old manager; I ultimately, mostly due to attrition, became the unofficial decision maker and picked the least worst of the three candidates I was given to choose from by a very "bro" recruiter who didn't have any understanding of what we did.
