|
MightyBigMinus posted:if you have a diverse/mixed workload on the hardware (say dozens+ of containers) then no you'll probably never notice and the defaults will probably be fine. There is still the potential that a given workload gets put on a cpu core that isn't particularly well-aligned with the physical ram associated with the process, right? I was under the impression that the linux kernel wouldn't always be able to inherently know what memory is good for a given cpu core, and that this was the sort of thing that required hints to be provided. Or am I misunderstanding this entirely? Numa noob over here.
|
# ? Sep 2, 2022 20:46 |
|
|
|
Has anyone tried Terraform 1.2's new lifecycle argument replace_triggered_by? I'm generating a random username and password with aws_secretsmanager_random_password, using them to create a DocumentDB cluster, and saving them in Secrets Manager. To prevent terraform from overwriting them next time (or after password rotation) there is an ignore_changes instruction on the secret string and on the DB user/pass on the cluster. So far everything has worked, but I also wanted to update the secret with a new random value if some change replaces the cluster. Tried adding this to the secret version lifecycle: replace_triggered_by = [ aws_docdb_cluster.cluster.cluster_resource_id ]. Nothing, and neither aws_docdb_cluster.cluster.master_username nor aws_docdb_cluster.cluster triggers replacement. I see in the state file that the dependency between secret and cluster appears, but the secret replacement isn't triggered.
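For anyone following along, here's roughly the shape being described, as a minimal sketch (all names and values are illustrative, not the poster's actual config):

```hcl
# Minimal sketch of the setup described above; all names are illustrative.
data "aws_secretsmanager_random_password" "pw" {
  password_length = 32
}

resource "aws_docdb_cluster" "cluster" {
  cluster_identifier = "example"
  master_username    = "exampleadmin"
  master_password    = data.aws_secretsmanager_random_password.pw.random_password

  lifecycle {
    # don't churn credentials on every apply
    ignore_changes = [master_username, master_password]
  }
}

resource "aws_secretsmanager_secret" "creds" {
  name = "example/docdb"
}

resource "aws_secretsmanager_secret_version" "creds" {
  secret_id = aws_secretsmanager_secret.creds.id
  secret_string = jsonencode({
    username = "exampleadmin"
    password = data.aws_secretsmanager_random_password.pw.random_password
  })

  lifecycle {
    ignore_changes = [secret_string]
    # the attempt that doesn't fire:
    replace_triggered_by = [aws_docdb_cluster.cluster.cluster_resource_id]
  }
}
```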
|
# ? Sep 5, 2022 18:42 |
|
Pyromancer posted:Has anyone tried Terraform 1.2's new lifecycle argument replace_triggered_by?
|
# ? Sep 5, 2022 21:42 |
|
Methanar posted:There is still the potential that a given workload gets put on a cpu core that isn't particularly well-aligned with the physical ram associated with the process, right? If you've read that and started to reason through this as a throughput/latency trade-off, you're mostly right. And this might be counterintuitive, since the purpose of NUMA affinity is ostensibly to reduce the latency of memory accesses. But NUMA affinity really shines when you have: a) fairly homogeneous workloads on the system that cause few surprises for the scheduler; b) high utilization but little contention for the CPUs themselves; c) large-scale parallelism that can benefit from allocating large amounts of memory up-front; d) a multi-process architecture that shards well along NUMA node boundaries. So the main place you see this implemented is for high-throughput HPC batch jobs, but you might be able to squeeze better latency out of affinity in very particular single-purpose streaming/MQ systems. The JVM in particular does well here because of the consistent and predictable way (in this one particular case, relative to alternatives!) memory is allocated on the heap, so you'll see it come up a lot in affinity-oriented articles like this one from Alibaba on RocketMQ performance.
|
# ? Sep 5, 2022 22:01 |
|
Vulture Culture posted:aws_secretsmanager_random_password is a data source, so it doesn't have a lifecycle in this way. It's odd that it permits this usage without an error. You could maybe use the random_password resource to get the behavior you're after. The lifecycle isn’t on aws_secretsmanager_random_password, it is on aws_secretsmanager_secret_version. Recreating that will update it with the random password from the current run, but it doesn’t trigger
|
# ? Sep 5, 2022 22:35 |
|
Pyromancer posted:The lifecycle isn’t on aws_secretsmanager_random_password, it is on aws_secretsmanager_secret_version. Recreating that will update it with the random password from the current run, but it doesn’t trigger
|
# ? Sep 6, 2022 02:09 |
|
Vulture Culture posted:Can I poke more at the lifecycle you're reaching for, and what you're looking to get done with it? It seems like you're getting close to automated credential rotation, but not quite tying into the workflow I would expect with that. With the limited information available, I have a spidey sense that there are two workflows being conflated into one process here, and I'll recommend different approaches for both: It's not for rotation, it's more similar to the second one. I just want to create a cluster with a random username and password in each environment instead of hardcoding a default. And as I mentioned, that works OK: credentials are created and stored in Secrets Manager, so anyone with access to that secret can access the DB using just the secret name to get the connection string. What I'm trying to resolve with replace_triggered_by is what happens when someone makes a change forcing cluster replacement later. In that case the cluster gets remade with a newly generated password. However the secret isn't updated; it still has the password of the previous cluster, because the change is unrelated to the secret. So what I'm trying to do is tie aws_secretsmanager_secret_version and aws_docdb_cluster together so they always recreate together. It's not a big deal and not something that's likely to happen, but by its description replace_triggered_by is made for exactly such cases. Found a way to make this work by adding a null resource in-between, although it's still strange to me that it doesn't work otherwise: code:
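Since the snippet didn't survive the quote, here is a hedged sketch of the null resource bridge being described (a reconstruction, not the poster's exact code; names are illustrative):

```hcl
# Hedged reconstruction of the workaround, not the poster's exact code.
resource "null_resource" "cluster_bridge" {
  # this resource is replaced whenever the cluster's identity changes
  triggers = {
    cluster_id = aws_docdb_cluster.cluster.cluster_resource_id
  }
}

resource "aws_secretsmanager_secret_version" "creds" {
  secret_id = aws_secretsmanager_secret.creds.id
  secret_string = jsonencode({
    username = aws_docdb_cluster.cluster.master_username
    password = data.aws_secretsmanager_random_password.pw.random_password
  })

  lifecycle {
    ignore_changes = [secret_string]
    # replacing the null_resource drags the secret version along with it
    replace_triggered_by = [null_resource.cluster_bridge]
  }
}
```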
Pyromancer fucked around with this message at 08:00 on Sep 6, 2022 |
# ? Sep 6, 2022 07:33 |
|
it looks like docker's ONBUILD command is falling out of fashion (b/c it's not OCI-compatible?) - does anyone know what new thing is replacing it, functionally? It was handy to have a one-liner referencing a build image for various frameworks
|
# ? Sep 7, 2022 07:00 |
|
Testing question: I have a public library that accesses our AWS API. Customers can generate their own API key and use our library to access stuff. I'd like to add automated testing for this. We store the code on GitHub, so my thought was we'd store a test API key in a GitHub secret, pass it to the function, and go. Is there a risk someone could modify the code to print out an encoded version of the API key that is stored in the GitHub secret? Is there a best practice for how to manage this situation?
|
# ? Sep 8, 2022 00:25 |
|
Do not ever put (unencrypted) secrets in GitHub, not even once. If you must put a secret in source control, encrypt it using sops or something that your CI/CD can decrypt. Edit: oh, you mean a GitHub Actions secret? Yeah, in theory they could branch, create a PR that base64-encodes the secret, and then print(str.$b64encodedsecret). GitHub will do basic secret protection but nothing is perfect Hadlock fucked around with this message at 00:54 on Sep 8, 2022 |
# ? Sep 8, 2022 00:50 |
|
Anyone running Knative? We have a bunch of roughly interchangeable services we want to be able to run on-demand in our k8s clusters. Right now they’re deployments and services and we maintain a minimum of 1-2 replicas depending on how used they are, but the unused or infrequently used ones are placing a bit of a strain on the clusters’ resources. I looked briefly into Keda for event-driven scaling but we’d have to change a bunch of our architecture to handle it properly. Any other alternatives out there?
|
# ? Sep 8, 2022 00:54 |
|
Hadlock posted:Do not ever put (unencrypted) secrets in GitHub, not even once Yeah, I meant a GitHub Actions secret, since we test our code through GHA. The base64-encode situation is exactly the idea that came up, which would mean passing a secret into code is just never possible. Which sucks when you want to write a test that uses a secret API key
|
# ? Sep 8, 2022 01:26 |
|
Blinkz0rz posted:Anyone running Knative? https://keda.sh/docs/2.6/concepts/scaling-jobs/ Does Keda's job scaling not work either, based on messages in SQS or Kafka or your other favorite message bus? Knative is okay, but it's extremely heavyweight and non-trivial to work with and manage. You need to be familiar with service meshes. You need to be familiar with internal PKI management. You really, really need to know what you're doing and have a lot of time to do it if you're going to provide it as a service for the rest of your org.
|
# ? Sep 8, 2022 01:41 |
|
StumblyWumbly posted:Testing question: I have a public library that accesses our AWS API. Customers can generate their own API key and use our library to access stuff. I'd like to add automated testing for this. We store the code on GitHub, so my thought was we'd store a test API key in a GitHub secret, pass it to the function, and go. Who is "someone" in this scenario? GH Actions does have a few things in place to protect against this; see this security blog post and this documentation page, which go into the details. The tools are there, but the explanation and documentation are very GitHub-specific in my opinion; it's not easy to get your head around the distinction between pull_request and pull_request_target and to plan it all out correctly when you're migrating from other CI tools or starting from scratch. Once you've implemented and run some workflows, all the context and event and trigger stuff makes more sense. The short of it is that PRs originating from forks don't have access to secrets for exactly this reason; you have to set up chained workflows just to make this possible, so that you have places to build the appropriate controls.
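A rough sketch of the split being described here, with an untrusted workflow for fork PRs and a gated, trusted one for secrets (`pull_request` and `pull_request_target` are real GitHub Actions triggers; the file names, label, and test commands are illustrative):

```yaml
# .github/workflows/test.yml - untrusted half.
# Fork PRs run here with NO secrets in scope.
name: test
on: pull_request
jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: python -m pytest tests/unit
---
# .github/workflows/integration.yml - trusted half.
# Runs in the base repo's context, so it CAN see secrets; gate it
# behind an explicit maintainer action (e.g. a "safe-to-test" label).
name: integration
on:
  pull_request_target:
    types: [labeled]
jobs:
  integration:
    if: contains(github.event.pull_request.labels.*.name, 'safe-to-test')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          # careful: this checks out untrusted PR code with secrets in scope,
          # which is why the label gate above matters
          ref: ${{ github.event.pull_request.head.sha }}
      - run: python run_test.py --api ${{ secrets.API_KEY }}
```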
|
# ? Sep 8, 2022 13:35 |
|
Scikar posted:Who is "someone" in this scenario? GH Actions does have a few things in place to protect against this; see this security blog post and this documentation page, which go into the details. The tools are there, but the explanation and documentation are very GitHub-specific in my opinion; it's not easy to get your head around the distinction between pull_request and pull_request_target and to plan it all out correctly when you're migrating from other CI tools or starting from scratch. Once you've implemented and run some workflows, all the context and event and trigger stuff makes more sense. The short version of my question is: Is it safe to pass a secret into code that is part of a public repo? I want to test our code through GH Actions using something like: Python posted:run: python run_test.py --api ${{ secrets.API_KEY }} I know GitHub protects against a user adding in print(f"API Key = {args.api}"), but it sounds like they could do print(f"API Key = {reversable_encoding(args.api)}") Seems like we need to generate a short-term key using non-public code (doable but a pain for the backend), or somehow add the API key to the request outside the code (which seems harder). Or we could have the API tests only run on PRs from trusted folks, which is not the end of the world for this system but still not fun.
|
# ? Sep 8, 2022 14:32 |
|
Long lived secrets are annoying. There are alternatives in GitHub actions: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services
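The OIDC route linked above looks roughly like this in a workflow (`aws-actions/configure-aws-credentials` is the real action from that doc; the role ARN and region are made-up placeholders):

```yaml
# Illustrative: short-lived AWS credentials via OIDC instead of a stored key.
permissions:
  id-token: write   # required so the job can request an OIDC token
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-test-role  # hypothetical role
          aws-region: us-east-1
      - run: python run_test.py   # no long-lived API key stored in secrets at all
```

The IAM role's trust policy still has to restrict which repos and branches may assume it, so a fork can't just help itself to the role.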
|
# ? Sep 8, 2022 15:08 |
|
Methanar posted:https://keda.sh/docs/2.6/concepts/scaling-jobs/ These aren’t jobs, they’re basically lambdas that we want to spin up, field a set of http requests, and then spin down. The caveat is that we want them to be able to stay warm and scale out if we have more traffic but scale down to 0 if we don’t see traffic for a certain period of time. Basically we just don’t want to have to manually manage deployments, services, HPAs, etc for things that mostly would just sit around but need to be available if requests come in.
|
# ? Sep 8, 2022 15:10 |
Blinkz0rz posted:These aren’t jobs, they’re basically lambdas that we want to spin up, field a set of http requests, and then spin down. The caveat is that we want them to be able to stay warm and scale out if we have more traffic but scale down to 0 if we don’t see traffic for a certain period of time. What’s the difference between a job and running a lambda?? In my mind a lambda is just another word for a job, hmm. Pls 2 educate me if possible.
|
|
# ? Sep 8, 2022 16:20 |
|
Blinkz0rz posted:Anyone running Knative? If you are on AWS (getting pretty hand-wavey here), you can attach Fargate workers to your cluster, and I forget the exact spin-up time but it's measured in single-digit seconds, I think. We have a really bursty traffic flow and really can handle most stuff with ~50 workers, but then need to scale to ~250-350 workers almost instantly, then scale down over 2-3 hours to ~100. The problem is that the first new node to come online takes ~3min and we're dropping a ton of traffic while the cluster wakes up, especially on our ios devices. This is one of the next steps we're looking at (lol @ getting paged for too many 500 errors from ios devices on a weekly cadence) Curious to see how auto-autoscaling rolls out over the next couple of years. I agree tuning HPAs is a right pain in the dick; it's way too labor intensive, and right now the solution is "get it good enough, then throw an extra 20% resources at it", which seems wasteful
|
# ? Sep 8, 2022 18:02 |
|
lambdas are pretty lightweight. there are some proof of concepts running a full django stack on a lambda, but it doesn't scale beyond that. in a lot of stacks (like ours) the worker nodes are just the same full-fat django stack, but run with the WORKER_REPORTING=TRUE flag or ANALYTICS_SPECIAL_ONE_OFF=TRUE flag set, instead of the default WEB_WORKER=TRUE. Could you write a separate lambda to do what django is doing? just reading the top 100 rows of table X and spitting out a PDF for accounting? Sure, but now you need to update the lambda every time someone adds a new field to your customers_global table or orders_manifest table or whatever. fargate workers will let you spin up any container for any amount of time with any amount of resources, in about the same time a lambda takes to spin up. it's like a more expensive, more heavy-duty version of lambda
|
# ? Sep 8, 2022 18:06 |
|
Blinkz0rz posted:These aren’t jobs, they’re basically lambdas that we want to spin up, field a set of http requests, and then spin down. The caveat is that we want them to be able to stay warm and scale out if we have more traffic but scale down to 0 if we don’t see traffic for a certain period of time. A job in the kubernetes sense just means a pod which isn't associated with a deployment/replicaSet controller. If you wanted your job to be an http processor behind an LB, you can do that. The lifecycle of these job pods can be governed by what Keda reports as the messages in a queue and the work that the pods are doing. When the pods stop doing work, Keda can begin to wind things back down again. Your lambda pods can be kept warm in the sense that docker images can be kept on the filesystem of existing nodes. Knative or otherwise, you're still going to have the issue of needing to wait 3 minutes to get new hardware if you go to the point of autoscaling hardware, which is probably what you want to do because that's the part that actually costs money. Hadlock posted:If you are on AWS (getting pretty hand-wavey here), you can attach Fargate workers to your cluster, and I forget the exact spin-up time but it's measured in single-digit seconds, I think. We have a really bursty traffic flow and really can handle most stuff with ~50 workers, but then need to scale to ~250-350 workers almost instantly, then scale down over 2-3 hours to ~100. The problem is that the first new node to come online takes ~3min and we're dropping a ton of traffic while the cluster wakes up, especially on our ios devices. This is one of the next steps we're looking at (lol @ getting paged for too many 500 errors from ios devices on a weekly cadence) Is this bursty traffic predictable? Surely there are ways to pre-warm the hardware ahead of time if you know the trigger or time when that massive spike comes. 
Is it possible to queue these requests onto kafka or something instead of them being direct api calls? quote:Curious to see how auto-autoscaling rolls out over the next couple of years, I agree tuning HPAs is a right pain in the dick, it's way too labor intensive and right now the solution is "get it good enough, then throw an extra 20% resources at it" which seems wasteful Methanar fucked around with this message at 19:52 on Sep 8, 2022 |
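For reference, the KEDA queue-driven scaling being suggested looks roughly like this (`aws-sqs-queue` is a real KEDA trigger type; the image, queue URL, and numbers are made up):

```yaml
# Illustrative KEDA ScaledJob: spins job pods up from zero based on
# SQS queue depth and winds them back down when the queue drains.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: worker
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: worker
            image: example/worker:latest   # hypothetical image
        restartPolicy: Never
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/example-queue
        queueLength: "5"      # target messages per job
        awsRegion: us-east-1
```

This handles pod-level scale-to-zero; as noted above, it doesn't make the ~3 minute wait for new hardware go away if the node pool itself has to grow.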
# ? Sep 8, 2022 19:50 |
|
Methanar posted:Is this bursty traffic predictable? Surely there are ways to pre-warm the hardware ahead of time if you know the trigger or time when that massive spike comes. Is it possible to queue these requests onto kafka or something instead of them being direct api calls? This is absolutely a 100% solvable problem, but that would involve marketing 1) being responsible enough to let someone/the computer know in advance of a marketing/email push, or creating a schedule, in which case 2) actually following through and adhering to the schedule. Marketing is mostly flaky instagram people dancing atop the graves of many data analytics people who have quit out of frustration, so reactive scaling has to be the solution, rather than proactive
|
# ? Sep 8, 2022 21:39 |
|
I have just been given an AWS sub account that I can abuse as I please. I am going to use it as a proof of concept for actually running things with some semblance of organization. What are some good DevOps/AWS management resources? I'm not necessarily looking for technical details right now. That will come after I have a good map in my head of the structure and general principles. So more 1000-foot view stuff.
|
# ? Sep 8, 2022 21:44 |
|
Infrastructure as code
Some kind of orchestrator/scheduler
Continuous deployment
Monitoring and alerting

O'Reilly prints an awesome SRE book that's also open source online
|
# ? Sep 8, 2022 21:49 |
|
Don't use Control Tower, don't fall for the trap, don't read anything by or associate with any AWS employee who recommends it to you.
|
# ? Sep 8, 2022 21:53 |
|
Hadlock posted:O'Reilly prints an awesome SRE book that's also open source online Yeah, I might want to read something like this. O'Reilly hasn't failed me yet.
|
# ? Sep 8, 2022 22:03 |
|
Zapf Dingbat posted:I have just been given an AWS sub account that I can abuse as I please. I am going to use it as a proof of concept for actually running things with some semblance of organization. My #1 tip is that if you need to see an overview of what's happening in any region, go to the VPC console and you can get a birds-eye view of all EC2 resources in all regions. Unlike Azure/GCP, AWS does not let you easily browse all this stuff "globally"; you have to zoom into each region to see what's going on.
|
# ? Sep 8, 2022 22:15 |
|
Also please get in the habit of looking at your daily spend in Cost Explorer regularly. So you know what normal looks like and can catch weird poo poo on your bill early. It’s very easy for your bill to run away from you, especially when you are just playing around. AWS Budgets and Cost Anomaly Detection are good (and free) features to yell at you when spending spikes. Also found under Cost Explorer.
|
# ? Sep 9, 2022 00:57 |
|
I haven't been able to do any kind of useful project work of my own in like, 2 months. It's just been non-stop emergency firefighting and dealing with interrupts and making sure nobody else is blocked. This poo poo is killing me.
|
# ? Sep 9, 2022 01:36 |
|
This is for my self-hosting home server rather than for cloud work, but I think this is still the better thread in which to ask: Is there a "native" podman equivalent of docker-compose.yml? Meaning a declarative file format where I can spec out a set of pods and containers and deploy / update them all with a single command. So far I'm aware of three hacky options:
- podman-compose: takes a docker-compose.yml file and translates it to `podman run` shell commands. It's what I'm trying right now and I'm not very happy; it's hacky as poo poo, craps out on things as simple as spaces inside envvars. More importantly, the intra-container networking is a mess. Some of it is because docker-compose doesn't have the concept of pods, with networks fulfilling some of their role, but podman-compose puts everything into a single pod and it gets messy when trying to emulate docker networks
- docker-compose socket: run the regular docker-compose program but have the podman socket translate it to podman commands. So basically the same thing as podman-compose but downstream from the podman CLI instead of upstream. It probably avoids some of the string-escaping silliness but I don't see how it can solve the fundamental mismatches
- create your pods by hand and then use `podman generate kube` to turn them into k8s YAML files, which you can edit and replay. I could deal with the verbosity of k8s in exchange for the flexibility, but the thing is it generates one file per service, whereas with Compose you can just have everything in a single file, much easier to maintain if you have a small setup (like 5-10 containers). And I'm not sure how it deals with networking at all - like, I want to keep Redis and Postgres "hidden" and only accessible from a select list of containers; in Compose I just needed a network with internal=true, how do I do that here?
e: Ok, I can actually put everything into a single k8s YAML file by doing `podman ps -qa | xargs podman generate kube`.
If I can figure out how to define "networks" in this file format I might go that route. e2: nevermind lol, first test I did - generate a YAML for nextcloud + postgres + redis and replay it - already failed, it hosed up the port configuration and exposed the same port on ALL containers instead of just Nextcloud. Gonna look for something else, maybe I'll handcraft the YAML files with Dhall... NihilCredo fucked around with this message at 16:49 on Sep 9, 2022 |
# ? Sep 9, 2022 15:08 |
|
Anyone used Porter or have any thoughts on it? Just came up today in passing and I'm ambivalent.
|
# ? Sep 9, 2022 21:02 |
|
NihilCredo posted:This for my self-hosting home server rather than for cloud work, but I think this is still the better thread in which to ask: I would suggest using k3s so that the kubelet deals with talking to the container runtime and you deal with the relatively standardized Kubernetes API. Dealing with podman/containerd directly is a pain in the rear end.
|
# ? Sep 9, 2022 22:35 |
|
chutwig posted:I would suggest using k3s so that the kubelet deals with talking to the container runtime and you deal with the relatively standardized Kubernetes API. Dealing with podman/containerd directly is a pain in the rear end. this + use kustomize so you don't have to write out entire-rear end manifests
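For the kustomize suggestion, a minimal sketch of what that buys you (file names follow the standard convention; image names and patch contents are illustrative):

```yaml
# kustomization.yaml - one shared base, patched per overlay, so you
# never have to write out a full manifest twice.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
images:
  - name: example/nextcloud      # hypothetical image name
    newTag: "27.0"
patches:
  - path: resources-patch.yaml   # e.g. bump memory limits only in this overlay
    target:
      kind: Deployment
      name: nextcloud
```

Then `kubectl apply -k .` renders and applies the whole set, which pairs nicely with k3s for a small home setup.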
|
# ? Sep 10, 2022 03:45 |
|
I can't decide if something is a crazy anti-pattern for terraform. I have a bunch of vCenters (some linked, but links don't propagate tag categories or values). I have a tag category (department number - single cardinality) and tags (the actual department number values) I'd like to put on them in a uniform way so that they may be applied to VMs and such. What I'm thinking is:
- JSON with the vCenter URIs available via REST call
- tag categories hard-coded in the TF module
- JSON with the tag values available via REST call
- tagging done in a terraform module with a provider populated by variables provided in main.tf
- for-each the vCenters and run the module; within the module, for-each the tags and create them
Is this madness because it's not super declarative, or shrewd? I'm sure I'd end up using dynamics, but you can't initialize or reference different providers within a dynamic afaik Junkiebev fucked around with this message at 04:09 on Sep 10, 2022 |
# ? Sep 10, 2022 03:56 |
|
Junkiebev posted:I can't decide if something is a crazy anti-pattern for terraform Honestly I don't get what problem you're trying to solve. What's the thing that's preventing you from just having a set of tags defined that are applied to all of the resources that need tags? Is this some AWS thing I'm missing because I don't use AWS?
|
# ? Sep 10, 2022 04:20 |
|
If I understand you correctly, I would caution against using a module as a hack for a nested scope, and see if you can't express the relationship only in the caller state. The VMware API is, as I recall, complete dogshit, but I would hope that your resources accept lists of tags, so that you can do something like: code:
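(The original snippet didn't survive the quote; here is a hedged sketch of the general shape. vsphere_tag_category and vsphere_tag are the provider's real resource names; the alias, variable, and for_each wiring are illustrative.)

```hcl
# Sketch: create the same category + tag values under one vCenter's
# aliased provider, then repeat per vCenter. Provider aliases have to
# be declared statically; you can't for_each over providers themselves.
resource "vsphere_tag_category" "department" {
  provider         = vsphere.vc1          # repeat per aliased vCenter provider
  name             = "Department"
  cardinality      = "SINGLE"
  associable_types = ["VirtualMachine"]
}

resource "vsphere_tag" "department" {
  provider    = vsphere.vc1
  for_each    = toset(var.department_numbers)   # the ~300 values
  name        = each.value
  category_id = vsphere_tag_category.department.id
}
```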
|
# ? Sep 10, 2022 04:21 |
|
12 rats tied together posted:I would hope that your resources accept lists of tags, nope 12 rats tied together posted:The VMware API is, as I recall, complete dogshit yep
|
# ? Sep 10, 2022 04:25 |
|
New Yorp New Yorp posted:Honestly I don't get what problem you're trying to solve. What's the thing that's preventing you from just having a set of tags defined that are applied to all of the resources that need tags? Is this some AWS thing I'm missing because I don't use AWS? In order to assign a tag to a resource in vSphere, the tag category [key] and tag value [value] must pre-exist, and be eligible for assignment to that "type" of resource I would like to create a tag category called "Department", with a cardinality of 1 I would like to create possible values from a list (of 300 or so) so that values exist uniformly across several vCenters. I'm not trying to assign tags to anything - I'm trying to create them identically, so that they are able to be used, in several vCenters. Junkiebev fucked around with this message at 04:31 on Sep 10, 2022 |
# ? Sep 10, 2022 04:27 |
|
Ah, got it, and the way that you "assign" a tag to "a vCenter" is to create it under a particular provider, where the provider has your admin access to that vCenter baked in? I think in that case, a module is your best bet. The rest of it doesn't seem especially bad to me.
|
# ? Sep 10, 2022 04:30 |
|
|
|
12 rats tied together posted:Ah, got it, and the way that you "assign" a tag to "a vCenter" is to create it under a particular provider, where the provider has your admin access to that vCenter baked in? Well that's the kicker - the provider has the vCenter address as a property, so I'd need to instantiate the provider within the module I'd be calling, in either a dynamic or a for_each, which makes it a bit dicey if a vCenter is removed at a later date (which doesn't happen often, but does happen)
|
# ? Sep 10, 2022 04:34 |