|
MightyBigMinus posted:if you have a diverse/mixed workload on the hardware (say dozens+ of containers) then no you'll probably never notice and the defaults will probably be fine. There is still the potential that a given workload gets put on a cpu core that isn't particularly well-aligned with the physical ram associated with the process, right? I was under the impression that the linux kernel wouldn't always be able to inherently know what memory is good for a given cpu core, and that this was the sort of thing that required hints to be provided. Or am I misunderstanding this entirely? Numa noob over here.
|
# ? Sep 2, 2022 20:46 |
|
|
|
Has anyone tried Terraform 1.2's new lifecycle argument replace_triggered_by? I'm generating a random username and password with aws_secretsmanager_random_password, using them to create a DocumentDB cluster, and saving them in Secrets Manager. To prevent terraform from overwriting them next time (or after password rotation) there is an ignore_changes instruction on the secret string and on the DB user/pass on the cluster. So far everything has worked, but I also wanted to update the secret with a new random value if some change replaces the cluster. Tried adding this to the secret version lifecycle: replace_triggered_by = [ aws_docdb_cluster.cluster.cluster_resource_id ]. Nothing, and neither aws_docdb_cluster.cluster.master_username nor aws_docdb_cluster.cluster triggers replacement. I see in the state file that the dependency between secret and cluster appears, but the secret replacement isn't triggered.
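For anyone following along, here's roughly the shape being described, as a minimal sketch (all names and values are illustrative, not the poster's actual config):

```hcl
# Minimal sketch of the setup described above; all names are illustrative.
data "aws_secretsmanager_random_password" "pw" {
  password_length = 32
}

resource "aws_docdb_cluster" "cluster" {
  cluster_identifier = "example"
  master_username    = "exampleadmin"
  master_password    = data.aws_secretsmanager_random_password.pw.random_password

  lifecycle {
    # don't churn credentials on every apply
    ignore_changes = [master_username, master_password]
  }
}

resource "aws_secretsmanager_secret" "creds" {
  name = "example/docdb"
}

resource "aws_secretsmanager_secret_version" "creds" {
  secret_id = aws_secretsmanager_secret.creds.id
  secret_string = jsonencode({
    username = "exampleadmin"
    password = data.aws_secretsmanager_random_password.pw.random_password
  })

  lifecycle {
    ignore_changes = [secret_string]
    # the attempt that doesn't fire:
    replace_triggered_by = [aws_docdb_cluster.cluster.cluster_resource_id]
  }
}
```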
|
# ? Sep 5, 2022 18:42 |
|
Pyromancer posted:Has anyone tried Terraform 1.2's new lifecycle argument replace_triggered_by?
|
# ? Sep 5, 2022 21:42 |
|
Methanar posted:There is still the potential that a given workload gets put on a cpu core that isn't particularly well-aligned with the physical ram associated with the process, right? If you've read that and started to reason through this as a throughput/latency trade-off, you're mostly right. And this might be counterintuitive, since the purpose of NUMA affinity is ostensibly to reduce the latency of memory accesses. But NUMA affinity really shines when you have: a) fairly homogeneous workloads on the system that cause few surprises for the scheduler; b) high utilization but little contention for the CPUs themselves; c) large-scale parallelism that can benefit from allocating large amounts of memory up-front; d) a multi-process architecture that shards well along NUMA node boundaries. So the main place you see this implemented is for high-throughput HPC batch jobs, but you might be able to squeeze better latency out of affinity in very particular single-purpose streaming/MQ systems. The JVM in particular does well here because of the consistent and predictable way (in this one particular case, relative to alternatives!) memory is allocated on the heap, so you'll see it come up a lot in affinity-oriented articles like this one from Alibaba on RocketMQ performance.
|
# ? Sep 5, 2022 22:01 |
|
Vulture Culture posted:aws_secretsmanager_random_password is a data source, so it doesn't have a lifecycle in this way. It's odd that it permits this usage without an error. You could maybe use the random_password resource to get the behavior you're after. The lifecycle isn’t on aws_secretsmanager_random_password, it is on aws_secretsmanager_secret_version. Recreating that will update it with the random password from the current run, but it doesn’t trigger
|
# ? Sep 5, 2022 22:35 |
|
Pyromancer posted:The lifecycle isn’t on aws_secretsmanager_random_password, it is on aws_secretsmanager_secret_version. Recreating that will update it with the random password from the current run, but it doesn’t trigger
|
# ? Sep 6, 2022 02:09 |
|
Vulture Culture posted:Can I poke more at the lifecycle you're reaching for, and what you're looking to get done with it? It seems like you're getting close to automated credential rotation, but not quite tying into the workflow I would expect with that. With the limited information available, I have a spidey sense that there are two workflows being conflated into one process here, and I'll recommend different approaches for both: It's not for rotation, it's more similar to the second one. I just want to create a cluster with a random username and password in each environment instead of hardcoding a default. And as I mentioned, that works OK: credentials are created and stored in Secrets Manager, so anyone with access to that secret can access the DB using just the secret name to get the connection string. What I'm trying to resolve with replace_triggered_by is what happens when someone makes a change forcing cluster replacement later. In that case the cluster gets remade with a newly generated password. However the secret isn't updated; it still has the password of the previous cluster, because the change is unrelated to the secret. So what I'm trying to do is tie aws_secretsmanager_secret_version and aws_docdb_cluster together so they always recreate together. It's not a big deal and not something that's likely to happen, but by its description replace_triggered_by is made for exactly such cases. Found a way to make this work by adding a null resource in-between, although it's still strange to me that it doesn't work otherwise: code:
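Since the snippet didn't survive the quote, here is a hedged sketch of the null resource bridge being described (a reconstruction, not the poster's exact code; names are illustrative):

```hcl
# Hedged reconstruction of the workaround, not the poster's exact code.
resource "null_resource" "cluster_bridge" {
  # this resource is replaced whenever the cluster's identity changes
  triggers = {
    cluster_id = aws_docdb_cluster.cluster.cluster_resource_id
  }
}

resource "aws_secretsmanager_secret_version" "creds" {
  secret_id = aws_secretsmanager_secret.creds.id
  secret_string = jsonencode({
    username = aws_docdb_cluster.cluster.master_username
    password = data.aws_secretsmanager_random_password.pw.random_password
  })

  lifecycle {
    ignore_changes = [secret_string]
    # replacing the null_resource drags the secret version along with it
    replace_triggered_by = [null_resource.cluster_bridge]
  }
}
```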
Pyromancer fucked around with this message at 08:00 on Sep 6, 2022 |
# ? Sep 6, 2022 07:33 |
|
it looks like docker's ONBUILD command is falling out of fashion (b/c it's not OCI-compatible?) - does anyone know what new thing is replacing it, functionally? It was handy to have a one-liner referencing a build image for various frameworks
|
# ? Sep 7, 2022 07:00 |
|
Testing question: I have a public library that accesses our AWS API. Customers can generate their own API key and use our library to access stuff. I'd like to add automated testing for this. We store the code on GitHub, so my thought was we'd store a test API key in a GitHub secret, pass it to the function, and go. Is there a risk someone could modify the code to print out an encoded version of the API key that is stored in the GitHub secret? Is there a best practice for how to manage this situation?
|
# ? Sep 8, 2022 00:25 |
|
Do not ever put (unencrypted) secrets in GitHub, not even once. If you must put a secret in source control, encrypt it using sops or something that your CI/CD can decrypt. Edit: oh, you mean a GitHub Actions secret? Yeah, in theory they could branch, create a PR that base64-encodes the secret, and then print(str.$b64encodedsecret). GitHub will do basic secret protection but nothing is perfect Hadlock fucked around with this message at 00:54 on Sep 8, 2022 |
# ? Sep 8, 2022 00:50 |
|
Anyone running Knative? We have a bunch of roughly interchangeable services we want to be able to run on-demand in our k8s clusters. Right now they’re deployments and services and we maintain a minimum of 1-2 replicas depending on how used they are, but the unused or infrequently used ones are placing a bit of a strain on the clusters’ resources. I looked briefly into Keda for event-driven scaling but we’d have to change a bunch of our architecture to handle it properly. Any other alternatives out there?
|
# ? Sep 8, 2022 00:54 |
|
Hadlock posted:Do not ever put (unencrypted) secrets in GitHub, not even once Yeah, I meant a GitHub Actions secret, since we test our code through GHA. The base64-encode situation is exactly the idea that came up, which would mean passing a secret into code is just never possible. Which sucks when you want to write a test that uses a secret API key
|
# ? Sep 8, 2022 01:26 |
|
Blinkz0rz posted:Anyone running Knative? https://keda.sh/docs/2.6/concepts/scaling-jobs/ Does Keda's job scaling not work either, based on messages in SQS or Kafka or your other favorite message bus? Knative is okay, but it's extremely heavyweight and non-trivial to work with and manage. You need to be familiar with service meshes. You need to be familiar with internal PKI management. You really, really need to know what you're doing and have a lot of time to do it if you're going to provide it as a service for the rest of your org.
|
# ? Sep 8, 2022 01:41 |
|
StumblyWumbly posted:Testing question: I have a public library that accesses our AWS API. Customers can generate their own API key and use our library to access stuff. I'd like to add automated testing for this. We store the code on GitHub, so my thought was we'd store a test API key in a GitHub secret, pass it to the function, and go. Who is "someone" in this scenario? GH Actions does have a few things in place to protect against this; see this security blog post and this documentation page, which go into the details. The tools are there, but the explanation and documentation are very GitHub-specific in my opinion; it's not easy to get your head around the distinction between pull_request and pull_request_target and to plan it all out correctly when you're migrating from other CI tools or starting from scratch. Once you've implemented and run some workflows, all the context and event and trigger stuff makes more sense. The short of it is that PRs originating from forks don't have access to secrets for exactly this reason; you have to set up chained workflows just to make this possible, so that you have places to build the appropriate controls.
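A rough sketch of the split being described here, with an untrusted workflow for fork PRs and a gated, trusted one for secrets (`pull_request` and `pull_request_target` are real GitHub Actions triggers; the file names, label, and test commands are illustrative):

```yaml
# .github/workflows/test.yml - untrusted half.
# Fork PRs run here with NO secrets in scope.
name: test
on: pull_request
jobs:
  unit:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - run: python -m pytest tests/unit
---
# .github/workflows/integration.yml - trusted half.
# Runs in the base repo's context, so it CAN see secrets; gate it
# behind an explicit maintainer action (e.g. a "safe-to-test" label).
name: integration
on:
  pull_request_target:
    types: [labeled]
jobs:
  integration:
    if: contains(github.event.pull_request.labels.*.name, 'safe-to-test')
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
        with:
          # careful: this checks out untrusted PR code with secrets in scope,
          # which is why the label gate above matters
          ref: ${{ github.event.pull_request.head.sha }}
      - run: python run_test.py --api ${{ secrets.API_KEY }}
```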
|
# ? Sep 8, 2022 13:35 |
|
Scikar posted:Who is "someone" in this scenario? GH Actions does have a few things in place to protect against this; see this security blog post and this documentation page, which go into the details. The tools are there, but the explanation and documentation are very GitHub-specific in my opinion; it's not easy to get your head around the distinction between pull_request and pull_request_target and to plan it all out correctly when you're migrating from other CI tools or starting from scratch. Once you've implemented and run some workflows, all the context and event and trigger stuff makes more sense. The short version of my question is: Is it safe to pass a secret into code that is part of a public repo? I want to test our code through GH Actions using something like: Python posted:run: python run_test.py --api ${{ secrets.API_KEY }} I know GitHub protects against a user adding in print(f"API Key = {args.api}"), but it sounds like they could do print(f"API Key = {reversable_encoding(args.api)}") Seems like we need to generate a short-term key using non-public code (doable but a pain for the backend), or somehow add the API key to the request outside the code (which seems harder). Or we could have the API tests only run on PRs from trusted folks, which is not the end of the world for this system but still not fun.
|
# ? Sep 8, 2022 14:32 |
|
Long lived secrets are annoying. There are alternatives in GitHub actions: https://docs.github.com/en/actions/deployment/security-hardening-your-deployments/configuring-openid-connect-in-amazon-web-services
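The OIDC route linked above looks roughly like this in a workflow (`aws-actions/configure-aws-credentials` is the real action from that doc; the role ARN and region are made-up placeholders):

```yaml
# Illustrative: short-lived AWS credentials via OIDC instead of a stored key.
permissions:
  id-token: write   # required so the job can request an OIDC token
  contents: read
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: aws-actions/configure-aws-credentials@v2
        with:
          role-to-assume: arn:aws:iam::123456789012:role/gha-test-role  # hypothetical role
          aws-region: us-east-1
      - run: python run_test.py   # no long-lived API key stored in secrets at all
```

The IAM role's trust policy still has to restrict which repos and branches may assume it, so a fork can't just help itself to the role.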
|
# ? Sep 8, 2022 15:08 |
|
Methanar posted:https://keda.sh/docs/2.6/concepts/scaling-jobs/ These aren’t jobs, they’re basically lambdas that we want to spin up, field a set of http requests, and then spin down. The caveat is that we want them to be able to stay warm and scale out if we have more traffic but scale down to 0 if we don’t see traffic for a certain period of time. Basically we just don’t want to have to manually manage deployments, services, HPAs, etc for things that mostly would just sit around but need to be available if requests come in.
|
# ? Sep 8, 2022 15:10 |
Blinkz0rz posted:These aren’t jobs, they’re basically lambdas that we want to spin up, field a set of http requests, and then spin down. The caveat is that we want them to be able to stay warm and scale out if we have more traffic but scale down to 0 if we don’t see traffic for a certain period of time. What’s the difference between a job and running a lambda?? In my mind a lambda is just another word for a job, hmm. Pls 2 educate me if possible.
|
|
# ? Sep 8, 2022 16:20 |
|
Blinkz0rz posted:Anyone running Knative? If you are on AWS (getting pretty hand-wavey here), you can attach Fargate workers to your cluster, and I forget the exact spin-up time but it's measured in single-digit seconds, I think. We have a really bursty traffic flow and really can handle most stuff with ~50 workers, but then need to scale to ~250-350 workers almost instantly, then scale down over 2-3 hours to ~100. The problem is that the first new node to come online takes ~3min and we're dropping a ton of traffic while the cluster wakes up, especially on our ios devices. This is one of the next steps we're looking at (lol @ getting paged for too many 500 errors from ios devices on a weekly cadence) Curious to see how auto-autoscaling rolls out over the next couple of years. I agree tuning HPAs is a right pain in the dick; it's way too labor intensive, and right now the solution is "get it good enough, then throw an extra 20% resources at it", which seems wasteful
|
# ? Sep 8, 2022 18:02 |
|
lambdas are pretty lightweight. there are some proof of concepts running a full django stack on a lambda, but it doesn't scale beyond that. in a lot of stacks (like ours) the worker nodes are just the same full-fat django stack, but run with the WORKER_REPORTING=TRUE flag or ANALYTICS_SPECIAL_ONE_OFF=TRUE flag set, instead of the default WEB_WORKER=TRUE. Could you write a separate lambda to do what django is doing? just reading the top 100 rows of table X and spitting out a PDF for accounting? Sure, but now you need to update the lambda every time someone adds a new field to your customers_global table or orders_manifest table or whatever. fargate workers will let you spin up any container for any amount of time with any amount of resources, in about the same time a lambda takes to spin up. it's like a more expensive, more heavy-duty version of lambda
|
# ? Sep 8, 2022 18:06 |
|
Blinkz0rz posted:These aren’t jobs, they’re basically lambdas that we want to spin up, field a set of http requests, and then spin down. The caveat is that we want them to be able to stay warm and scale out if we have more traffic but scale down to 0 if we don’t see traffic for a certain period of time. A job in the kubernetes sense just means a pod which isn't associated with a deployment/replicaSet controller. If you wanted your job to be an http processor behind an LB, you can do that. The lifecycle of these job pods can be governed by what Keda reports as the messages in a queue and the work that the pods are doing. When the pods stop doing work, Keda can begin to wind things back down again. Your lambda pods can be kept warm in the sense that docker images can be kept on the filesystem of existing nodes. Knative or otherwise, you're still going to have the issue of needing to wait 3 minutes to get new hardware if you go to the point of autoscaling hardware, which is probably what you want to do because that's the part that actually costs money. Hadlock posted:If you are on AWS (getting pretty hand-wavey here), you can attach Fargate workers to your cluster, and I forget the exact spin-up time but it's measured in single-digit seconds, I think. We have a really bursty traffic flow and really can handle most stuff with ~50 workers, but then need to scale to ~250-350 workers almost instantly, then scale down over 2-3 hours to ~100. The problem is that the first new node to come online takes ~3min and we're dropping a ton of traffic while the cluster wakes up, especially on our ios devices. This is one of the next steps we're looking at (lol @ getting paged for too many 500 errors from ios devices on a weekly cadence) Is this bursty traffic predictable? Surely there are ways to pre-warm the hardware ahead of time if you know the trigger or time when that massive spike comes. 
Is it possible to queue these requests onto kafka or something instead of them being direct api calls? quote:Curious to see how auto-autoscaling rolls out over the next couple of years, I agree tuning HPAs is a right pain in the dick, it's way too labor intensive and right now the solution is "get it good enough, then throw an extra 20% resources at it" which seems wasteful Methanar fucked around with this message at 19:52 on Sep 8, 2022 |
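For reference, the KEDA queue-driven scaling being suggested looks roughly like this (`aws-sqs-queue` is a real KEDA trigger type; the image, queue URL, and numbers are made up):

```yaml
# Illustrative KEDA ScaledJob: spins job pods up from zero based on
# SQS queue depth and winds them back down when the queue drains.
apiVersion: keda.sh/v1alpha1
kind: ScaledJob
metadata:
  name: worker
spec:
  jobTargetRef:
    template:
      spec:
        containers:
          - name: worker
            image: example/worker:latest   # hypothetical image
        restartPolicy: Never
  maxReplicaCount: 50
  triggers:
    - type: aws-sqs-queue
      metadata:
        queueURL: https://sqs.us-east-1.amazonaws.com/123456789012/example-queue
        queueLength: "5"      # target messages per job
        awsRegion: us-east-1
```

This handles pod-level scale-to-zero; as noted above, it doesn't make the ~3 minute wait for new hardware go away if the node pool itself has to grow.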
# ? Sep 8, 2022 19:50 |
|
Methanar posted:Is this bursty traffic predictable? Surely there are ways to pre-warm the hardware ahead of time if you know the trigger or time when that massive spike comes. Is it possible to queue these requests onto kafka or something instead of them being direct api calls? This is absolutely a 100% solvable problem, but that would involve marketing 1) being responsible enough to let someone/the computer know in advance of a marketing/email push, or creating a schedule, in which case 2) actually following through and adhering to the schedule. Marketing is mostly flaky instagram people dancing atop the graves of many data analytics people who have quit out of frustration, so reactive scaling has to be the solution, rather than proactive
|
# ? Sep 8, 2022 21:39 |
|
I have just been given an AWS sub account that I can abuse as I please. I am going to use it as a proof of concept for actually running things with some semblance of organization. What are some good DevOps/AWS management resources? I'm not necessarily looking for technical details right now. That will come after I have a good map in my head of the structure and general principles. So more 1000-foot view stuff.
|
# ? Sep 8, 2022 21:44 |
|
Infrastructure as code
Some kind of orchestrator/scheduler
Continuous deployment
Monitoring and alerting

O'Reilly prints an awesome SRE book that's also open source online
|
# ? Sep 8, 2022 21:49 |
|
Don't use Control Tower, don't fall for the trap, don't read anything by or associate with any AWS employee who recommends it to you.
|
# ? Sep 8, 2022 21:53 |
|
Hadlock posted:O'Reilly prints an awesome SRE book that's also open source online Yeah, I might want to read something like this. O'Reilly hasn't failed me yet.
|
# ? Sep 8, 2022 22:03 |
|
Zapf Dingbat posted:I have just been given an AWS sub account that I can abuse as I please. I am going to use it as a proof of concept for actually running things with some semblance of organization. My #1 tip is that if you need to see an overview of what's happening in any region, go to the VPC console and you can get a birds-eye view of all EC2 resources in all regions. Unlike Azure/GCP, AWS does not let you easily browse all this stuff "globally"; you have to zoom into each region to see what's going on.
|
# ? Sep 8, 2022 22:15 |
|
Also please get in the habit of looking at your daily spend in Cost Explorer regularly. So you know what normal looks like and can catch weird poo poo on your bill early. It’s very easy for your bill to run away from you, especially when you are just playing around. AWS Budgets and Cost Anomaly Detection are good (and free) features to yell at you when spending spikes. Also found under Cost Explorer.
|
# ? Sep 9, 2022 00:57 |
|
I haven't been able to do any kind of useful project work of my own in like, 2 months. It's just been non-stop emergency firefighting and dealing with interrupts and making sure nobody else is blocked. This poo poo is killing me.
|
# ? Sep 9, 2022 01:36 |
|
This is for my self-hosting home server rather than for cloud work, but I think this is still the better thread in which to ask: Is there a "native" podman equivalent of docker-compose.yml? Meaning a declarative file format where I can spec out a set of pods and containers and deploy / update them all with a single command. So far I'm aware of three hacky options:
- podman-compose: takes a docker-compose.yml file and translates it to `podman run` shell commands. It's what I'm trying right now and I'm not very happy; it's hacky as poo poo, craps out on things as simple as spaces inside envvars. More importantly, the intra-container networking is a mess. Some of it is because docker-compose doesn't have the concept of pods, with networks fulfilling some of their role, but podman-compose puts everything into a single pod and it gets messy when trying to emulate docker networks
- docker-compose socket: run the regular docker-compose program but have the podman socket translate it to podman commands. So basically the same thing as podman-compose but downstream from the podman CLI instead of upstream. It probably avoids some of the string-escaping silliness but I don't see how it can solve the fundamental mismatches
- create your pods by hand and then use `podman generate kube` to turn them into k8s YAML files, which you can edit and replay. I could deal with the verbosity of k8s in exchange for the flexibility, but the thing is it generates one file per service, whereas with Compose you can just have everything in a single file, much easier to maintain if you have a small setup (like 5-10 containers). And I'm not sure how it deals with networking at all - like, I want to keep Redis and Postgres "hidden" and only accessible from a select list of containers; in Compose I just needed a network with internal=true, how do I do that here?
e: Ok, I can actually put everything into a single k8s YAML file by doing `podman ps -qa | xargs podman generate kube`.
If I can figure out how to define "networks" in this file format I might go that route. e2: nevermind lol, first test I did - generate a YAML for nextcloud + postgres + redis and replay it - already failed, it hosed up the port configuration and exposed the same port on ALL containers instead of just Nextcloud. Gonna look for something else, maybe I'll handcraft the YAML files with Dhall... NihilCredo fucked around with this message at 16:49 on Sep 9, 2022 |
# ? Sep 9, 2022 15:08 |
|
Anyone used Porter or have any thoughts on it? Just came up today in passing and I'm ambivalent.
|
# ? Sep 9, 2022 21:02 |
|
NihilCredo posted:This for my self-hosting home server rather than for cloud work, but I think this is still the better thread in which to ask: I would suggest using k3s so that the kubelet deals with talking to the container runtime and you deal with the relatively standardized Kubernetes API. Dealing with podman/containerd directly is a pain in the rear end.
|
# ? Sep 9, 2022 22:35 |
|
chutwig posted:I would suggest using k3s so that the kubelet deals with talking to the container runtime and you deal with the relatively standardized Kubernetes API. Dealing with podman/containerd directly is a pain in the rear end. this + use kustomize so you don't have to write out entire-rear end manifests
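For the kustomize suggestion, a minimal sketch of what that buys you (file names follow the standard convention; image names and patch contents are illustrative):

```yaml
# kustomization.yaml - one shared base, patched per overlay, so you
# never have to write out a full manifest twice.
apiVersion: kustomize.config.k8s.io/v1beta1
kind: Kustomization
resources:
  - deployment.yaml
  - service.yaml
images:
  - name: example/nextcloud      # hypothetical image name
    newTag: "27.0"
patches:
  - path: resources-patch.yaml   # e.g. bump memory limits only in this overlay
    target:
      kind: Deployment
      name: nextcloud
```

Then `kubectl apply -k .` renders and applies the whole set, which pairs nicely with k3s for a small home setup.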
|
# ? Sep 10, 2022 03:45 |
|
I can't decide if something is a crazy anti-pattern for terraform. I have a bunch of vCenters (some linked, but links don't propagate tag categories or values). I have a tag category (department number - single cardinality) and tags (the actual department number values) I'd like to put on them in a uniform way so that they may be applied to VMs and such. What I'm thinking is:
- JSON with the vCenter URIs available via REST call
- tag categories hard-coded in the TF module
- JSON with the tag values available via REST call
- tagging done in a terraform module with a provider populated by variables provided in main.tf
- for-each the vCenters and run the module; within the module, for-each the tags and create them
Is this madness because it's not super declarative, or shrewd? I'm sure I'd end up using dynamics, but you can't initialize or reference different providers within a dynamic afaik Junkiebev fucked around with this message at 04:09 on Sep 10, 2022 |
# ? Sep 10, 2022 03:56 |
|
Junkiebev posted:I can't decide if something is a crazy anti-pattern for terraform Honestly I don't get what problem you're trying to solve. What's the thing that's preventing you from just having a set of tags defined that are applied to all of the resources that need tags? Is this some AWS thing I'm missing because I don't use AWS?
|
# ? Sep 10, 2022 04:20 |
|
If I understand you correctly, I would caution against using a module as a hack for a nested scope, and see if you can't express the relationship only in the caller state. The VMware API is, as I recall, complete dogshit, but I would hope that your resources accept lists of tags, so that you can do something like: code:
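(The original snippet didn't survive the quote; here is a hedged sketch of the general shape. vsphere_tag_category and vsphere_tag are the provider's real resource names; the alias, variable, and for_each wiring are illustrative.)

```hcl
# Sketch: create the same category + tag values under one vCenter's
# aliased provider, then repeat per vCenter. Provider aliases have to
# be declared statically; you can't for_each over providers themselves.
resource "vsphere_tag_category" "department" {
  provider         = vsphere.vc1          # repeat per aliased vCenter provider
  name             = "Department"
  cardinality      = "SINGLE"
  associable_types = ["VirtualMachine"]
}

resource "vsphere_tag" "department" {
  provider    = vsphere.vc1
  for_each    = toset(var.department_numbers)   # the ~300 values
  name        = each.value
  category_id = vsphere_tag_category.department.id
}
```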
|
# ? Sep 10, 2022 04:21 |
|
12 rats tied together posted:I would hope that your resources accept lists of tags, nope 12 rats tied together posted:The VMware API is, as I recall, complete dogshit yep
|
# ? Sep 10, 2022 04:25 |
|
New Yorp New Yorp posted:Honestly I don't get what problem you're trying to solve. What's the thing that's preventing you from just having a set of tags defined that are applied to all of the resources that need tags? Is this some AWS thing I'm missing because I don't use AWS? In order to assign a tag to a resource in vSphere, the tag category [key] and tag value [value] must pre-exist, and be eligible for assignment to that "type" of resource I would like to create a tag category called "Department", with a cardinality of 1 I would like to create possible values from a list (of 300 or so) so that values exist uniformly across several vCenters. I'm not trying to assign tags to anything - I'm trying to create them identically, so that they are able to be used, in several vCenters. Junkiebev fucked around with this message at 04:31 on Sep 10, 2022 |
# ? Sep 10, 2022 04:27 |
|
Ah, got it, and the way that you "assign" a tag to "a vCenter" is to create it under a particular provider, where the provider has your admin access to that vCenter baked in? I think in that case, a module is your best bet. The rest of it doesn't seem especially bad to me.
|
# ? Sep 10, 2022 04:30 |
|
|
|
12 rats tied together posted:Ah, got it, and the way that you "assign" a tag to "a vCenter" is to create it under a particular provider, where the provider has your admin access to that vCenter baked in? Well that's the kicker - the provider has the vCenter address as a property, so I'd need to instantiate the provider within the module I'd be calling, in either a dynamic or a for_each, which makes it a bit dicey if a vCenter is removed at a later date (which doesn't happen often, but does happen)
|
# ? Sep 10, 2022 04:34 |