|
Hadlock posted:If you were tasked with overhauling monitoring and alerting + improve visibility into a highly real-time system, what kind of KPIs would you select for phase 1 of the project, and what would you push out to phase 2 or beyond?
the hard part of this isn't knowing the answer. i could tell you right now it's 'the golden signals' [1] and it wouldn't matter. the hard part is building management + team consensus over who gets the alerts and what the expectations are for responding to them (ON CALL ROTATION).
I would start with *just* tail latency (95th or 99th percentile) and first work out which endpoints page which people, on which rotations and schedules. if the answer is "all the alerts go to you, just to start, before we roll it out to more people later", you're already hosed beyond recovery.
some notes on how the process works in a giant supertechnocratic environment: https://sre.google/sre-book/service-level-objectives/
[1] https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals
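A minimal Python sketch of that phase-1 idea, assuming you already collect per-endpoint latency samples (in ms) for some window: compute a nearest-rank p99 per endpoint and route any breach to an on-call rotation. The endpoint names, rotation names, and the `ROUTES` table are all hypothetical; the routing table is the part that actually needs the consensus.

```python
import math
from dataclasses import dataclass

# Hypothetical routing table: which on-call rotation gets paged for which
# endpoint. Endpoint and rotation names are made up for illustration.
ROUTES = {
    "/checkout": "payments-oncall",
    "/search": "search-oncall",
}
DEFAULT_ROTATION = "platform-oncall"  # hypothetical catch-all rotation


def percentile(samples, pct):
    """Nearest-rank percentile (pct in (0, 100]) of a non-empty list of samples."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based nearest rank
    return ordered[max(rank, 1) - 1]


@dataclass
class Page:
    endpoint: str
    rotation: str
    tail_ms: float


def evaluate(latencies_by_endpoint, threshold_ms, pct=99):
    """Return a Page for every endpoint whose tail latency exceeds threshold_ms."""
    pages = []
    for endpoint, samples in latencies_by_endpoint.items():
        if not samples:
            continue  # no traffic in this window, nothing to judge
        tail = percentile(samples, pct)
        if tail > threshold_ms:
            rotation = ROUTES.get(endpoint, DEFAULT_ROTATION)
            pages.append(Page(endpoint, rotation, tail))
    return pages
```

Unknown endpoints fall through to a catch-all rotation here; whether that catch-all should exist at all is exactly the "all the alerts go to you" trap described above.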
|
# ? Jun 1, 2023 13:43 |
|
|
# ? Jun 8, 2024 20:17 |
|
The Iron Rose posted:Read Charity Majors’ blog. Her observability book + Alex Hidalgo's SLO book are both pro reads as well.
|
# ? Jun 1, 2023 14:35 |
|
I have a question about building containers. I have this Dockerfile:code:
code:
However, when I try to build it on a build VM running Debian 11 with buildah version 1.19.6 (image-spec 1.0.1, runtime-spec 1.0.2-dev), I get this error:
pre:
STEP 5: RUN dnf install -yq http://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-38.noarch.rpm http://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-38.noarch.rpm && rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-rpmfusion-free-fedora-38 && rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-rpmfusion-nonfree-fedora-38 && dnf upgrade -y && dnf clean all
Error: Failed to download metadata for repo 'fedora': Cannot prepare internal mirrorlist: Curl error (6): Couldn't resolve host name for https://mirrors.fedoraproject.org/metalink?repo=fedora-38&arch=x86_64 [getaddrinfo() thread failed to start]
error building at STEP "RUN dnf install -yq http://download1.rpmfusion.org/free/fedora/rpmfusion-free-release-38.noarch.rpm http://download1.rpmfusion.org/nonfree/fedora/rpmfusion-nonfree-release-38.noarch.rpm && rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-rpmfusion-free-fedora-38 && rpm --import /etc/pki/rpm-gpg/RPM-GPG-KEY-rpmfusion-nonfree-fedora-38 && dnf upgrade -y && dnf clean all": error while running runtime: exit status 1
pre:
search mydomain.
nameserver 10.0.2.3
nameserver 192.168.1.253
nameserver 192.168.1.251
Any ideas what could possibly be wrong here?
|
# ? Jun 2, 2023 02:05 |
|
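Volguus posted:I have a question about building containers.
You may have an issue where your running kernel/glibc doesn't support the seccomp features required by the Fedora container. Is this Bugzilla thread helpful? https://bugzilla.redhat.com/show_bug.cgi?id=1990469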
|
# ? Jun 2, 2023 20:43 |
|
Vulture Culture posted:You may have an issue where your running kernel/glibc doesn't support the seccomp features required by the Fedora container. Is this Bugzilla thread helpful?
Thank you. Maybe it would have. I saw this too late: quote:The workaround was to use: --security-opt seccomp=unconfined
I got pissed at Debian and its ancient, buggy-as-hell packages and installed Fedora as the Linux build VM, and there it works (as it does on my machine). Plus I get newer versions of podman and buildah. There was a time (two decades ago or so) when Debian sid was a benchmark of stability for other distros. Nowadays the stable release is just a mishmash of buggy packages that wouldn't work together if their lives depended on it. I chose it initially because gitlab-runner is supported on Debian/Ubuntu but not on the latest Fedora. The industry moves slowly, I suppose; it'll probably take 20 years of Debian fuckups for people to finally abandon it.
|
# ? Jun 3, 2023 01:33 |
|
Got a question for those of you who are working w/ Terraform: how much of writing your own modules from scratch is acceptable and when does it turn into just reinventing the wheel? I inherited a pretty extensive Terraform repo at my current position and literally everything is done by hand. I see a lot of instances where we could just, say, import the official AWS ECS module, make a couple of tweaks, and have something running fairly quickly. In this instance, my boss decided he wanted to "understand every component" before bringing up any infrastructure and as a result it's taken me way longer to feel like I have a handle on things. We have a directory of modules (that reference each other) that are then referenced in a directory of "environment templates" that are then referenced in an entirely separate git repo (one for each of our two products) where we actually implement the code. Also, there are no comments anywhere, and the README.md files are pretty sparse.
|
# ? Jun 5, 2023 15:47 |
|
Necronomicon posted:Got a question for those of you who are working w/ Terraform: how much of writing your own modules from scratch is acceptable and when does it turn into just reinventing the wheel? I inherited a pretty extensive Terraform repo at my current position and literally everything is done by hand. I see a lot of instances where we could just, say, import the official AWS ECS module, make a couple of tweaks, and have something running fairly quickly. In this instance, my boss decided he wanted to "understand every component" before bringing up any infrastructure and as a result it's taken me way longer to feel like I have a handle on things. We have a directory of modules (that reference each other) that are then referenced in a directory of "environment templates" that are then referenced in an entirely separate git repo (one for each of our two products) where we actually implement the code. Also, there are no comments anywhere, and the README.md files are pretty sparse. On the other hand, having coworkers who steadfastly refuse to read code other people have written seems like a big sparkly red flag.
|
# ? Jun 5, 2023 15:53 |
|
Necronomicon posted:Got a question for those of you who are working w/ Terraform: how much of writing your own modules from scratch is acceptable and when does it turn into just reinventing the wheel? I inherited a pretty extensive Terraform repo at my current position and literally everything is done by hand. I see a lot of instances where we could just, say, import the official AWS ECS module, make a couple of tweaks, and have something running fairly quickly. In this instance, my boss decided he wanted to "understand every component" before bringing up any infrastructure and as a result it's taken me way longer to feel like I have a handle on things. We have a directory of modules (that reference each other) that are then referenced in a directory of "environment templates" that are then referenced in an entirely separate git repo (one for each of our two products) where we actually implement the code. Also, there are no comments anywhere, and the README.md files are pretty sparse.
terraform-docs is your friend for generating documentation for custom modules. Custom modules are useful, sometimes, when you’re not just wrapping one or two resources and you’ve got something very organization-specific and opinionated. But it’s so easy to shoot yourself in the foot with them that I almost always default to using the raw public resource, and you have very few means of enforcing their use, so they’re useless for setting standards. We use modules for creating VPCs (we have a moderately cursed setup to enable transitive peering in GCP), Kubernetes clusters, new GCP projects, and DNS records. We also have one for generating a template of instances for Splunk, since that’s a stack that gets created and torn down semi-frequently. That’s about it, and my life is a lot happier without a ton of single-resource modules lying around, most of which I’ve had to purge over time. When in doubt, don’t create a module.
Public modules are better than rolling your own if you can get away with it, but it’s still another layer of abstraction you probably don’t need.
|
# ? Jun 5, 2023 15:57 |
|
In general, I prefer to write my own modules. I don't like publicly available modules because they're excessively generic and overcomplicated. A module should be built for your environment to collect related resources into reusable blocks, and should require as little input as possible.
Necronomicon posted:We have a directory of modules (that reference each other) that are then referenced in a directory of "environment templates" that are then referenced in an entirely separate git repo (one for each of our two products) where we actually implement the code. Also, there are no comments anywhere, and the README.md files are pretty sparse.
You should try to only have one layer of abstraction, like (application repo) => (module registry). And modules should not reference other modules.
|
# ? Jun 5, 2023 16:07 |
|
The Fool posted:In general, I prefer to write my own modules. I don't like publicly available modules because they're excessively generic and over complicated. A module should be built for your environment to collect related resources into reusable blocks, and should require as little input as possible. I’m gonna module my module into a third module and put a provider in it to wrap a single resource and you can’t stop me!! what do you mean all my developers hate me and do shadow IT.
|
# ? Jun 5, 2023 16:08 |
|
The Iron Rose posted:When in doubt, don’t create a module.
Agreed.
quote:Public modules are better than rolling your own if you can get away with it
But don't bother with public modules.
quote:but it’s still another layer of abstraction you probably don’t need.
Agreed. If you're using a module, it should be adding value specific to your environment. If it is not doing that, just use the resource.
|
# ? Jun 5, 2023 16:12 |
|
The Fool posted:And modules should not reference other modules
|
# ? Jun 5, 2023 17:01 |
|
Vulture Culture posted:Most of the behaviors of modules for common services are just syntactic sugar, and they don't necessarily do anything useful, so I don't think I would sweat this. On the other hand, writing your own modules to wrap those services is also useless for the same reason. Terraform modules should have opinions that are substantially different from the underlying resources. They should either represent some new abstraction, or give you things "for free" alongside the resource that otherwise would be time-consuming or cumbersome to set up and integrate. We started out having a focused, opinionated set of modules for deploying a microservice to ECS. My boss REALLY wanted this solution to succeed and become widely adopted, so he told us to be extremely accommodating to any requests and feedback. Over time people kept asking for little features until it became simply a worse reimplementation of all the community modules for ECS, Aurora, Redis etc. Every request seemed reasonable in isolation but it was death by 1000 cuts. It's totally miserable to work on now. This is not really a Terraform-specific learning. But if we were starting over, I would take a harder line "this is what we provide and why, if you don't like it, use something else" stance. The solution no longer provides much value over things that already freely exist and have 20x the number of active maintainers.
|
# ? Jun 5, 2023 17:05 |
|
Docjowles posted:We started out having a focused, opinionated set of modules for deploying a microservice to ECS. My boss REALLY wanted this solution to succeed and become widely adopted, so he told us to be extremely accommodating to any requests and feedback. Over time people kept asking for little features until it became simply a worse reimplementation of all the community modules for ECS, Aurora, Redis etc. Every request seemed reasonable in isolation but it was death by 1000 cuts. It's totally miserable to work on now. This is not really a Terraform-specific learning. But if we were starting over, I would take a harder line "this is what we provide and why, if you don't like it, use something else" stance. The solution no longer provides much value over things that already freely exist and have 20x the number of active maintainers.
ECS is a funny case, because it's got nearly as many tunables as an EC2 instance or launch template, but where in EC2 you might get some value out of adding other AWS service integrations like "set this value to true to automatically get a load balancer", this is something that ECS already gives you out of the box.
On the other hand, all the actual hard problems of ECS relate to AOP and separation of concerns, and Terraform is hilariously ill-equipped to deal with any of that.
|
# ? Jun 5, 2023 17:11 |
|
you basically do not need modules at all now that for expressions exist and can evaluate to null. that would be the main new thing i would communicate, in addition to the previous 7 years of me losing my mind in this thread about how bad they are, and the other advice posted above.
to go a little further: even having an ECS module (community or self-authored) is an anti-pattern compared to for_each-ing a map variable once each for the service and the task definition.
|
# ? Jun 5, 2023 17:27 |
|
There are other UX reasons why you might want engineers spinning up infrastructure to think about something in a certain way, but it's often better to use a policy engine like OPA or Sentinel to accomplish that.
One of the best reasons to use a module is to abstract implementation details or environment differences away from engineers, which is a great reason to maintain data modules that don't manage anything.
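To illustrate what a "data module that doesn't manage anything" buys you, here is a plain-Python analogue (not Terraform): a read-only lookup that answers environment questions so callers never hardcode them. `NetworkInfo`, `network_for`, and every VPC name, CIDR, and DNS zone below are hypothetical.

```python
from dataclasses import dataclass
from types import MappingProxyType


@dataclass(frozen=True)
class NetworkInfo:
    vpc_name: str
    private_subnets: tuple
    dns_zone: str


# Hypothetical per-environment facts. In Terraform these would live in a
# data-only module (lookups / data sources); nothing here creates resources.
_ENVIRONMENTS = MappingProxyType({
    "staging": NetworkInfo(
        "stg-vpc", ("10.10.1.0/24", "10.10.2.0/24"), "stg.example.internal"
    ),
    "production": NetworkInfo(
        "prod-vpc", ("10.20.1.0/24", "10.20.2.0/24"), "prod.example.internal"
    ),
})


def network_for(environment: str) -> NetworkInfo:
    """Answer 'where does this deploy?' so callers never hardcode env details."""
    try:
        return _ENVIRONMENTS[environment]
    except KeyError:
        raise ValueError(f"unknown environment: {environment!r}") from None
```

The frozen dataclass and read-only mapping keep callers from mutating shared facts; anything they want to change, they own locally.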
|
# ? Jun 5, 2023 18:30 |
|
Yeah, if you're spinning up an environment for less than 1 week, Terraform is not the correct way to do that, especially if you're doing it more than 10 times a week. Maybe look at Pulumi, which leverages Terraform technology.
|
# ? Jun 5, 2023 18:37 |
|
Hadlock posted:Yeah if you're spinning up an environment for less than 1 week, terraform is not the correct way to do that. Especially if you're doing it more than 10 times a week. Maybe look at Pulumi which leverages Terraform technology this makes no sense whatsoever
|
# ? Jun 5, 2023 18:49 |
|
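Hadlock posted:Yeah if you're spinning up an environment for less than 1 week, terraform is not the correct way to do that. Especially if you're doing it more than 10 times a week. Maybe look at Pulumi which leverages Terraform technology
This is a new take to me. Would you mind going into some of the problems you've seen with Terraform for short-lived or ephemeral infrastructures? We have ephemeral Terraform-managed infra all over the place, including in the integration tests for our modules themselves.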
|
# ? Jun 5, 2023 18:50 |
|
Vulture Culture posted:We have ephemeral Terraform-managed infra all over the place, including in the integration tests for our modules themselves.
not only do we have ephemeral module tests and load-testing environments, we have a full-on in-house "application" for devs to check out ephemeral environments for PoCs, training, and other dev work
|
# ? Jun 5, 2023 18:55 |
|
I don't entirely disagree. Terraform is heavy, to materialize a bunch of config into actual infrastructure you need to either duplicate it (cp -r), reference it (symlink), or you need a wrapper (module, global for_each, something that injects per-resource logical names). Terraform the binary is always hardcoded to look at CWD, so you end up massaging CWD in some way regardless of approach. Duplicating a big bundle of resources in Pulumi OTOH can be as simple as instantiating a new instance of MyApp and calling a method on it. Pulumi IIRC has an outer primitive that is somewhat analogous to a TFC/TFE "account" and can be used to orchestrate multiple workspaces from a single CWD. It's still probably not the pitch I would make. But it's not exactly wrong either, IMO.
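To make the "new instance of MyApp" point concrete without claiming anything about Pulumi's actual SDK, here is a plain-Python sketch. `MyApp`, `Resource`, and `plan` are hypothetical names; the point is that each instance injects its own per-resource logical names, so a second copy is one constructor call rather than a copied or symlinked directory.

```python
from dataclasses import dataclass, field


@dataclass
class Resource:
    logical_name: str
    kind: str
    props: dict = field(default_factory=dict)


class MyApp:
    """Hypothetical bundle of resources. Every instance derives its own
    logical names from `name`, so copies never collide."""

    def __init__(self, name: str, replicas: int = 1):
        self.name = name
        self.replicas = replicas

    def resources(self):
        return [
            Resource(f"{self.name}-db", "database", {"size": "small"}),
            Resource(f"{self.name}-svc", "service", {"replicas": self.replicas}),
        ]


def plan(*apps):
    """Gather all resources from all app instances, rejecting name clashes."""
    seen = {}
    for app in apps:
        for res in app.resources():
            if res.logical_name in seen:
                raise ValueError(f"duplicate logical name: {res.logical_name}")
            seen[res.logical_name] = res
    return seen
```

For example, `plan(MyApp("pr-101"), MyApp("pr-102", replicas=2))` yields four uniquely named resources from a single working directory.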
|
# ? Jun 5, 2023 18:58 |
|
Vulture Culture posted:This is a new take to me. Would you mind going into some of the problems you've seen with Terraform for short-lived or ephemeral infrastructures? We have ephemeral Terraform-managed infra all over the place, including in the integration tests for our modules themselves.
Where I was at, our Terraform was really brittle and a couple of the (AWS) resources were flaky, leaving the state file locked and requiring someone to go in and unfuck it at least a couple of times a week.
And in a worst-case scenario (not counting this towards the above argument; it happened before I started there), the ephemeral environments were tied to the production Terraform, so when ephemeral environment deployments had problems, they would block production releases.
Also, we had a 1300+ line Python script that called bash "libraries" for a home-built templating system, which had all sorts of weird edge-case failures.
|
# ? Jun 5, 2023 19:08 |
|
Docjowles posted:We started out having a focused, opinionated set of modules for deploying a microservice to ECS. My boss REALLY wanted this solution to succeed and become widely adopted, so he told us to be extremely accommodating to any requests and feedback. Over time people kept asking for little features until it became simply a worse reimplementation of all the community modules for ECS, Aurora, Redis etc. Every request seemed reasonable in isolation but it was death by 1000 cuts. It's totally miserable to work on now. This is not really a Terraform-specific learning. But if we were starting over, I would take a harder line "this is what we provide and why, if you don't like it, use something else" stance. The solution no longer provides much value over things that already freely exist and have 20x the number of active maintainers.
There's a similar problem in CDK with "pattern libraries". They seem like a good idea, but you either end up with:
1. A Do-Everything class that's a confusing mess of garbage options and spaghetti code, like you have.
2. A class that cannot be changed one bit ever, because everyone uses it but then "reaches behind the curtain" to mutate some detail, so literally any change ever is a breaking change to SOMEONE.
The approach I settled on was to write a set of reusable and useful components, not as a library, but as the auto-generated starting template for our "new project" CLI. It somewhat hides the fact that we're achieving reuse through copy-and-paste from the people who would knee-jerk against that without realizing the alternatives are Worse.
e: (ok, that's a slight exaggeration: there is a library with a few things that we accept is type 2 above, and we maintain it with major version bumps on almost any change, so we can leave people mostly alone, happily using older major versions... but there's still a lot of frequently customized code that we just leave in the template.)
|
# ? Jun 5, 2023 20:12 |
|
imo that is largely a software quality problem that occurs mostly when your factoring of problem to solution is done poorly. it's common across all types of code, e.g. the "god object" design flaw, composition vs inheritance, cohesion vs coupling, etc.
IMO libraries in this domain should encapsulate types and behavior. using the standard vpc example: find my cidr allocation. give me an office peering config. join me to the private WAN.
Pulumi handles this relatively gracefully because the objects are immutable but every object has a ThingParams type that is mutable, so you can write shared code that returns "here's what you should plug into your VPC", and then the callers can layer their exceptions on top of it. the callers own all of their own exceptions, which means they're free to resolve them locally and own the effects of that resolution, and it keeps your shared code focused on bridging the gap between where the knowledge is and the infracode, and on gracefully handling any translation that needs to happen.
this does a pretty good job of preventing scope bleed between the "figure out what to do" stage and the "actually do it" stage. it's also way easier to write tests and assertions for a library module that emits a correct VPC config than it is for one that "creates the correct VPC".
i don't remember how the cdk does this exactly but i remember running into it with the CDKTF and not being impressed
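A plain-Python sketch of the "shared code emits a config, callers layer their own exceptions" split. `VpcParams`, `find_cidr`, `recommended_vpc_params`, the team names, and the one-/16-per-team allocation scheme are all hypothetical, not Pulumi types; only `ipaddress` and `dataclasses` are real stdlib.

```python
import ipaddress
from dataclasses import dataclass


@dataclass
class VpcParams:
    """Hypothetical mutable params object, loosely modeled on the
    'ThingParams' idea above; not a real Pulumi type."""
    cidr_block: str
    enable_dns_hostnames: bool = True
    peer_with_office: bool = True
    join_private_wan: bool = True


# Hypothetical allocation scheme: each team owns one /16 inside 10.0.0.0/8.
_SUPERNET = ipaddress.ip_network("10.0.0.0/8")
_TEAM_INDEX = {"payments": 1, "search": 2}


def find_cidr(team: str) -> str:
    """Look up the team's /16 by offsetting into the supernet."""
    base = int(_SUPERNET.network_address) + _TEAM_INDEX[team] * 2**16
    return str(ipaddress.ip_network((base, 16)))


def recommended_vpc_params(team: str) -> VpcParams:
    """Shared library code: decide what the VPC *should* look like.
    Creates nothing; returns a fresh object the caller may mutate."""
    return VpcParams(cidr_block=find_cidr(team))
```

A caller that needs an exception owns it locally, e.g. `params = recommended_vpc_params("search")` followed by `params.peer_with_office = False`, and only then hands `params` to whatever creates the VPC. Because each call returns a fresh object, one team's override never leaks into another's, and the shared function can be unit-tested without touching a cloud account.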
|
# ? Jun 5, 2023 21:25 |
|
12 rats tied together posted:imo that is largely a software quality problem that occurs mostly when your factoring of problem to solution is done poorly. it's common across all types of code e.g. the "god object" design flaw, composition vs inheritance, cohesion vs coupling, etc
Sure, but I've definitely noticed this as a problem much more with IaC designs than with other software. I vaguely suspect it might have something to do with how the cloud resources themselves are designed: things just aren't quite composable enough somehow. ("this thing is a resource you create, but this other thing requires modifying your instance of this other resource you create, and which is which is partly an accident of history.")
|
# ? Jun 7, 2023 00:02 |
|
Azure Portal is straight-up down r/n lol
|
# ? Jun 9, 2023 16:45 |
|
Junkiebev posted:Azure Portal is straight-up down r/n lol I'm just going to take the rest of the day off
|
# ? Jun 9, 2023 16:49 |
Only scrubs use the GUI
I too am pausing all work
|
|
# ? Jun 9, 2023 16:52 |
|
i am a moron posted:Only scrubs use the GUI I'm working on a module and I "need" the portal to "validate deployed configurations"
|
# ? Jun 9, 2023 16:54 |
Just grab the JSON with CLI and make those eyes bleed
|
|
# ? Jun 9, 2023 16:55 |
|
i am a moron posted:Only scrubs use the GUI API calls are failing as well.
|
# ? Jun 9, 2023 16:58 |
Well that is a way bigger problem than what they’re making it out to be so far. Portal is working for me again though
|
|
# ? Jun 9, 2023 17:00 |
|
BleepingComputer is saying DDoS: https://www.bleepingcomputer.com/news/microsoft/microsofts-azure-portal-down-following-new-claims-of-ddos-attacks/
|
# ? Jun 9, 2023 17:07 |
|
A bit ago we were talking about Ansible front ends and AWX came up. Since then, Ansible Semaphore was brought to my attention and seems pretty slick. https://ansible-semaphore.com/ I don’t know if I’d use it as a production tool, but it seems to fit the bill of the “I just want to see if the drat job ran and did the thing” tool I want for my homelab.
|
# ? Jun 10, 2023 17:57 |
|
semaphore is pretty good. in the same vein of ansible tooling, there is also ara, and something i'm particularly excited about is ansible-rulebook which is basically salt's salt reactor but for ansible
|
# ? Jun 10, 2023 18:15 |
|
Apparently Google really did sell Google Domains to Squarespace. Does anyone have opinions on this, and suggestions on where I ought to move them to? I currently have about a 90/10 split of domains on Google/Route 53.
|
# ? Jun 18, 2023 07:31 |
|
Namecheap? Been using them for years and have no issues.
|
# ? Jun 18, 2023 17:47 |
|
Dunno how they are for companies, but I've been using Porkbun and reasonably happy with them. They have an API (I haven't used it) so it seems like there's at least the possibility of running it at some level of scale.
|
# ? Jun 18, 2023 20:16 |
|
Looking into why it takes ages to run automation against one of our infosec team's AWS accounts, and they have a bunch of 500MB binaries checked into the git repo. Why is it always infosec?
|
# ? Jun 21, 2023 15:21 |
|
Docjowles posted:Looking into why it takes ages to run automation against one of our infosec team's AWS accounts and they have a bunch of 500MB binaries checked into the git repo. Why is it always infosec Slows down attackers to give you time to prepare a defense, much like irregular stairs in a medieval castle
|
# ? Jun 21, 2023 16:46 |