|
luminalflux posted:Another thing to consider is how you handle dropping tables. We had some severe issues with RDS, each time we'd reboot with failover - something that AWS claims should be "fairly quick", we'd end up having hours+ of downtime. Way back in the day someone dropped a big table and while it was dropping, it took down the site due to ... idk what was the actual issue. So they rebooted the database. Thanks, this is great stuff to think about. I have an inherited setup right now that imo is just stupidly overcomplicated for a company our size. I'm looking at moving everything to PaaS because I think it'll ultimately be cheaper than hiring the devops engineer that I'm going to need to hire otherwise but doing a cutover on infra at this point in our business seems crazy and terrifying. I'd love to just move the whole thing to PlanetScale and Render.com or something.
|
# ? Mar 5, 2023 06:03 |
|
This is all validating my career choices to intentionally get away from managing large, important data stores. Cause that poo poo is hard. Super interesting stuff, luminalflux. I would come to your TED talk. luminalflux posted:AWS support isn't enough Also, my god, so much this. They're trying their best but unless your monthly spend is measured in multi-millions good luck getting the right people engaged before you just figure it out yourself. I find myself routinely raising issues with our TAM and then explaining the solution we came up with ourselves to him a couple weeks later. The fact that we pay well into 6 figures a year just in support costs for this is comical Docjowles fucked around with this message at 06:41 on Mar 5, 2023 |
# ? Mar 5, 2023 06:34 |
|
prom candy posted:Thanks, this is great stuff to think about. I have an inherited setup right now that imo is just stupidly overcomplicated for a company our size. I'm looking at moving everything to PaaS because I think it'll ultimately be cheaper than hiring the devops engineer that I'm going to need to hire otherwise but doing a cutover on infra at this point in our business seems crazy and terrifying. I'd love to just move the whole thing to PlanetScale and Render.com or something. Friend of mine just joined Render, curious what your experience is. PlanetScale is pretty cool too, we've been eyeing them for horizontal sharding of our MySQL. Honestly a lot of places could just host on a PaaS and call it a day. Too bad Heroku isn't viable anymore. Docjowles posted:Super interesting stuff, luminalflux. I would come to your TED talk. Thanks! I'm kinda kicking myself for not giving this online. I've given a version of this internally, but I've kinda moved on from MySQL since we hired 2 SREs that just focus on MySQL management. quote:I find myself routinely raising issues with our TAM and then explaining the solution we came up with ourselves to him a couple weeks later. The fact that we pay well into 6 figures a year just in support costs for this is comical There's a couple levers to pull here, I think I've delved into this before. A big one is talking to your account manager's manager and explaining that you aren't getting good enough support. Your TAM's skillset might not be aligned with what you're doing. They can rotate the team for you - it's a big grenade to throw in there but I've done it. TAMs aren't be-all end-all though. They're good at routing your question to the correct service team, getting you in front of the service team's PM so you can kvetch at them for 50 minutes about why IRSA sucks rear end, and asking around among other TAMs stuff like "hey has anyone done <x>?"
|
# ? Mar 6, 2023 01:24 |
|
luminalflux posted:Friend of mine just joined Render, curious what your experience is. PlanetScale is pretty cool too, we've been eyeing them for horizontal sharding of our mysql. Honestly a lot of places could just host on a PaaS and call it a day. Too bad Heroku isn't viable anymore. It actually looks like you can do a zero downtime migration from RDS to PlanetScale so I might look more seriously at this option. I need to do the math a bit more carefully but it might actually also be cheaper than RDS for our needs. Render I don't have any experience with but it does look like it makes a lot of the tricky parts much easier. Railway is another option in that space that I've looked at.
|
# ? Mar 6, 2023 03:13 |
|
As the SRE for a number of petabyte-scale-but-can't-afford-it DB scenarios, I was at an org that was one of the early adopters of AWS Aurora MySQL, and let me tell you how incredibly frustrating it was when AWS had bugs with it. There was a constant stream of silly bug fixes after it buckled for ~2 hours during one of our "massive" load spikes of maybe 100k write requests / minute. One jaw-dropper was a note from the TAM that failover didn't work correctly because they were mistakenly caching DNS entries in their JVM, which I thought every senior Java dev knew is how the JVM has worked for decades? I really hope it was something custom but the bugs really didn't give me much confidence that Aurora was crafted with the same kind of attention to detail that DynamoDB was.
|
# ? Mar 6, 2023 11:01 |
|
I think aws likes to hire well pedigreed but otherwise completely inexperienced folks and throw them at big problems. They could honestly use more ops folks all around and stop using baby programmers.
|
# ? Mar 7, 2023 17:01 |
|
Is there any tooling for end-to-end testing with cloud resources? Feels like Ansible would be a good fit, but feels like it's abusing the purpose of the tooling. The idea would be to not build a script as that's just horrible to maintain, but leverage tooling that would be easy to define resources to create, validate resources are created and can do what they're intended to do and tear down resources created for the tests. Really trying to avoid having a co-worker go off and just write a kludge of a script that will just add more burden to our small team.
|
# ? Mar 7, 2023 17:18 |
|
we use terratest
|
# ? Mar 7, 2023 17:20 |
|
The Fool posted:we use terratest We were looking at that, but the AWS provider doesn't have the API calls built into their HCL for the resources we need. So it gave certain co-workers the great idea that we could just use null resources again, and well, I'd just rather not have to have that conversation all over again.
|
# ? Mar 7, 2023 17:28 |
|
One of my teammates is extremely hype about terratest backed by localstack. So I tried it out and literally the first test I tried to write proved to be impossible due to a bug around transit gateways in localstack, which was of course the thing I needed to test. So that was a pretty major buzzkill.
|
# ? Mar 7, 2023 17:34 |
|
Is the job market messed up right now with the layoffs + copy cat layoffs? I've been looking since the start of the year, and barely get any follow ups. I feel like previously I'd at least get lots of assessments and interviews. Now it's just a black hole I'm throwing applications into. I have a decade+ of practical experience at this point, but it's not very traditional and leans towards sysadmin/operations, with heavy application support. So I apply to SRE/Devops roles as well that sometimes fit this bill since people don't know how to describe SRE.
|
# ? Mar 7, 2023 17:35 |
|
drunk mutt posted:Feels like Ansible would be a good fit, but feels like it's abusing the purpose of the tooling. ansible is good at this and it isn't an abuse of the tool, there are several built in utilities specifically for writing playbooks that are test suites, even. do not use molecule, however, it's bad.
|
# ? Mar 7, 2023 17:48 |
|
Sylink posted:Is the job market messed up right now with the layoffs + copy cat layoffs? Yes. My org has been in a hiring freeze for months and I doubt we're alone.
|
# ? Mar 7, 2023 19:02 |
|
Does your org do a freeze but still leave all the openings up ? Cause I see what must amount to millions of positions that never get filled, lmao. Sorry if this is a derail for the thread.
|
# ? Mar 7, 2023 20:07 |
|
12 rats tied together posted:do not use molecule, however, it's bad. e: though I will note that I literally don't use driver functionality aside from establishing connections, and that we wrote a collection of Molecule helpers that we call from cookiecutter'd molecule/*/{create,destroy} playbooks. I recommend this approach for internal roles. The Fool posted:we use terratest Vulture Culture fucked around with this message at 20:18 on Mar 7, 2023 |
# ? Mar 7, 2023 20:12 |
|
Sylink posted:Does your org do a freeze but still leave all the openings up ? yes lol
|
# ? Mar 7, 2023 20:13 |
|
welp then, cause I've done several interviews then seen the position still open months later. I love that companies can throw like $20k in man hours on interviews then not fill a position.
|
# ? Mar 7, 2023 20:17 |
|
Sylink posted:welp then, cause I've done several interviews then seen the position still open months later Imagine each person you employ has a fixed cost and generates some amount of value (which varies from person to person). Now imagine the macroeconomic environment works as a multiplier on that value; sometimes it's 2x, or 10x, and sometimes it's 0.5x in a contraction. There's little point hiring someone who's going to bleed money for reasons that they and you literally cannot do anything about.
|
# ? Mar 7, 2023 20:23 |
|
Methanar posted:yes lol Recruiting sucks so bad lol. I could believe any reason for this, including wanting a pipeline full of apps ready to go when hiring opens up again, thinking having a bunch of bogus openings posted makes the company look good, or simple incompetence.
|
# ? Mar 7, 2023 20:40 |
|
The worst part is I only get interviews when I've somehow reached the hiring manager/person I would be working for directly. Whenever I deal with an HR middle-person, they just check off some bullet points and don't have the knowledge to extrapolate skillsets or see how they would apply. Especially frustrating with computer touching where most of the skill is just being able to read documentation for new things and work with it, because there is just so much now to deal with, imo.
|
# ? Mar 7, 2023 21:30 |
|
There's also bureaucracy moving at the speed of bureaucracy, in that the hiring team may not get told they can't fill the role until they've finished interviewing and say they want to move forward with a particular person. Communication is both hard and important, yo.
|
# ? Mar 7, 2023 22:21 |
|
Docjowles posted:Recruiting sucks so bad lol. I could believe any reason for this, including wanting a pipeline full of apps ready to go when hiring opens up again, thinking having a bunch of bogus openings posted makes the company look good, or simple incompetence. In a lot of cases it’s probably just a result of people within the company attempting to navigate the bureaucracy.
- Manager gets approval to open a req for some job.
- Manager communicates to internal talent acquisition team the requirements for the job and they create the job posting.
- Company institutes hiring freeze.
- Manager knows that if she closes the req she will have to go through the approvals all over again to get it re-opened, and it could be denied, so she decides to leave it open.
- The talent acquisition folks leave it open because the hiring manager hasn’t told them to close it, plus if the company isn’t even pretending to hire people then they don’t have a reason to remain employed.
- Since the job is still posted people still apply and the talent acquisition people still do their job and screen applicants and forward some on to the hiring manager, who may interview them with an eye towards hiring later or trying to get an exception from the freeze.
Plus hiring freezes are rarely universal. There’s generally *some* level of hiring going on just to maintain staffing levels as people leave.
|
# ? Mar 8, 2023 06:09 |
|
Sylink posted:Is the job market messed up right now with the layoffs + copy cat layoffs? The market just fell off a cliff after the first half of January. I used to string along recruiters and they would email me back six weeks later. Now if I don't respond within 24 hours, recruiters are telling me they've already moved to final stage with an applicant, if they even respond at all. Another factor is that Q1 hiring is done, we're three weeks from the end of Q1. Q2 headcount and payroll dollars are about to be approved/released by the CFO. If you're looking for a job right now, you'll be spinning your wheels until about March 28 and then March 30-April 14 expect the floodgates to open. You should be spending the intermediate time grinding leetcode + cramming weird trivia like Twitter for zombies SQL (this came up in a recent tech screen for some reason?)
|
# ? Mar 8, 2023 09:09 |
|
Folks managing many AWS accounts at scale: how do you ensure your push-driven CI/CD systems (Jenkins, GitLab, etc.) don't exceed IAM policy size based on the number of roles they need to be permitted to assume?
|
# ? Mar 8, 2023 17:34 |
|
Vulture Culture posted:Folks managing many AWS accounts at scale: how do you ensure your push-driven CI/CD systems (Jenkins, GitLab, etc.) don't exceed IAM policy size based on the number of roles they need to be permitted to assume? We don't allow our pipelines to directly assume target IAM Roles, but use Vault as an IAM STS broker that gatekeeps the valid "target" roles for a given pipeline. So IAM Roles only need to trust Vault and we keep a map in Vault of pipeline -> role associations. Pipelines can then more easily use multiple IAM Roles and we don't have people loving around with trust policies. e: it also centralizes the "this repo fucks with this account" list by forcing repo maintainers to explicitly register the association. JehovahsWetness fucked around with this message at 18:04 on Mar 8, 2023 |
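A minimal sketch of the gatekeeping idea described above, as a toy model in plain Python rather than actual Vault configuration. The pipeline names, role ARNs, and the `authorize` helper are all made up for illustration; the only point is that the pipeline -> role map lives in one place and the broker refuses anything not registered there.

```python
# Toy model of a credentials broker: pipelines never assume roles directly,
# they ask the broker, which checks a central pipeline -> role map.
# Every name and ARN below is hypothetical.
PIPELINE_ROLES = {
    "org/payments-api": {"arn:aws:iam::111111111111:role/deploy"},
    "org/data-jobs": {
        "arn:aws:iam::111111111111:role/deploy",
        "arn:aws:iam::222222222222:role/etl",
    },
}


def authorize(pipeline: str, role_arn: str) -> str:
    """Return role_arn if the pipeline is registered for it, else raise."""
    allowed = PIPELINE_ROLES.get(pipeline, set())
    if role_arn not in allowed:
        raise PermissionError(f"{pipeline} is not registered for {role_arn}")
    # A real broker would now assume the role using its own identity and
    # hand short-lived credentials back to the pipeline.
    return role_arn
```

With this shape the target roles only ever trust the broker, so adding a pipeline means adding a map entry instead of editing trust policies.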
# ? Mar 8, 2023 18:01 |
|
how on earth are you hitting policy size limit with just a series of allow -> sts:AssumeRole?
|
# ? Mar 8, 2023 18:19 |
|
12 rats tied together posted:how on earth are you hitting policy size limit with just a series of allow -> sts:AssumeRole?
|
# ? Mar 8, 2023 18:42 |
|
~30 characters per account (round up to 50). ~130 characters to set up the scaffolding for allowlisting "assume role with series of account targets" (round up to 150), * 11 policies = 1,650 characters. You get 1 inline policy at 10,240 characters and 10 managed policies at 6,144 characters for a total of 71,680 characters. After subtracting for the scaffolding, that leaves you with (round down) 70,000 characters / 50 characters per account+role id = 1,400 accounts, give or take? I don't think you should have more than like, 30 accounts, to be clear, but I think even if you're leaning really hard into the micro account strategy, this isn't a limit that matters usually.
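The back-of-envelope math above can be checked with a few lines of Python. The limits and the rounded per-account and scaffolding sizes are the figures from the post; the function and parameter names are just for illustration.

```python
# Figures taken from the estimate above; names are illustrative only.
INLINE_LIMIT = 10_240    # characters available in the inline policy
MANAGED_LIMIT = 6_144    # characters per managed policy
MANAGED_COUNT = 10       # managed policies attached to the role
SCAFFOLD_CHARS = 150     # rounded-up boilerplate per policy document
PER_ACCOUNT_CHARS = 50   # rounded-up cost of one account+role entry


def max_accounts(
    inline_limit: int = INLINE_LIMIT,
    managed_limit: int = MANAGED_LIMIT,
    managed_count: int = MANAGED_COUNT,
    scaffold_chars: int = SCAFFOLD_CHARS,
    per_account_chars: int = PER_ACCOUNT_CHARS,
) -> int:
    """Estimate how many account+role entries fit across all policies."""
    policy_count = 1 + managed_count
    total_chars = inline_limit + managed_count * managed_limit
    usable_chars = total_chars - policy_count * scaffold_chars
    return usable_chars // per_account_chars


print(max_accounts())
```

Without the rounding down to 70,000 the usable budget is 70,030 characters, which still comes out at 1,400 entries.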
|
# ? Mar 8, 2023 18:50 |
|
Vulture Culture posted:Folks managing many AWS accounts at scale: how do you ensure your push-driven CI/CD systems (Jenkins, GitLab, etc.) don't exceed IAM policy size based on the number of roles they need to be permitted to assume? We use OIDC role assumption with CircleCI
|
# ? Mar 8, 2023 19:08 |
|
I'm about to run into this problem, and assumed I'd have to split the policy up into multiple policies. Which is do-able, it's just a hassle. I'm intrigued about the Vault solution, but I'm bewildered about how that actually works. I grok'd all the nouns in that explanation, but I have no idea how they all connect together.
|
# ? Mar 8, 2023 20:27 |
|
if you're using ansible to handle your managed policies you can use jinja2's batch filter to express this pretty trivially YAML code:
You'll want to take care that you don't unnecessarily re-sort "accounts" but even if you do CloudFormation handles this update gracefully so there's no chance of outage.
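For anyone not on Ansible, here is a rough Python sketch of the same batching idea. It is not the Jinja2 snippet: `batch` splits by a fixed count, while this packs by measured size against the 6,144 character managed policy limit. The role ARN format and helper names are assumptions, and compact JSON length is only an approximation of how IAM counts policy size.

```python
import json

MANAGED_POLICY_LIMIT = 6_144  # characters per managed policy


def policy_document(role_arns):
    """Build one allow -> sts:AssumeRole policy for the given role ARNs."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": list(role_arns),
            }
        ],
    }


def policy_size(role_arns):
    """Approximate policy size as compact JSON (no insignificant whitespace)."""
    return len(json.dumps(policy_document(role_arns), separators=(",", ":")))


def batch_role_arns(role_arns, limit=MANAGED_POLICY_LIMIT):
    """Greedily pack ARNs into batches that each fit in one policy.

    Input order is preserved, so a stable input list yields stable batches.
    """
    batches = []
    current = []
    for arn in role_arns:
        if current and policy_size(current + [arn]) > limit:
            batches.append(current)
            current = []
        current.append(arn)
    if current:
        batches.append(current)
    return batches
```

Each batch would then become one managed policy. Appending new accounts to the end of the list only changes the last batch, which is the same reason to avoid re-sorting.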
|
# ? Mar 8, 2023 20:50 |
|
Vulture Culture posted:Folks managing many AWS accounts at scale: how do you ensure your push-driven CI/CD systems (Jenkins, GitLab, etc.) don't exceed IAM policy size based on the number of roles they need to be permitted to assume? Do you need to enumerate the account IDs? This is not something I have set up, but thinking back, I'd assume we were using something that could assume a particular role name in any account. All the actual access control was within those accounts (e.g. trusting the CI/CD Service root account, and requiring, I think, an sts:ExternalId to help prevent any confused deputy type of attacks)
|
# ? Mar 8, 2023 22:21 |
|
Vulture Culture posted:Folks managing many AWS accounts at scale: how do you ensure your push-driven CI/CD systems (Jenkins, GitLab, etc.) don't exceed IAM policy size based on the number of roles they need to be permitted to assume? I guess I'm kinda struggling with the problem because we have a bunch of accounts and a lot of different CircleCI jobs in repos, but we use OIDC. We create an OIDC IdP in each account that trusts Circle's OIDC provider and use AssumeRoleWithWebIdentity (see Circle's docs). This also limits blast radius somewhat since you can lock down each role to a specific repo with conditions. If instead you insist on having One Big Fat Jenkins Role that needs to assume roles in other accounts and you need to grant it a bajillion sts:AssumeRole permissions, can you do something like permitting it to assume arn:aws:iam::*:role/honkfarts, and in the trust policy for each role in the target accounts have a condition key for the organizational unit Jenkins lives in?
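A sketch of what that wildcard approach could look like, written as Python dicts that serialize to policy JSON. The organization path is a placeholder, and using `aws:PrincipalOrgPaths` with a wildcard principal is one plausible reading of "a condition key for the organizational unit", not a tested setup; naming the CI account as the principal instead would be tighter.

```python
import json

ROLE_NAME = "honkfarts"
# Placeholder path: organization id / root id / OU id of the CI account.
CI_ORG_PATH = "o-exampleorgid/r-exampleroot/ou-example-ci/*"


def jenkins_identity_policy(role_name=ROLE_NAME):
    """One statement covering the same role name in every account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sts:AssumeRole",
                "Resource": f"arn:aws:iam::*:role/{role_name}",
            }
        ],
    }


def target_trust_policy(org_path=CI_ORG_PATH):
    """Trust policy for the role in each target account."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": "*"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "ForAnyValue:StringLike": {
                        "aws:PrincipalOrgPaths": [org_path]
                    }
                },
            }
        ],
    }


print(json.dumps(jenkins_identity_policy(), indent=2))
print(json.dumps(target_trust_policy(), indent=2))
```

The identity policy stays the same size no matter how many accounts exist, which is the whole appeal; the actual access control moves into each target account's trust policy.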
|
# ? Mar 9, 2023 01:01 |
|
Are there any better LSM controllers for Kubernetes than KubeArmor right now? I have a need to restrict some behaviors of a badly-behaved vendor application, and the deny/allow policy behaviors of KubeArmor are killing me. I'd like something where I can write policy into K8s somewhere, but I'll compile eBPF bytecode if I have to.
|
# ? Mar 9, 2023 02:37 |
Vulture Culture posted:… I'll compile eBPF bytecode if I have to. What a flex
|
|
# ? Mar 9, 2023 03:01 |
|
madmatt112 posted:What a flex
|
# ? Mar 9, 2023 03:41 |
Vulture Culture posted:Unfortunately, there's no way for me to defend this by saying "I was writing dtrace probes 15 years ago" without it also sounding like a flex, so I think all I can say is "this tech all sounds a lot harder than it is" Bro come on now I know you IRL whether you’re aware or not and that’s all I’ll say about that lmao. You are an outlier on the high end of the bell curve, but you still have a point that it all looks much more intimidating from the outside.
|
|
# ? Mar 9, 2023 05:35 |
|
It gets real fun when you get hired in as a "junior" with 20+ years with the exact technologies under your belt. It shouldn't happen, but hey sometimes someone just likes playing the miserable startup life for a majority of their life and it just doesn't look like they have experience on paper. But seriously, there are kids way smarter than me these days on aspects that I give absolutely no poo poo about... Like I do not give a gently caress about your preferred javascript library framework thingy... I will absolutely, without doubt, stand my ground and say that I will not touch JS. My hatred towards this has blinded me, but at the same time, I accept that blindness toward the technology and delegate all of that poo poo to a person I trust can communicate that world to mine. I guess a lot of words to say, there are a lot of us around that were doing poo poo above and beyond expectations of the modern day, but the only thing that makes us different is this era gets by with doing WAY loving LESS and it's loving awesome.
|
# ? Mar 9, 2023 06:07 |
|
madmatt112 posted:Bro come on now I know you IRL whether you’re aware or not and that’s all I’ll say about that lmao. You are an outlier on the high end of the bell curve, but you still have a point that it all looks much more intimidating from the outside.
|
# ? Mar 9, 2023 07:37 |
|
necrobobsledder posted:Not particularly accomplished or anything but as someone that works with folks writing eBPF working in the domain of security and o11y for K8S including one of the kernel contributors a lot of these supposedly fancy things sound intimidating mostly because people simply aren't that familiar with it. Like how iteration got supported is kinda bonkers but honestly it's not the worst thing I've seen anyone do either (it's not that different to me than doing a validator with ye olde funrollloops). There's a lot of hard, exasperatingly stupid work necessary and lots of sharp edges to deal with bleeding edge stuff that kind of takes away from the glamour of it all though IMO. You would think most of us experienced folks wouldn't romanticize all this newfangled stuff much but I have the suspicion it's a combination of the hype-makers and less experienced engineers that aren't jaded enough that they might fall for hype once in a while. Oh. I work with ebpf. Are you a bee?
|
# ? Mar 9, 2023 07:50 |