|
honestly I could solve this entire god drat problem w/ a PowerShell script, but then some jerk would need to own it and that jerk would be me (thread title)
|
# ? Sep 10, 2022 04:39 |
|
|
# ? May 18, 2024 09:12 |
Methanar posted:I haven't been able to do any kind of useful project work of my own in like, 2 months. It's just been non-stop emergency firefighting and dealing with interrupts and making sure nobody else is blocked. Ty for your service 😩
|
|
# ? Sep 10, 2022 14:58 |
|
I have this dream that I can take my k8s setup that I inherited and move everything in it to one of the PaaS providers. I am the sole developer in a company that has a monolith backend (plus a handful of lambdas) and also 4 major front end web apps and a native app. The bulk of our k8s setup is just the same Rails image but with different entry commands. A bunch of web servers, a huge fleet of resque (background job) workers, and then a couple of bespoke rake tasks. We do have a nextjs app running in there as well but I'd like to move that to Netlify or Vercel anyway.

I simply have too many job duties and I really don't care for devops relative to all the others and I'm wondering if there's any viability in moving my poo poo into a managed system like Railway, Render, or Fly.io. The guy before me put a lot of work into setting this up but it feels like we need to hire someone just to manage it and I'm not sure if we're serving enough traffic to warrant a full time devops person or if this whole stack is just overengineered because the last guy had k8s hype or wanted to get it on his resume.

We do process a ton of data because we're a content sharing solution so we are non stop pulling in media files from Dropbox or other similar services in those background queues, plus doing a lot of other 24/7 background processing for other aspects of the system. Should I just push to hire somebody who's horny for AWS the way I'm horny for front end and let them manage all this or can PaaS scale up to "fairly well established startup crunching a ton of data" level and maybe do it cheaper than a hire?
|
# ? Sep 11, 2022 05:54 |
Why is ansible being so annoying with my docker ECR login attempt? First I tried: code:
quote:TASK [Prep for docker pull from ECR] *************************************************************************************************************************************************************************************************** Ok so a couple things in there, "Cannot perform an interactive login from a non TTY device" and "You must specify a region..." Thinking the TTY thing was the issue, I then tried: code:
quote:fatal: [my-server]: FAILED! => {"changed": true, "cmd": ["aws", "ecr", "get-login-password"], "delta": "0:00:04.404906", "end": "2022-09-11 21:56:37.173636", "msg": "non-zero return code", "rc": 253, "start": "2022-09-11 21:56:32.768730", "stderr": "\nYou must specify a region. You can also configure your region by running "aws configure".", "stderr_lines": ["", "You must specify a region. You can also configure your region by running "aws configure"."], "stdout": "", "stdout_lines": []} So this seems like the AWS credentials are not configured. They are though! Both ~/.aws/config and ~/.aws/credentials exist on the remote machine for the ansible_user and the get-login-password command works fine if I ssh in and run it manually. What's the deal??
|
|
# ? Sep 11, 2022 22:58 |
Of course as soon as I post I came up with a workaround. This seems to work fine:code:
|
|
# ? Sep 11, 2022 23:07 |
|
IIRC ansible's shell defaults to /bin/sh and from the error messages I would suspect that whatever {{ ansible_user }} is on the remote node has all of the AWS stuff configured under bash. You can try switching the shell to /bin/bash, or try sourcing that user's bashrc before running your commands. FWIW if I just set connection: local and run a playbook against my desktop, that stuff all works out of the box with no extra config.
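A sketch of what switching the shell might look like — the task name and the sourced file are illustrative, though `executable` is a standard arg of ansible's `shell` module:

```yaml
- name: Run the ECR login under bash instead of /bin/sh  # hypothetical task
  shell: source ~/.bashrc && aws ecr get-login-password
  args:
    executable: /bin/bash
```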
|
# ? Sep 12, 2022 03:28 |
12 rats tied together posted:IIRC ansible's shell defaults to /bin/sh and from the error messages I would suspect that whatever {{ ansible_user }} is on the remote node has all of the AWS stuff configured under bash. You can try switching the shell to /bin/bash, or try sourcing that user's bashrc before running your commands. Sounded promising! Unfortunately this didn't seem to work either: code:
code:
|
|
# ? Sep 12, 2022 08:17 |
|
Sadly this may be your best option:code:
|
# ? Sep 12, 2022 15:23 |
|
fletcher posted:Sounded promising! Unfortunately this didn't seem to work either: One thing maybe that can help you visualize the issue, in your shell command, instead of the actual command you want to run, do something like "env > some_file" or visualize the output of "env" so you can see what all variables you are working with. It's helped me a few times.
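A throwaway diagnostic task along these lines (name and path are placeholders) shows exactly what environment the command actually runs with:

```yaml
- name: Dump the task environment for debugging  # hypothetical one-off task
  shell: env > /tmp/ansible_env_dump
```

Then diff that file against `env` from a normal SSH session to spot which AWS variables are missing.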
|
# ? Sep 12, 2022 16:36 |
tortilla_chip posted:Sadly this may be your best option: Super-NintendoUser posted:One thing maybe that can help you visualize the issue, in your shell command, instead of the actual command you want to run, do something like "env > some_file" or visualize the output of "env" so you can see what all variables you are working with. It's helped me a few times. Ahhh thank you both! I think I've figured it out now. Ansible is running as root and using sudo to run as ansible_user (non-root user). So the issue was that while ansible_user had ~/.aws/config & ~/.aws/credentials configured, the root user did not (/root/.aws/config & /root/.aws/credentials). Configuring the AWS creds for the root user fixed it, and my original attempt now works fine: code:
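For reference, a task along these lines is a common way to wire this up — the region, account ID, and task name below are placeholders, not fletcher's actual playbook:

```yaml
- name: Log docker in to ECR  # hypothetical; region/account are illustrative
  shell: >
    aws ecr get-login-password --region us-west-2
    | docker login --username AWS --password-stdin
    123456789012.dkr.ecr.us-west-2.amazonaws.com
  changed_when: false  # the login is a side effect, not a config change
```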
|
|
# ? Sep 12, 2022 21:44 |
|
fletcher posted:Ahhh thank you both! I think I've figured it out now. Glad to help! But .... fletcher posted:Ansible is running as root and using sudo to run as ansible_user (non-root user) This doesn't look right. It's been a while since I had to dig into this specifically, but ansible should be connecting via SSH as a non-privileged user and then elevating/become whatever you need. The way this looks means that you have SSH as root enabled on the destination host, which is a bad idea in general. I'm sure there's many other reasons you don't ssh in with ansible as root, at the moment I can't remember more, but maybe you need to check this.
|
# ? Sep 12, 2022 22:26 |
Super-NintendoUser posted:Glad to help! Ah, hmm. The env var dump in my playbook led me to believe this: code:
|
|
# ? Sep 12, 2022 22:40 |
|
So a few things here: You are using the `shell` plugin directly, which really isn't all that great for idempotency. Have you checked out the plugins that are available that have been built out for the functionality you're using? Also, avoid running tasks as root unless absolutely necessary. At face value, using `become` at the playbook level doesn't seem all that bad, but if it's needed for a task, add it for the task, not the playbook, and expect that the task is going to fail in the event that the resource does not provide it for you. This really isn't an issue when you have a small team of aware individuals writing the configs, but it will become an absolute operational nightmare once you set a pattern like this and start handing off to development teams.
|
# ? Sep 13, 2022 01:01 |
drunk mutt posted:So a few things here: I'm not too worried about the idempotency for this one, since it's just generating a temporary token to download container images from ECR that's only valid for 12 hours anyways. The only examples I could find of people doing this in ansible all seemed to use the shell module directly. I did come across this ticket which talks about using amazon-ecr-credential-helper. That seemed like even more complexity though for what should be just a simple shell one-liner, so I didn't end up pursuing that further. I didn't realize ansible was running as root though, I thought it was running as my lower privilege ansible_user. The thing I'm using this for is just for my own personal hobby project, so while I'd like to be closer to a solution that would be the "right way" if it was used by a larger team, it's not too big of a concern for this particular project.
|
|
# ? Sep 13, 2022 01:36 |
|
fletcher posted:I'm not too worried about the idempotency for this one, since it's just generating a temporary token to download container images from ECR that's only valid for 12 hours anyways. The only examples I could find of people doing this in ansible all seemed to use the shell module directly. I did come across this ticket which talks about using amazon-ecr-credential-helper. That seemed like even more complexity though for what should be just a simple shell one-liner, so I didn't end up pursuing that further. Yeah, in situations like this the pattern you put forth doesn't matter, but food for thought if you carry this pattern on into situations that actually would present the concern. I'm not suggesting you change things here, just have seen stuff like this turn into many emptied bottles of whiskey. When are you placing the become? I would assume this is something placed at the playbook level?
|
# ? Sep 13, 2022 02:17 |
drunk mutt posted:Yeah, situations like this the pattern you put forth doesn't matter, but food for thought if you carry this pattern on into situations that actually would present the concern. Ahhh yes, I had a become at the playbook level. I don't even remember adding that, it was in my initial commit from almost 2 years ago. Must have unknowingly copied it from some tutorial when I was starting out! I'll move that to only the modules that need it, since I definitely don't need to be using it everywhere. Thank you! I think that fully explains my issue I encountered now.
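As a sketch of the shape that change takes (hosts, package, region, and names here are all made up), `become` moves from the play down to only the tasks that need root:

```yaml
- hosts: my-server
  # no play-level `become: true` anymore
  tasks:
    - name: Install docker (needs root)
      apt:
        name: docker.io
        state: present
      become: true

    - name: ECR login (runs as the unprivileged connecting user)
      shell: aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com
      changed_when: false
```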
|
|
# ? Sep 13, 2022 07:01 |
|
prom candy posted:I have this dream that I can take my k8s setup that I inherited and move everything in it to one of the PaaS providers. I am the sole developer in a company that has a monolith backend (plus a handful of lambdas) and also 4 major front end web apps and a native app. The bulk of our k8s setup is just the same Rails image but with different entry commands. A bunch of web servers, a huge fleet of resque (background job) workers, and then a couple of bespoke rake tasks. We do have a nextjs app running in there as well but I'd like to move that to Netlify or Vercel anyway. Let me be about as clear as possible - over-engineering is Bad Engineering when you're a small team because you're almost certainly not delivering stuff or you're overcomplicating maintenance for yourself down the road. But when you're greatly limited on time your best bet is to time box efforts to migrate stuff over to make all the infrastructure stuff Someone Else's Problem with the understanding that you will still need to do some maintenance no matter what. For example, when it comes to maintaining dependencies in your container images you'll probably want to have something like Renovate bot or dependabot to handle more groundwork. On the other hand, I've seen entire jobs where some poor sap had to spend nearly a year migrating off of a PaaS because their costs were so high (Heroku bill of $25k+ / month) they could justify hiring the engineer solely to migrate away, and that's what I think is best to lean toward for most SaaS companies - use boring AF technology until it hurts and you know exactly what you need to quickly rule out tech stacks. Worked out well for the company I'm at and although they made a lot of mistakes over time they did manage to pick a stack that is quite serviceable for years to come.
|
# ? Sep 13, 2022 08:30 |
|
What's cool about the advances made in container orchestration and serverless tech over the past few years is that they contribute absolutely nothing to being able to just package and run an average web app in an operationalized way. These platforms have been outlandishly expensive and universally sucked for non-toy use cases since their inception. They don't seem to be making too many strides in a better direction, and the industry seems to be moving in the direction of making this kind of PaaS model irrelevant. Static hosting (and maybe some edge workers) is great if you can externalize everything except your frontend tier. Generally, I think if an org already has the expertise to manage the infrastructure behind these kinds of apps, and they don't have trouble hiring that expertise when needed, these kinds of platforms are a waste of time. Updating the server image underneath your app once in awhile, in the year 2022, really isn't the huge operational expense people pretend it is. You still have to manage your app's internal dependencies. You still need to contend with it interacting with external dependencies. Vulture Culture fucked around with this message at 15:23 on Sep 17, 2022 |
# ? Sep 17, 2022 15:15 |
|
necrobobsledder posted:I've seen entire jobs where some poor sap had to spend nearly a year migrating off of a PaaS because their costs were so high (Heroku bill of $25k+ / month) they could justify hiring the engineer solely to migrate away So at the end of that year, they've run up a $300k expense, and still haven't provided the company any cost savings, just potential future cost savings? They did that for a high-risk migration project that might fail entirely? And then at the end, the infrastructure isn't free, so even if you take a 50% cut on your hosting expenses, if those are fixed, you don't recoup that $300k investment for another two years, making their first year of salary a three-year sunk cost. This is stuff you can only rationalize if your infra spend is on some kind of exponential growth curve and you're looking at that $25k a month being $250k a month next year. I wonder what percentage Heroku even represented of the overall operational cost of that app. Layer in log aggregation, APM, telemetry, analytics, etc. and I can't see the company walking away with more than a 25% drop in long-term operating cost off of that whole project.
|
# ? Sep 17, 2022 15:37 |
|
Vulture Culture posted:Oh dear. This sounds like a very bad reason to hire an engineer. A senior engineer is what, $160k base? Then you have benefits making up another half to two-thirds of their salary, let's say $100k. Then they have a manager, and this engineer is taking up a fraction of their capacity, let's say 15% of the time and attention of someone who's also costing the company $260k a year in total compensation, so we're already at about $300k sunk in the first year. We'll ignore IT and technology spend (it's probably peanuts compared to compensation).
|
# ? Sep 17, 2022 16:05 |
|
Heroku in particular has prices that scale up horribly don't they? I never got to that level with them but I've heard the higher tiers are brutal.
|
# ? Sep 19, 2022 06:03 |
|
anyone here going to argocon?
|
# ? Sep 19, 2022 06:14 |
|
These are all rough numbers - I’m getting a bit far afield of my non-compsci background so I figured I’d ask for some advice. I need to distribute a ~55mb bloom filter encoded as a hex string to between several hundred and several thousand client cloud SIEM environments (think Azure Sentinel, Splunk Cloud, MDE, etc) over the public internet (unless azure lets me send data to a different tenant without going over the internet or peering VPCs, which I feel like should be a thing). Call it 100 million database rows, using mariaDB’s rocksDB database engine (which I just learned exists today) to produce the filter. This updates whenever the table updates, but I don’t expect rows to change very often, so each iteration of the filter would I *think* be fairly similar to the previous. I need to distribute this to all the client environments at least once a minute. The client then runs the same hash function against an object in their environment to see if it’s (possibly) in our database or not.

So now the question is how do I efficiently distribute a fairly large blob file that changes semi-frequently. I figure we need to host the blob at edge and put it behind a CDN, but I’m not sure how much caching gets me if the source changes regularly. I don’t want the client to have to download a 50mb file every time either, so we need a client side cache. Append only doesn’t work because sometimes rows will change, and so will the computed hash, and thus so will the bloom filter. My immediate thought was to divide my filter output into segments and check if each segment is identical in the client cache, which lets me balance compute vs network costs. And then I realized that at this point I lack the formal background to know if someone’s already solved this problem, and as it seems such a fundamental problem set, I’m sure someone already has.

E: using a more efficient probabilistic filter like a cuckoo or XOR filter seems to be the way to go here.
More space efficient and we can distribute buckets rather than the whole data object. Bigger downside is outside of redis, there’s not a lot of data stores that implement the more efficient algorithms for you. E2: I did some quick math on exactly how much volume the naive approach here would take and suddenly updating client environments once a day is okay lmao The Iron Rose fucked around with this message at 03:58 on Sep 20, 2022 |
# ? Sep 20, 2022 02:57 |
|
Comedy option: Route 53 DNS TXT entry with a TTL of 60 seconds. (I don't have anything helpful. My brain just conjured up that mess and I thought I shouldn't be the only one to suffer)
|
# ? Sep 20, 2022 03:13 |
|
The Iron Rose posted:These are all rough numbers - I’m getting a bit far afield of my non-compsci background so I figured I’d ask for some advice. Distribute by bittorrent. Make all of the clients send the blob to other clients. Call it originator-bandwidth-optimized distributed dynamic edge computing Methanar fucked around with this message at 04:09 on Sep 20, 2022 |
# ? Sep 20, 2022 04:05 |
|
Distributing a 50MB blob that might change anywhere inside it is tricky. Distributing 50MB of lots of little pieces of data that don't change very often is solved in certain situations. Like, if you could decompose the bloom filter blob into a data structure that fit into many reasonably-sized DB rows, then "all" you'd need to do is set up DB replicas. DBs are pretty good at pushing out changes to read-only replicas, and also recovering from the inevitable problems you'll get when the network falls over. Edit: Torrents would be very efficient but I think they assume the data is static; every time you modified the blob you'd need a new torrent. If you could break the bloom filter up into smaller pieces, it could work.
|
# ? Sep 20, 2022 05:22 |
|
Methanar posted:Distribute by bittorrent. Make all of the clients send the blob to other clients. minato posted:Distributing a 50MB blob that might change anywhere inside it is tricky. Distributing 50MB of lots of little pieces of data that don't change very often is solved in certain situations. Like, if you could decompose the bloom filter blob into a data structure that fit into many reasonably-sized DB rows, then "all" you'd need to do is set up DB replicas. DBs are pretty good at pushing out changes to read-only replicas, and also recovering from the inevitable problems you'll get when the network falls over. I think there’s really no way around splitting the bloom filter into smaller segments if you want to do this sanely. Much more efficient to decompose the filter into 50 different 1mb files and do comparisons on the segments. Would drastically reduce data volume especially if you don’t expect the majority of those 1mb substrings to change frequently. Then on the client side cache the segment, set a lengthy ttl, hash the substring and compare the local segment hash against the published hash of the remote segment to see if we need to invalidate. Do all of that async from queries that engage with the filter, and either concat on query time or pre-compute and have some other means of refreshing the data. Ultimately I think this would significantly reduce the amount of data client environments need to download, allow us to update the bloom filter frequently, and also allows us to make much better use of CDN caching at edge to minimize our egress costs. The Iron Rose fucked around with this message at 05:47 on Sep 20, 2022 |
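A minimal sketch of that segment-hash comparison in Python — the segment size, hash choice, and function names are all assumptions for illustration, not anything from a real SIEM client:

```python
import hashlib

SEGMENT_SIZE = 1 * 1024 * 1024  # hypothetical 1 MiB segments


def segment_hashes(filter_bytes: bytes, segment_size: int = SEGMENT_SIZE) -> list[str]:
    """Split the serialized filter into fixed-size segments and hash each one."""
    return [
        hashlib.sha256(filter_bytes[i:i + segment_size]).hexdigest()
        for i in range(0, len(filter_bytes), segment_size)
    ]


def changed_segments(cached: list[str], published: list[str]) -> list[int]:
    """Indices of segments the client needs to re-download."""
    return [
        i for i, h in enumerate(published)
        if i >= len(cached) or cached[i] != h
    ]


# Flip one byte in a 4-segment blob: only that one segment should differ.
blob = bytearray(4 * SEGMENT_SIZE)
cached = segment_hashes(bytes(blob))
blob[2 * SEGMENT_SIZE + 123] ^= 0xFF
published = segment_hashes(bytes(blob))
print(changed_segments(cached, published))  # [2]
```

The published hash list is tiny (a few KB for a 55mb filter), so clients can poll it every minute and only fetch the segments that actually changed.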
# ? Sep 20, 2022 05:44 |
|
How often do these clients need to check objects? Like is this a thousands of times a second thing or an every few minutes thing?
|
# ? Sep 20, 2022 06:13 |
|
just put it behind a cdn and use the purge function when it changes. it will cost you dollars in bandwidth. whole dollars!
|
# ? Sep 20, 2022 16:10 |
|
One other thing that's important is to understand how much control over the clients is happening. If you have a client that goes dark for, say, 1 year and comes back will that "torrent" still be available? What does phoning home mean? How is that secured in the future and how would you invalidate old key material and certs and so forth? Global CDNs are good for getting stuff out with the lowest latency and aggregate cost but for data that changes very often you're in the cache invalidation part of hard problems along with a variation of byzantine generals potentially if you ever have to rotate keys and URLs securely through distributed means. Note that with "SIEM" it's a security related issue that will require some auditing and non-repudiation sort of transactional logic. We distribute our security model artifacts that are all smaller than 50 MB via our own weirdo, whacko audited-to-death CDN-lite system and have our own custom delta update algorithm to perform updates quickly and with as small diffs as possible (the client requests a version range and we get it topped up to a major version then a delta after that pre-calculated and cached on our backends for the combinations of requests possible). Our encoding is nothing all that special though (we rely more upon strong certificate chains along with defense-in-depth approaches to distrust the host client system as an assumption) and is designed around taking the plaintext artifact and making it easy to decompress, update, and invalidate quickly. I highly suggest splitting up the full artifact somehow to avoid needing to re-download the whole artifact every other time someone closes their laptop. I'm thinking given it's essentially pulled from a big table you can use what amounts to the underlying DB's replication format (binary or logical isn't necessarily important). 
It's just copying and compressing the diffs over the wire, serializing to a file format, and completing the checkpoints as a transaction to be downloaded by clients. The Iron Rose posted:I think there’s really no way around splitting the bloom filter into smaller segments if you want to do this sanely. Much more efficient to decompose the filter into 50 different 1mb files and do comparisons on the segments. Would drastically reduce data volume especially if you don’t expect the majority of those 1mb substrings to change frequently.
|
# ? Sep 21, 2022 00:13 |
|
Wizard of the Deep posted:Comedy option: Route 53 DNS TXT entry with a TTL of 60 seconds. add it to a FROM scratch docker image Junkiebev fucked around with this message at 21:31 on Sep 21, 2022 |
# ? Sep 21, 2022 21:20 |
|
MightyBigMinus posted:just put it behind a cdn and and use the purge function when it changes this is the real answer fyi
|
# ? Sep 21, 2022 22:48 |
|
I've just gotten around to using cloud images for my proxmox homelab machine alongside Terraform to get stuff spun up and it's working quite well. The only thing that's bothering me at the moment is getting QEMU agents installed and working with the host. In theory this is pretty straightforward and the Pmox provider for Terraform even has a section in the code to enable support from the host's side of things. In practice things get a bit problematic. The provider presumes that the agent software is already present on the VM being provisioned and will loop and time out waiting for a ping response that isn't coming. I was installing the agent and other bits via an ansible playbook as part of the provisioning, but that never happens due to the loop. The answer appears to be to make the agent part of the base image and clone from that, but I'm not exactly sure how to skin that cat. Is there some way to add layers to a VM cloud image base without instantiating it and having it get assigned device ids and so on and so forth? Isn't that what Packer is all about? edit: After some more digging around I came across this blog post that mentions the usage of 'virt-customize' to layer packages in and I'm going to give that a shot. I still want/need to go mess with Packer at some point but one thing at a time. Warbird fucked around with this message at 19:02 on Sep 26, 2022 |
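For reference, the invocation is roughly this — the image filename is a guess, though `-a`, `--install`, and `--run-command` are real virt-customize options:

```shell
# Bake the QEMU guest agent into a downloaded cloud image before templating it
virt-customize -a jammy-server-cloudimg-amd64.img \
  --install qemu-guest-agent \
  --run-command 'systemctl enable qemu-guest-agent'
```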
# ? Sep 26, 2022 18:22 |
|
Warbird posted:
packer spins up an ISO in an infrastructure provider, does stuff to it, including generally "generalizing" (sysprep, etc) it, and shits out a "templatized" image into the media library of the provider you chose. The thing you are trying to do is a perfect use-case assuming you are going to do it repeatedly.
|
# ? Sep 26, 2022 19:49 |
|
That's what I figured. I doubt I'll be doing it with any real frequency, but it'll be a decent skillset to have on hand. For the record virt-customize did exactly what I wanted with no fuss or muss so it seems like an ideal solution for small scale and one off applications.
|
# ? Sep 26, 2022 21:08 |
|
Does anyone have experience with using AMD based instances in aws? I'm looking over the new c6 stuff and I'm thinking about the c6a class. AWS is advertising a 15% uplift over c5 for price:performance for both intel and amd respectively, but the amd stuff is about 10% cheaper even than that. I'm looking over some public benchmarks comparing price:performance of intel vs amd now, but I'd be interested if anyone has their own anecdotes/success stories of looking at AMD's stuff.
|
# ? Sep 26, 2022 21:13 |
|
Is "Production Engineering" at Meta/Facebook basically devops/sre/platform engineer? What does the rest of FAANG call this position? Levels.fyi is showing abysmal pay for "production software engineer" like $160k TC for an SDE II production engineer at amazon fake edit: oh looks like meta has an L3 at $194k TC? But only $130k base
|
# ? Sep 26, 2022 22:22 |
|
Hadlock posted:Is "Production Engineering" at Meta/Facebook basically devops/sre/platform engineer PE @ Meta == SRE everywhere else. Facebook decided to call it something different because they refuse to follow Google's lead or even use any Google products internally, despite copy/pasting most of their culture from Google. I vaguely recall their rationale being along the lines of "Reliability implies that the engineer just cares about uptime, but really the scope is about engineering products to work at a production scale". Yeah whatever, it's just SRE.
|
# ? Sep 26, 2022 22:37 |
|
Methanar posted:Does anyone have experience with using AMD based instances in aws? We switched all of our CI agents to this (GitLab runner using the Kubernetes executor) and build times didn't change up or down but we didn't really expect them to because they aren't compute bound. The real benefit for us was vastly increased spot availability and lower spot prices compared to the equiv intel instance types.
|
# ? Sep 26, 2022 23:36 |
|
|
whats for dinner posted:We switched all of our CI agents to this (GitLab runner using the Kubernetes executor) and build times didn't change up or down but we didn't really expect them to because they aren't compute bound. The real benefit for us was vastly increased spot availability and lower spot prices compared to the equiv intel instance types. Same here basically. Switched all of our EKS nodes from intel instances to AMD and saved a good chunk of change, basically the same performance (some workloads did a bit better on the AMD nodes).
|
# ? Sep 27, 2022 04:18 |