|
honestly I could solve this entire god drat problem w/ a PowerShell script, but then some jerk would need to own it and that jerk would be me (thread title)
|
# ? Sep 10, 2022 04:39 |
|
|
# ? May 18, 2024 09:12 |
Methanar posted:I haven't been able to do any kind of useful project work of my own in like, 2 months. It's just been non-stop emergency firefighting and dealing with interrupts and making sure nobody else is blocked. Ty for your service 😩
|
|
# ? Sep 10, 2022 14:58 |
|
I have this dream that I can take my k8s setup that I inherited and move everything in it to one of the PaaS providers. I am the sole developer in a company that has a monolith backend (plus a handful of lambdas) and also 4 major front end web apps and a native app. The bulk of our k8s setup is just the same Rails image but with different entry commands. A bunch of web servers, a huge fleet of resque (background job) workers, and then a couple of bespoke rake tasks. We do have a nextjs app running in there as well but I'd like to move that to Netlify or Vercel anyway.

I simply have too many job duties and I really don't care for devops relative to all the others and I'm wondering if there's any viability in moving my poo poo into a managed system like Railway, Render, or Fly.io. The guy before me put a lot of work into setting this up but it feels like we need to hire someone just to manage it and I'm not sure if we're serving enough traffic to warrant a full time devops person or if this whole stack is just overengineered because the last guy had k8s hype or wanted to get it on his resume.

We do process a ton of data because we're a content sharing solution so we are non stop pulling in media files from Dropbox or other similar services in those background queues, plus doing a lot of other 24/7 background processing for other aspects of the system. Should I just push to hire somebody who's horny for AWS the way I'm horny for front end and let them manage all this or can PaaS scale up to "fairly well established startup crunching a ton of data" level and maybe do it cheaper than a hire?
|
# ? Sep 11, 2022 05:54 |
Why is ansible being so annoying with my docker ECR login attempt? First I tried: code:
quote:TASK [Prep for docker pull from ECR] *************************************************************************************************************************************************************************************************** Ok so a couple things in there, "Cannot perform an interactive login from a non TTY device" and "You must specify a region..." Thinking the TTY thing was the issue, I then tried: code:
quote:fatal: [my-server]: FAILED! => {"changed": true, "cmd": ["aws", "ecr", "get-login-password"], "delta": "0:00:04.404906", "end": "2022-09-11 21:56:37.173636", "msg": "non-zero return code", "rc": 253, "start": "2022-09-11 21:56:32.768730", "stderr": "\nYou must specify a region. You can also configure your region by running "aws configure".", "stderr_lines": ["", "You must specify a region. You can also configure your region by running "aws configure"."], "stdout": "", "stdout_lines": []} So this seems like the AWS credentials are not configured. They are though! Both ~/.aws/config and ~/.aws/credentials exist on the remote machine for the ansible_user and the get-login-password command works fine if I ssh in and run it manually. What's the deal??
|
|
# ? Sep 11, 2022 22:58 |
Of course as soon as I post I came up with a workaround. This seems to work fine:code:
|
|
# ? Sep 11, 2022 23:07 |
|
IIRC ansible's shell defaults to /bin/sh and from the error messages I would suspect that whatever {{ ansible_user }} is on the remote node has all of the AWS stuff configured under bash. You can try switching the shell to /bin/bash, or try sourcing that user's bashrc before running your commands. FWIW if I just set connection: local and run a playbook against my desktop, that stuff all works out of the box with no extra config.
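A sketch of what switching the shell might look like — the task name and the sourced file are illustrative, though `executable` is a standard arg of ansible's `shell` module:

```yaml
- name: Run the ECR login under bash instead of /bin/sh  # hypothetical task
  shell: source ~/.bashrc && aws ecr get-login-password
  args:
    executable: /bin/bash
```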
|
# ? Sep 12, 2022 03:28 |
12 rats tied together posted:IIRC ansible's shell defaults to /bin/sh and from the error messages I would suspect that whatever {{ ansible_user }} is on the remote node has all of the AWS stuff configured under bash. You can try switching the shell to /bin/bash, or try sourcing that user's bashrc before running your commands. Sounded promising! Unfortunately this didn't seem to work either: code:
code:
|
|
# ? Sep 12, 2022 08:17 |
|
Sadly this may be your best option:code:
|
# ? Sep 12, 2022 15:23 |
|
fletcher posted:Sounded promising! Unfortunately this didn't seem to work either: One thing maybe that can help you visualize the issue, in your shell command, instead of the actual command you want to run, do something like "env > some_file" or visualize the output of "env" so you can see what all variables you are working with. It's helped me a few times.
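A throwaway diagnostic task along these lines (name and path are placeholders) shows exactly what environment the command actually runs with:

```yaml
- name: Dump the task environment for debugging  # hypothetical one-off task
  shell: env > /tmp/ansible_env_dump
```

Then diff that file against `env` from a normal SSH session to spot which AWS variables are missing.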
|
# ? Sep 12, 2022 16:36 |
tortilla_chip posted:Sadly this may be your best option: Super-NintendoUser posted:One thing maybe that can help you visualize the issue, in your shell command, instead of the actual command you want to run, do something like "env > some_file" or visualize the output of "env" so you can see what all variables you are working with. It's helped me a few times. Ahhh thank you both! I think I've figured it out now. Ansible is running as root and using sudo to run as ansible_user (non-root user). So the issue was that while ansible_user had ~/.aws/config & ~/.aws/credentials configured, the root user did not (/root/.aws/config & /root/.aws/credentials). Configuring the AWS creds for the root user fixed it, and my original attempt now works fine: code:
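For reference, a task along these lines is a common way to wire this up — the region, account ID, and task name below are placeholders, not fletcher's actual playbook:

```yaml
- name: Log docker in to ECR  # hypothetical; region/account are illustrative
  shell: >
    aws ecr get-login-password --region us-west-2
    | docker login --username AWS --password-stdin
    123456789012.dkr.ecr.us-west-2.amazonaws.com
  changed_when: false  # the login is a side effect, not a config change
```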
|
|
# ? Sep 12, 2022 21:44 |
|
fletcher posted:Ahhh thank you both! I think I've figured it out now. Glad to help! But .... fletcher posted:Ansible is running as root and using sudo to run as ansible_user (non-root user) This doesn't look right. It's been a while since I had to dig into this specifically, but ansible should be connecting via SSH as a non-privileged user and then elevating/become whatever you need. The way this looks means that you have SSH as root enabled on the destination host, which is a bad idea in general. I'm sure there's many other reasons you don't ssh in with ansible as root, at the moment I can't remember more, but maybe you need to check this.
|
# ? Sep 12, 2022 22:26 |
Super-NintendoUser posted:Glad to help! Ah, hmm. The env var dump in my playbook led me to believe this: code:
|
|
# ? Sep 12, 2022 22:40 |
|
So a few things here: You are using the `shell` plugin directly, which really isn't all that great for idempotency. Have you checked out the plugins that are available that have been built out for the functionality you're using? Also, avoid running tasks as root unless absolutely necessary. At face value, using `become` at the playbook level doesn't seem all that bad, but if it's needed for a task, add it for the task, not the playbook, and expect that the task is going to fail in the event that the resource does not provide it for you. This really isn't an issue when you have a small team of aware individuals writing the configs, but it will become an absolute operational nightmare once you set a pattern like this and start handing off to development teams.
|
# ? Sep 13, 2022 01:01 |
drunk mutt posted:So a few things here: I'm not too worried about the idempotency for this one, since it's just generating a temporary token to download container images from ECR that's only valid for 12 hours anyways. The only examples I could find of people doing this in ansible all seemed to use the shell module directly. I did come across this ticket which talks about using amazon-ecr-credential-helper. That seemed like even more complexity though for what should be just a simple shell one-liner, so I didn't end up pursuing that further. I didn't realize ansible was running as root though, I thought it was running as my lower privilege ansible_user. The thing I'm using this for is just for my own personal hobby project, so while I'd like to be closer to a solution that would be the "right way" if it was used by a larger team, it's not too big of a concern for this particular project.
|
|
# ? Sep 13, 2022 01:36 |
|
fletcher posted:I'm not too worried about the idempotency for this one, since it's just generating a temporary token to download container images from ECR that's only valid for 12 hours anyways. The only examples I could find of people doing this in ansible all seemed to use the shell module directly. I did come across this ticket which talks about using amazon-ecr-credential-helper. That seemed like even more complexity though for what should be just a simple shell one-liner, so I didn't end up pursuing that further. Yeah, in situations like this the pattern you put forth doesn't matter, but food for thought if you carry this pattern on into situations that actually would present the concern. I'm not suggesting you change things here, just have seen stuff like this turn into many emptied bottles of whiskey. When are you placing the become? I would assume this is something placed at the playbook level?
|
# ? Sep 13, 2022 02:17 |
drunk mutt posted:Yeah, situations like this the pattern you put forth doesn't matter, but food for thought if you carry this pattern on into situations that actually would present the concern. Ahhh yes, I had a become at the playbook level. I don't even remember adding that, it was in my initial commit from almost 2 years ago. Must have unknowingly copied it from some tutorial when I was starting out! I'll move that to only the modules that need it, since I definitely don't need to be using it everywhere. Thank you! I think that fully explains my issue I encountered now.
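As a sketch of the shape that change takes (hosts, package, region, and names here are all made up), `become` moves from the play down to only the tasks that need root:

```yaml
- hosts: my-server
  # no play-level `become: true` anymore
  tasks:
    - name: Install docker (needs root)
      apt:
        name: docker.io
        state: present
      become: true

    - name: ECR login (runs as the unprivileged connecting user)
      shell: aws ecr get-login-password --region us-west-2 | docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-west-2.amazonaws.com
      changed_when: false
```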
|
|
# ? Sep 13, 2022 07:01 |
|
prom candy posted:I have this dream that I can take my k8s setup that I inherited and move everything in it to one of the PaaS providers. I am the sole developer in a company that has a monolith backend (plus a handful of lambdas) and also 4 major front end web apps and a native app. The bulk of our k8s setup is just the same Rails image but with different entry commands. A bunch of web servers, a huge fleet of resque (background job) workers, and then a couple of bespoke rake tasks. We do have a nextjs app running in there as well but I'd like to move that to Netlify or Vercel anyway. Let me be about as clear as possible - over-engineering is Bad Engineering when you're a small team because you're almost certainly not delivering stuff or you're overcomplicating maintenance for yourself down the road. But when you're greatly limited on time your best bet is to time box efforts to migrate stuff over to make all the infrastructure stuff Someone Else's Problem with the understanding that you will still need to do some maintenance no matter what. For example, when it comes to maintaining dependencies in your container images you'll probably want to have something like Renovate bot or dependabot to handle more groundwork. On the other hand, I've seen entire jobs where some poor sap had to spend nearly a year migrating off of a PaaS because their costs were so high (Heroku bill of $25k+ / month) they could justify hiring the engineer solely to migrate away, and that's what I think is best to lean toward for most SaaS companies - use boring AF technology until it hurts and you know exactly what you need to quickly rule out tech stacks. Worked out well for the company I'm at and although they made a lot of mistakes over time they did manage to pick a stack that is quite serviceable for years to come.
|
# ? Sep 13, 2022 08:30 |
|
What's cool about the advances made in container orchestration and serverless tech over the past few years is that they contribute absolutely nothing to being able to just package and run an average web app in an operationalized way. These platforms have been outlandishly expensive and universally sucked for non-toy use cases since their inception. They don't seem to be making too many strides in a better direction, and the industry seems to be moving in the direction of making this kind of PaaS model irrelevant. Static hosting (and maybe some edge workers) is great if you can externalize everything except your frontend tier. Generally, I think if an org already has the expertise to manage the infrastructure behind these kinds of apps, and they don't have trouble hiring that expertise when needed, these kinds of platforms are a waste of time. Updating the server image underneath your app once in awhile, in the year 2022, really isn't the huge operational expense people pretend it is. You still have to manage your app's internal dependencies. You still need to contend with it interacting with external dependencies. Vulture Culture fucked around with this message at 15:23 on Sep 17, 2022 |
# ? Sep 17, 2022 15:15 |
|
necrobobsledder posted:I've seen entire jobs where some poor sap had to spend nearly a year migrating off of a PaaS because their costs were so high (Heroku bill of $25k+ / month) they could justify hiring the engineer solely to migrate away So at the end of that year, they've run up a $300k expense, and still haven't provided the company any cost savings, just potential future cost savings? They did that for a high-risk migration project that might fail entirely? And then at the end, the infrastructure isn't free, so even if you take a 50% cut on your hosting expenses, if those are fixed, you don't recoup that $300k investment for another two years, making their first year of salary a three-year sunk cost. This is stuff you can only rationalize if your infra spend is on some kind of exponential growth curve and you're looking at that $25k a month being $250k a month next year. I wonder what percentage Heroku even represented of the overall operational cost of that app. Layer in log aggregation, APM, telemetry, analytics, etc. and I can't see the company walking away with more than a 25% drop in long-term operating cost off of that whole project.
|
# ? Sep 17, 2022 15:37 |
|
Vulture Culture posted:Oh dear. This sounds like a very bad reason to hire an engineer. A senior engineer is what, $160k base? Then you have benefits making up another half to two-thirds of their salary, let's say $100k. Then they have a manager, and this engineer is taking up a fraction of their capacity, let's say 15% of the time and attention of someone who's also costing the company $260k a year in total compensation, so we're already at about $300k sunk in the first year. We'll ignore IT and technology spend (it's probably peanuts compared to compensation).
|
# ? Sep 17, 2022 16:05 |
|
Heroku in particular has prices that scale up horribly don't they? I never got to that level with them but I've heard the higher tiers are brutal.
|
# ? Sep 19, 2022 06:03 |
|
anyone here going to argocon?
|
# ? Sep 19, 2022 06:14 |
|
These are all rough numbers - I’m getting a bit far afield of my non-compsci background so I figured I’d ask for some advice. I need to distribute a ~55mb bloom filter encoded as a hex string to between several hundred and several thousand client cloud SIEM environments (think Azure Sentinel, Splunk Cloud, MDE, etc) over the public internet (unless azure lets me send data to a different tenant without going over the internet or peering VPCs, which I feel like should be a thing). Call it 100 million database rows, using mariaDB’s rocksDB database engine (which I just learned exists today) to produce the filter. This updates whenever the table updates, but I don’t expect rows to change very often, so each iteration of the filter would I *think* be fairly similar to the previous. I need to distribute this to all the client environments at least once a minute. The client then runs the same hash function against an object in their environment to see if it’s (possibly) in our database or not.

So now the question is how do I efficiently distribute a fairly large blob file that changes semi-frequently. I figure we need to host the blob at edge and put it behind a CDN, but I’m not sure how much caching gets me if the source changes regularly. I don’t want the client to have to download a 50mb file every time either, so we need a client side cache. Append only doesn’t work because sometimes rows will change, and so will the computed hash, and thus so will the bloom filter. My immediate thought was to divide my filter output into segments and check if each segment is identical in the client cache, which lets me balance compute vs network costs. And then I realized that at this point I lack the formal background to know if someone’s already solved this problem, and as it seems such a fundamental problem set, I’m sure someone already has.

E: using a more efficient probabilistic filter like a cuckoo or XOR filter seems to be the way to go here.
More space efficient and we can distribute buckets rather than the whole data object. Bigger downside is outside of redis, there’s not a lot of data stores that implement the more efficient algorithms for you. E2: I did some quick math on exactly how much volume the naive approach here would take and suddenly updating client environments once a day is okay lmao The Iron Rose fucked around with this message at 03:58 on Sep 20, 2022 |
# ? Sep 20, 2022 02:57 |
|
Comedy option: Route 53 DNS TXT entry with a TTL of 60 seconds. (I don't have anything helpful. My brain just conjured up that mess and I thought I shouldn't be the only one to suffer)
|
# ? Sep 20, 2022 03:13 |
|
The Iron Rose posted:These are all rough numbers - I’m getting a bit far afield of my non-compsci background so I figured I’d ask for some advice. Distribute by bittorrent. Make all of the clients send the blob to other clients. Call it originator-bandwidth-optimized distributed dynamic edge computing Methanar fucked around with this message at 04:09 on Sep 20, 2022 |
# ? Sep 20, 2022 04:05 |
|
Distributing a 50MB blob that might change anywhere inside it is tricky. Distributing 50MB of lots of little pieces of data that don't change very often is solved in certain situations. Like, if you could decompose the bloom filter blob into a data structure that fit into many reasonably-sized DB rows, then "all" you'd need to do is set up DB replicas. DBs are pretty good at pushing out changes to read-only replicas, and also recovering from the inevitable problems you'll get when the network falls over. Edit: Torrents would be very efficient but I think they assume the data is static; every time you modified the blob you'd need a new torrent. If you could break the bloom filter up into smaller pieces, it could work.
|
# ? Sep 20, 2022 05:22 |
|
Methanar posted:Distribute by bittorrent. Make all of the clients send the blob to other clients. minato posted:Distributing a 50MB blob that might change anywhere inside it is tricky. Distributing 50MB of lots of little pieces of data that don't change very often is solved in certain situations. Like, if you could decompose the bloom filter blob into a data structure that fit into many reasonably-sized DB rows, then "all" you'd need to do is set up DB replicas. DBs are pretty good at pushing out changes to read-only replicas, and also recovering from the inevitable problems you'll get when the network falls over. I think there’s really no way around splitting the bloom filter into smaller segments if you want to do this sanely. Much more efficient to decompose the filter into 50 different 1mb files and do comparisons on the segments. Would drastically reduce data volume especially if you don’t expect the majority of those 1mb substrings to change frequently. Then on the client side cache the segment, set a lengthy ttl, hash the substring and compare the local segment hash against the published hash of the remote segment to see if we need to invalidate. Do all of that async from queries that engage with the filter, and either concat on query time or pre-compute and have some other means of refreshing the data. Ultimately I think this would significantly reduce the amount of data client environments need to download, allow us to update the bloom filter frequently, and also allows us to make much better use of CDN caching at edge to minimize our egress costs. The Iron Rose fucked around with this message at 05:47 on Sep 20, 2022 |
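A minimal sketch of that segment-hash comparison in Python — the segment size, hash choice, and function names are all assumptions for illustration, not anything from a real SIEM client:

```python
import hashlib

SEGMENT_SIZE = 1 * 1024 * 1024  # hypothetical 1 MiB segments


def segment_hashes(filter_bytes: bytes, segment_size: int = SEGMENT_SIZE) -> list[str]:
    """Split the serialized filter into fixed-size segments and hash each one."""
    return [
        hashlib.sha256(filter_bytes[i:i + segment_size]).hexdigest()
        for i in range(0, len(filter_bytes), segment_size)
    ]


def changed_segments(cached: list[str], published: list[str]) -> list[int]:
    """Indices of segments the client needs to re-download."""
    return [
        i for i, h in enumerate(published)
        if i >= len(cached) or cached[i] != h
    ]


# Flip one byte in a 4-segment blob: only that one segment should differ.
blob = bytearray(4 * SEGMENT_SIZE)
cached = segment_hashes(bytes(blob))
blob[2 * SEGMENT_SIZE + 123] ^= 0xFF
published = segment_hashes(bytes(blob))
print(changed_segments(cached, published))  # [2]
```

The published hash list is tiny (a few KB for a 55mb filter), so clients can poll it every minute and only fetch the segments that actually changed.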
# ? Sep 20, 2022 05:44 |
|
How often do these clients need to check objects? Like is this a thousands of times a second thing or an every few minutes thing?
|
# ? Sep 20, 2022 06:13 |
|
just put it behind a cdn and use the purge function when it changes. it will cost you dollars in bandwidth. whole dollars!
|
# ? Sep 20, 2022 16:10 |
|
One other thing that's important is to understand how much control over the clients is happening. If you have a client that goes dark for, say, 1 year and comes back will that "torrent" still be available? What does phoning home mean? How is that secured in the future and how would you invalidate old key material and certs and so forth? Global CDNs are good for getting stuff out with the lowest latency and aggregate cost but for data that changes very often you're in the cache invalidation part of hard problems along with a variation of byzantine generals potentially if you ever have to rotate keys and URLs securely through distributed means. Note that with "SIEM" it's a security related issue that will require some auditing and non-repudiation sort of transactional logic. We distribute our security model artifacts that are all smaller than 50 MB via our own weirdo, whacko audited-to-death CDN-lite system and have our own custom delta update algorithm to perform updates quickly and with as small diffs as possible (the client requests a version range and we get it topped up to a major version then a delta after that pre-calculated and cached on our backends for the combinations of requests possible). Our encoding is nothing all that special though (we rely more upon strong certificate chains along with defense-in-depth approaches to distrust the host client system as an assumption) and is designed around taking the plaintext artifact and making it easy to decompress, update, and invalidate quickly. I highly suggest splitting up the full artifact somehow to avoid needing to re-download the whole artifact every other time someone closes their laptop. I'm thinking given it's essentially pulled from a big table you can use what amounts to the underlying DB's replication format (binary or logical isn't necessarily important). 
It's just copying and compressing the diffs over the wire, serializing to a file format, and completing the checkpoints as a transaction to be downloaded by clients. The Iron Rose posted:I think there’s really no way around splitting the bloom filter into smaller segments if you want to do this sanely. Much more efficient to decompose the filter into 50 different 1mb files and do comparisons on the segments. Would drastically reduce data volume especially if you don’t expect the majority of those 1mb substrings to change frequently.
|
# ? Sep 21, 2022 00:13 |
|
Wizard of the Deep posted:Comedy option: Route 53 DNS TXT entry with a TTL of 60 seconds. add it to a FROM scratch docker image Junkiebev fucked around with this message at 21:31 on Sep 21, 2022 |
# ? Sep 21, 2022 21:20 |
|
MightyBigMinus posted:just put it behind a cdn and and use the purge function when it changes this is the real answer fyi
|
# ? Sep 21, 2022 22:48 |
|
I've just gotten around to using cloud images for my proxmox homelab machine alongside Terraform to get stuff spun up and it's working quite well. The only thing that's bothering me at the moment is getting QEMU agents installed and working with the host. In theory this is pretty straightforward and the Pmox provider for Terraform even has a section in the code to enable support from the host's side of things. In practice things get a bit problematic. The provider presumes that the agent software is already present on the VM being provisioned and will loop and time out waiting for a ping response that isn't coming. I was installing the agent and other bits via an ansible playbook as part of the provisioning, but that never happens due to the loop. The answer appears to be to make the agent part of the base image and clone from that, but I'm not exactly sure how to skin that cat. Is there some way to add layers to a VM cloud image base without instantiating it and having it get assigned device ids and so on and so forth? Isn't that what Packer is all about? edit: After some more digging around I came across this blog post that mentions the usage of 'virt-customize' to layer packages in and I'm going to give that a shot. I still want/need to go mess with Packer at some point but one thing at a time. Warbird fucked around with this message at 19:02 on Sep 26, 2022 |
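For reference, the invocation is roughly this — the image filename is a guess, though `-a`, `--install`, and `--run-command` are real virt-customize options:

```shell
# Bake the QEMU guest agent into a downloaded cloud image before templating it
virt-customize -a jammy-server-cloudimg-amd64.img \
  --install qemu-guest-agent \
  --run-command 'systemctl enable qemu-guest-agent'
```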
# ? Sep 26, 2022 18:22 |
|
Warbird posted:
packer spins up an ISO in an infrastructure provider, does stuff to it, including generally "generalizing" (sysprep, etc) it, and shits out a "templatized" image into the media library of the provider you chose. The thing you are trying to do is a perfect use-case assuming you are going to do it repeatedly.
|
# ? Sep 26, 2022 19:49 |
|
That's what I figured. I doubt I'll be doing it with any real frequency, but it'll be a decent skillset to have on hand. For the record virt-customize did exactly what I wanted with no fuss or muss so it seems like an ideal solution for small scale and one off applications.
|
# ? Sep 26, 2022 21:08 |
|
Does anyone have experience with using AMD based instances in aws? I'm looking over the new c6 stuff and I'm thinking about the c6a class. AWS is advertising a 15% uplift over c5 for price:performance for both intel and amd respectively, but the amd stuff is about 10% cheaper even than that. I'm looking over some public benchmarks comparing price:performance of intel vs amd now, but I'd be interested if anyone has their own anecdotes/success stories of looking at AMD's stuff.
|
# ? Sep 26, 2022 21:13 |
|
Is "Production Engineering" at Meta/Facebook basically devops/sre/platform engineer? What does the rest of FAANG call this position? Levels.fyi is showing abysmal pay for "production software engineer" like $160k TC for an SDE II production engineer at amazon fake edit: oh looks like meta has an L3 at $194k TC? But only $130k base
|
# ? Sep 26, 2022 22:22 |
|
Hadlock posted:Is "Production Engineering" at Meta/Facebook basically devops/sre/platform engineer PE @ Meta == SRE everywhere else. Facebook decided to call it something different because they refuse to follow Google's lead or even use any Google products internally, despite copy/pasting most of their culture from Google. I vaguely recall their rationale being along the lines of "Reliability implies that the engineer just cares about uptime, but really the scope is about engineering products to work at a production scale". Yeah whatever, it's just SRE.
|
# ? Sep 26, 2022 22:37 |
|
Methanar posted:Does anyone have experience with using AMD based instances in aws? We switched all of our CI agents to this (GitLab runner using the Kubernetes executor) and build times didn't change up or down but we didn't really expect them to because they aren't compute bound. The real benefit for us was vastly increased spot availability and lower spot prices compared to the equiv intel instance types.
|
# ? Sep 26, 2022 23:36 |
|
|
whats for dinner posted:We switched all of our CI agents to this (GitLab runner using the Kubernetes executor) and build times didn't change up or down but we didn't really expect them to because they aren't compute bound. The real benefit for us was vastly increased spot availability and lower spot prices compared to the equiv intel instance types. Same here basically. Switched all of our EKS nodes from intel instances to AMD and saved a good chunk of change, basically the same performance (some workloads did a bit better on the AMD nodes).
|
# ? Sep 27, 2022 04:18 |