minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
If you don't want to manage a Selenium server, you can use a 3rd party service like SauceLabs. Otherwise, WebDriver is where it's at.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
We had branch-per-configtype for a while (one for prod, dev, staging, test, etc), and it went really badly: it turns out that many changes have to be made to multiple branches simultaneously (e.g. updating which SSL ciphers an Apache instance uses), and the branches drift apart from each other.

A more robust solution is to have code that generates the configs, taking the environment type as a parameter, e.g. "generate_config.sh --env=prod". This keeps everything together in a single branch, and as a bonus it's easier to write tests that check each env's config is correct.
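A minimal sketch of what such a generator might look like (the template layout, variable names, and file paths here are all made up):
code:
#!/bin/sh
# generate_config.sh -- render per-env Apache config from a shared template (illustrative sketch)
set -eu

ENV=dev
for arg in "$@"; do
  case "$arg" in
    --env=*) ENV="${arg#--env=}" ;;
  esac
done

# one small vars file per environment, everything else shared
. "./envs/${ENV}.vars"    # defines SERVER_NAME, SSL_CIPHERS, ...

mkdir -p "out/${ENV}"
sed -e "s|@SERVER_NAME@|${SERVER_NAME}|g" \
    -e "s|@SSL_CIPHERS@|${SSL_CIPHERS}|g" \
    templates/apache.conf.tmpl > "out/${ENV}/apache.conf"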

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

syphon posted:

Feature flags are great, but as the team/product gets larger they can turn into their own nightmare.
All true, but this problem is largely QA's. It's up to them to figure out a sane testing plan, and maybe convince Dev to decouple modules so the permutations aren't n-factorial.

Feature flags can really hit Devs when you have a framework that can enable feature flags for specific users, like those who might want to use the beta version of a new feature, or when you're doing A/B testing on a sub-set of your customers, because when a bug report comes in, you need to double-check which features they have turned on in order to replicate the behavior.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
I'm a huge fan of Docker and we have a lot of stuff running in Docker in production, but it's mostly hand-tooled at the moment. The PaaS offerings for Docker are plentiful, but as you say, immature. Kubernetes and Mesos are your best bets for now, but they'll also require significant time investments. Mesos/Marathon is more of a SOA development framework than a PaaS.

If you're a relatively small shop without much experience then I'd recommend Dockerizing-all-the-things (just because it frees Ops from package-management hell) and deploying with hand-tooled Puppet scripts, with a view to replacing those scripts over time with whatever Docker PaaS ends up winning the race.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

Vulture Culture posted:

Anything but OpenStack. :ptsd:
Definitely. OpenStack is (mostly) fine to use, but only a masochist would want to manage it.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Gene Kim is a big DevOps proponent, and he wrote this book. Watch a couple of his talks (he's got loads online). He nails the philosophy, at least.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
If it all fits on one machine, just use shell scripts, i.e. a hard-coded startup order, hard-coded ports, etc. You don't need to overcomplicate things with service discovery or multi-host yet.

Data-only containers can be a bit of a pain, and always made me a bit nervous because if you nuke the last reference to the data, it's reclaimed by docker and removed. I'd avoid them entirely and just make a regular directory and bind-mount it inside the containers that need to access it.
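e.g. something like this (the image name and paths are just placeholders):
code:
# keep the data in a plain host directory...
mkdir -p /srv/pgdata

# ...and bind-mount it into whichever container needs it
docker run -d --name db \
  -v /srv/pgdata:/var/lib/postgresql/data \
  postgres:15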

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

Pollyanna posted:

found out that our project manager's laptop couldn't handle 64-bit VMs. :downs:
Did you check the BIOS settings? Usually it's just a matter of enabling "Virtualization Technology" if it's an Intel chip, and then you can run 64-bit VMs. The laptop would have to be really old not to have that option.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
I think it's totally possible for there to be a DevOps role, because someone has to ensure the teams are adhering to the DevOps principles. I don't think it's enough for the managers to walk in and pronounce that "We're doing DevOps now. Dev, talk to Ops more often, and vice-versa." Someone has to keep the ball from getting dropped. Are Devs just as much on the on-call hook as Ops when prod falls over? Is Ops getting invited to Dev's design/architecture meetings? Is there shared ownership of the build-test-deploy pipeline, and who is responsible for maintaining and developing that?

In the same way that there's a ScrumMaster who (amongst other things) is responsible for keeping the team aligned to Agile principles, there can be a "DevOpsMaster" who has a good understanding of the principles and the mandate to enforce them. It's not a given that this person would necessarily be a manager; in some companies, managers are more concerned with developing their reports' abilities than with deciding how they do their job.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
It's debatable. There's a tension between (say) security, who want the container to have the least stuff inside it to reduce the attack surface area, and devs (or SREs), who want some basic tooling in the container so they can debug stuff. We solved the issue by having devs bind-mount their debug tools into the dev containers, but the SREs were SOL because (at least at the time) there was no easy way to bind-mount debug tools into a running container.

By "environment" I assume you're talking about dev->test->stage->prod. It does make sense to have multiple build flavors for dev (add debug tools) and test (builds with full debug info, or built with address/thread sanitizer guard code). But there should also be a vanilla flavor that doesn't change from test->stage->prod. Otherwise, the reasoning goes, you're not testing exactly what you're deploying.

I've seen situations where the build process was re-done just for stg->prod, because (for security reasons) those build environments were more tightly locked down and free from external dependencies (like network access to GitHub). This also meant they were slower, so it was inefficient to put them at the front of the pipeline. Since the only significant differences were factors unlikely to affect the build itself, re-building at that stage was deemed low-risk.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
We had a proxy that sent live traffic to the prod deployments and a copy to the candidate, and the candidate's responses were blackholed. Then we just watched error rates on the candidate to test its worthiness.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

Mao Zedong Thot posted:

Plus you have to configure your kubernetes nodes with something.
The "immutable hosts" train of thought suggests that you configure it once on first boot with a tool like Ignition, and then you never touch it after that. If it's something like ContainerLinux then it'll auto-update itself with kernel upgrades. Any significant config change means nuking the cattle node and spinning up a new one. Which is totally fine if you've got a system like Kubernetes behind it to manage the rescheduling of workloads across nodes; not so much if you don't.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
We've used Kubernetes for years and haven't felt the need to automate anything with Ansible (or Chef, Puppet, etc). We use Jenkins to monitor our gitops repos containing the Kubernetes manifest files, which in turn triggers Helm/Tiller re-deployments. It works very well for 95% of the apps we run. We use AWS RDS for databases and EBS for persistent storage (which Kubernetes supports).
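The Jenkins job itself doesn't do much more than something like this (chart path, values file, and release name are illustrative; our actual Helm 2/Tiller invocation differed slightly):
code:
# run whenever Jenkins sees a new commit in the gitops repo
git -C /srv/gitops pull --ff-only
helm upgrade --install myapp /srv/gitops/charts/myapp \
  -f /srv/gitops/environments/prod/values.yaml \
  --namespace myapp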

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

12 rats tied together posted:

For your use case ansible is a drop-in replacement for Jenkins -- whether or not that's good for you pretty much just depends on how much you guys hate jenkins.
Jenkins for us is just an ersatz way of monitoring changes to a git repo, and it can also act as "cron with a UI". It's not used as a build tool at all. For builds, we use Prow, which is effectively "what if Jenkins was built with kubernetes-native microservices".

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
IIUYC, the storage is permanent but the DB process itself is ephemeral (i.e. not a long-running daemon that clients connect to)?

If that's the case, then SQLite works very well. The process running it can be terminated with almost a guarantee of no data loss or corruption.

Postgres in a container would probably work well too; like SQLite it uses WAL files so it's fairly resilient to being killed at random, but I suspect (without evidence) that SQLite might be better.
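If you go the SQLite route, it's worth explicitly flipping the database into WAL mode once (the setting is persistent); the path here is a placeholder:
code:
sqlite3 /srv/data/app.db 'PRAGMA journal_mode=WAL;'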

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
OpenShift is to Kubernetes as Fedora is to the Linux kernel. It takes a tightly-focused software project and makes it usable by the masses. You can run vanilla Kubernetes by itself, but you'll need to add a bunch of stuff to turn it into a PaaS.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
There's a couple of options.

The easiest way is if you've got a docker registry running somewhere to store your images. Then you can do something like:
code:
docker build -t my-registry.com/blah/mycontainer:latest .
docker push my-registry.com/blah/mycontainer:latest
then on the Linux box pull it from the registry:
code:
docker pull my-registry.com/blah/mycontainer:latest
If you don't have a registry, then build the image, save it to a tar file, copy the tar file over, and load it:
code:
docker build -t my-registry.com/blah/mycontainer:latest .
docker save -o somefile.tar my-registry.com/blah/mycontainer:latest
scp somefile.tar your.linux.host:
then on the host:
code:
docker load -i somefile.tar

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
I think it comes down to splitting the update into 2 phases:
1) Ensure the desired state description and necessary assets are on the machine.
2) Initiate a server-local process to update the desired state from those local assets.

1 can be retried until successful with no side effects, which mitigates the issue of poor or intermittent connectivity. And 2 is likely to succeed because all the assets are in place and no connectivity is required.
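Very roughly (hostnames and paths are made up, and /usr/local/bin/apply-local-state stands in for whatever local applier you use):
code:
# phase 1: idempotent and safe to retry until it succeeds
rsync -az --partial ./desired-state/ deploy@host1:/var/lib/deploy/staged/

# phase 2: runs purely locally on the target; no network needed once the assets are staged
ssh deploy@host1 'sudo /usr/local/bin/apply-local-state /var/lib/deploy/staged'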

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
CoreOS Container Linux is probably not what you want. It's primarily a minimal OS to run docker containers and little else. The double-buffered root is for regular updates of the kernel + the minimal toolset, not the apps installed. And the cloud-config stuff was replaced by Ignition, which is really just a small tool for injecting the necessary custom systemd units and mounting volumes and so on.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

Newf posted:

A suggestion I've come across is to use nginx as a 'reverse proxy' for all of these services.
I recommend Caddy Server instead of Nginx, because Caddy out-of-the-box automatically provisions and renews your HTTPS certificates via LetsEncrypt. (Nginx has a plugin to do it, but it's extra work to install. On the flipside, Caddy doesn't have any binary distributables so it must be built from Go source, but that's literally 2 commands).
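For a single backend the whole Caddy config is about this much (Caddy v1 syntax from around that era; the domain and app port are placeholders, and v2 uses reverse_proxy instead of proxy):
code:
# write a Caddyfile that terminates HTTPS and proxies to your app
cat > Caddyfile <<'EOF'
example.com {
    proxy / localhost:3000 {
        transparent
    }
}
EOF
caddy    # v1 reads ./Caddyfile from the current directory by default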

Newf posted:

Vue-cli's deployment guide suggests nginx behind (inside of?) a docker container. Why would they suggest that instead of just giving instructions for nginx?
Because putting it into a container means that what you build on your laptop == what you deploy on the server, and it's also easier to install and run.

Newf posted:

Is it possible for me to included all three of my pieces inside a docker image for a one-line deployment?
Yes, but that's not best practice. Generally each docker container should only contain one component. That way you can upgrade/restart/scale up each component individually while not affecting other components.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Yeah, I'd try to get this working without Docker first. Docker solves a problem you don't have yet, and if you're not familiar with it then it's just gonna introduce more headaches.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
You'll need to run Caddy as root so it can bind to 80/443. It will throw an error if it can't bind to the ports or auto-provision an SSL certificate. You may also need to open 80/443 on the host's firewall.
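e.g. if the host happens to use ufw (adjust for iptables/firewalld or your cloud's security groups):
code:
sudo ufw allow 80/tcp
sudo ufw allow 443/tcp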

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

Methanar posted:

Don't do this. Give caddy the capability to bind to 443 instead


sudo setcap cap_net_bind_service+ep $(which caddy)
Yeah this is the Correct way to do things.

Newf posted:

fake edit3: please don't do stuff on the site - it isn't actually meant to be live right now, but I need to get things in order for a closed beta of sorts
Even if goons are nice and respect this, other people won't. Malicious folks are crawling the web all the time looking for poorly-secured installations to break into and leverage as bots or bitcoin miners or any manner of nefarious things. If you want things locked down while you're sorting things out, your best bet is to (temporarily) IP-filter traffic to your own IPs. You can do this at the firewall level, or if you compiled Caddy with the ipfilter plugin then just add a block to your Caddyfile:

code:
example.com {
  ipfilter / {
    rule allow
    ip 1.2.3.4/32
    ip 5.6.7.8/32
  }
}
For more advanced protection, you might want to put the site behind CloudFlare (which has a free tier). CloudFlare has a web-application-firewall feature to block a lot of malicious traffic. The idea is that your site's DNS points at the CloudFlare service, which is then configured to proxy safe traffic to your web host. You can then IP-restrict your webhost traffic so that it only accepts requests originating from CloudFlare. And I think you can go one better; CloudFlare has a feature (dunno if it's free) where they effectively set up a VPN between them and your webhost, so you never have to directly expose your webhost to the internet at all.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
To be fair, some of those filled significant gaps in k8s at the time, and they're being phased out as upstream k8s backfilled them.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
At one place I worked, they did "every 8 weeks you're on-call, even if you're a dev".

What they thought would happen:
- Quality would go up as devs got an appreciation for how their poor coding practices affected production. More tests, etc.
- Devs would have a better understanding of the difference between dev & production scale.
- Ops wouldn't feel quite so burdened by frequent on-call cycles, as devs would share the load.

What actually happened:
- Devs learned that their bugs could take down production, but they rarely felt the impact from their own bugs, so they didn't feel much need to improve testing.
- Devs learned next to nothing about production. They didn't want to, seeing it as distracting stuff they didn't need in order to do their primary job, and not what they signed up for. When it came their turn, they just continued work on their features and hoped they'd dodge a bullet that week. And they often did, because production problems were relatively rare.
- When something did go wrong, Ops got dragged in anyway because the devs were never familiar enough with the processes or systems to fix it themselves. Again: why bother deep-learning something that's fairly reliable and that you only have to babysit every 8 weeks?
- When the managers noticed all this reluctance to learn, they tried to fix it with "When you're on-call, your job that week is to improve the pipeline". But the pipeline was so complex that it took about 3-4 days to understand it, by which point, you've only got a couple of days left to do anything useful before you're off the hook again, so why bother?

These are all solvable of course. But yeah, naively implementing "devops" is not gonna magically fix all your problems.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Is there any specific reason not to use Redis? I'm looking into using Celery, which seems to be the most popular Python dist-task queue package, and all the official docs and tutorials seem to recommend Redis.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
It turns out Celery supports Postgres (and Sqlite, Mysql, etc) as a backend, and my app is nowhere near ~~webscale~~ and already uses Postgres, so I'm just gonna use that. One less moving part to worry about.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Once a monorepo gets to a significant size, you need lots of extra support and tooling to use it effectively. You've just touched the tip of the iceberg. Google & Facebook have written some papers on their monorepos; they're worth reading for all the interesting effects of scale.

E.g. when a changeset lands, what is the dependency tree of build artifacts that need to be rebuilt? Solution: Google built Bazel for this reason, Facebook built Buck. (From experience, Bazel is a world of hurt - the intention and design is great, but the implementation and docs are just horrible)

E.g. What happens when your commit rate is really high?
- Effect 1: you can't land fast enough because the code is moving so fast. Solution: write a system that queues PRs and tries to automatically land them, using automatic rebasing/merging where possible. This will involve integration with whatever code review workflow you're currently using, since devs may need to get involved if a rebase can't be done automatically.
- Effect 2: your build systems can't keep up with the commit rate. Solution: batch (say) 20 PRs together into a single build, and if it passes all tests then you can rubber-stamp them collectively. Otherwise, automatically bisect to find the problematic commit(s).

E.g. What happens when your monorepo gets really big and pulls/clones take forever? Solution: Google implemented a goddamn virtual filesystem that would only pull files on demand. Facebook implemented a sparse-checkout system in Mercurial where the developer would clone via a "profile" that indicated the subset of files to actually clone.


If you're a relatively small shop, then you've got to cap the size of your monorepo, because you won't have the tooling and support to do all of the above.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
To answer your specific question: to determine what builds need to be made based on what files changed, you can either come up with some simple directory convention (e.g. a change under /foo means rebuild Foo, and a change under /shared means rebuild everything), or use something like Bazel/Buck, which will tell you exactly what needs to be rebuilt because it tracks dependencies for every file in the repo. I would try the former before the latter... Bazel/Buck is an expensive investment.
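The directory-convention approach can be as dumb as diffing against the target branch and mapping top-level directories to build targets, e.g. (branch and directory names are made up):
code:
#!/bin/sh
# decide what to rebuild from the paths this change touches (illustrative only)
changed=$(git diff --name-only origin/main...HEAD)

echo "$changed" | grep -q '^shared/' && { echo "rebuild: everything"; exit 0; }
echo "$changed" | grep -q '^foo/'    && echo "rebuild: foo"
echo "$changed" | grep -q '^bar/'    && echo "rebuild: bar"
exit 0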


Another scaling problem I forgot to mention: access control. Either all devs can see everything, or you've got to have automation that rejects PRs from folks trying to touch files they don't have permission to modify, or if there are files that some devs shouldn't even see then you need a bonkers sparse-filesystem solution that prevents them from cloning them at all. Fun!

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Wild guess:
1) Set up a Squid proxy
2) Use the HTTP{S}_PROXY envvars when running dockerd to point it at the Squid proxy (export HTTPS_PROXY=http://localhost:squidport). This means any HTTP operations like docker pull should be routed through the Squid proxy.
3) Configure Squid to block any attempt to retrieve from registry-1.docker.io, and/or any domain that looks like an IPv4/v6 address. Rough sketch of 2) and 3) below.
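Something along these lines (the Squid port, file paths, and ACL name are illustrative; the systemd drop-in is the usual way to feed envvars to dockerd):
code:
# 2) point dockerd at the proxy via a systemd drop-in
sudo mkdir -p /etc/systemd/system/docker.service.d
cat <<'EOF' | sudo tee /etc/systemd/system/docker.service.d/http-proxy.conf
[Service]
Environment="HTTP_PROXY=http://localhost:3128"
Environment="HTTPS_PROXY=http://localhost:3128"
EOF
sudo systemctl daemon-reload && sudo systemctl restart docker

# 3) in squid.conf, deny Docker Hub
#    acl dockerhub dstdomain registry-1.docker.io
#    http_access deny dockerhub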

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
If your k8s cluster is used as a build farm, then an admission controller won't prevent someone from doing a docker build with a FROM pointing to an undesired registry.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
And presumably, you can't just do exactly what you want in AWS? Use Terraform and/or CloudFormation to spin up and tear down the VMs?

If the GUI testing is a web interface, then you can do headless testing via Selenium/Chromium, so no VM required.
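Even without a full Selenium setup you can smoke-test pages with headless Chromium directly (the binary may be chromium-browser or google-chrome depending on the distro; the URL is a placeholder):
code:
chromium --headless --disable-gpu --dump-dom https://your-app.example.com/ > page.html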

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Yeah, it's really hard to do. It helps if your apps have no state beyond what's in the database and can be quickly pointed at an arbitrary database.

Canarying really needs to be architected in from the beginning. It's only easy to add on after the fact for trivial apps.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender

beuges posted:

Issue 1: code ignitor stores stuff like database credentials in files like application/config/database.php. The docker image I've created has a database.php inside it which connects to the db in the test environment, and it works. But I'd obviously want to be able to deploy the same docker image into the prod environment once we're ready to migrate over. What's the best way to inject settings into a container? Do I set environment variables and then run scripts via Dockerfile that copy the environment variables out into the proper places, or is there a better way to do this?
You can do one of the following (rough sketches of the first two below):
- Pass in the credentials with environment variables, and change your database.php to something like "$dbuser = getenv('DATABASE_USER')", etc. This is easy to do, but the disadvantage is that the credentials are easier for an attacker on the box to find.
- Put the creds into a file outside the container, bind-mount that file into the container (docker run ... -v /path/to/file/on/host:/path/to/file/in/container ...), and again update your database.php to read the credentials from that file (or, even simpler, just bind-mount the database.php file itself). Slightly safer than the envvar method, as long as you protect the credentials file on the host.
- Use a vault or other type of credentials server, and have your app reach out to it over the network at container startup to fetch the credentials. This is the safest option because the credentials only ever exist in the app's memory, but it's also way more hassle to set up.
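Sketches of the first two (image name, paths, and variable names are placeholders):
code:
# option 1: environment variables
docker run -d --name app \
  -e DATABASE_USER=myuser -e DATABASE_PASS=changeme \
  myorg/myapp:latest

# option 2: bind-mount a config file that lives outside the image
docker run -d --name app \
  -v /etc/myapp/database.php:/var/www/html/application/config/database.php:ro \
  myorg/myapp:latest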

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Maybe file permissions? Some k8s clusters are set up to randomize the user/group that the container starts with.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
If it's mostly static content then just throw it in a storage bucket so you're not running a web server at all, and use Cloud Run for the occasional bit of compute.

The only pitfall I can think of: use a robots.txt so your static site isn't crawled, since crawlers will consume bandwidth you don't need. And maybe set up a Budget Alert so you're aware if you start to approach the free-tier threshold.
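The robots.txt part is just a blanket "don't crawl" file dropped into the bucket, e.g. (bucket name is a placeholder; gsutil assumed since this is GCP):
code:
cat > robots.txt <<'EOF'
User-agent: *
Disallow: /
EOF
gsutil cp robots.txt gs://my-static-site/robots.txt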

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
As someone who uses All The Clouds: the big 3 mostly have feature parity and are close in cost. However, if you're StackOverflowing you'll likely find more AWS solutions than Azure/GCP ones, but GCP also has pretty good docs so you might not need StackOverflow/tutorials anyway.

AWS does have a budget alert feature under Services --> Billing --> Budgets --> Create Budget. There's also some CloudWatch stuff you can enable to alert you if you go over some bandwidth threshold.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Teleport? https://goteleport.com/teleport/

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
Latest OpenShift has a bare metal install option too.

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
If you must do least privilege, then you can "easily" calculate the required perms in the same way people do it for SELinux: Run Terraform once with full admin privs, then scrape CloudTrail logs with Athena to get a unique list of all the API calls it made, and throw all of them into a custom role that Terraform can use.
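The scraping step boils down to one Athena query over the CloudTrail table (the table name, identity filter, and results bucket are illustrative):
code:
# assumes CloudTrail logs are already set up as an Athena table called cloudtrail_logs
aws athena start-query-execution \
  --result-configuration OutputLocation=s3://my-athena-results/ \
  --query-string "SELECT DISTINCT eventsource, eventname FROM cloudtrail_logs WHERE useridentity.arn LIKE '%terraform%'"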
