|
New Yorp New Yorp posted:That's a very anti-devops attitude that I disagree with in every way. We should be trying to eliminate silos, not saying "gently caress it, that's an ops problem. I threw my code over the wall to them, let them figure out what to do with it now". Despite the phrasing, I don't think that's quite what was meant. When you're rewriting your auth layer to use some modern mechanism, you're testing it locally. It's just that it's irrelevant during development whether your binary is ultimately copied around for release to prod by docker pull, apt-get, git clone, or baked into an AMI. If your app actually cares whether or not it's running in a chroot on overlayfs, in a separate process namespace, with cfs quotas, I have some questions. Methanar fucked around with this message at 21:53 on Mar 27, 2021 |
# ? Mar 27, 2021 21:47 |
|
|
|
The biggest problem with the "throw it over the wall" approach is the one-way flow, not the fact that there's some division of labor. I don't have any clue how the server-side stuff I sometimes touch is actually deployed, but that doesn't mean I just do whatever I feel like and make them deal with it. It's the exact opposite, in fact. They tell me what they need for the prod setup and I build that.
|
# ? Mar 28, 2021 01:27 |
|
Thanks for the replies everyone. Gonna try to not get too e/n here but there is history. In the last couple of years, Dev has been given a lot of leeway to go out and try new things. This has led to some wins for our developers as they have been able to build greenfield services and show management what is possible/what they are capable of. On the other hand, we now have "proof of concept" environments running prod services in multiple clouds. Ops is trying to get a handle on what is where and how we can support it. We have had big wins with modernizing build and release processes for Dev teams who deploy to existing Ops managed infrastructure. The Dev owned cloud stuff running k8s is all over the place and is getting too big for them to manage on their own. Also, they are now spending time janitoring instead of solving business problems. We are hoping to be able to come in, standardize their infrastructure, and support them so when they get a call at 3 am it's for a legitimate application issue and not that they hit the quota on whatever personal subscription they are billed under. Part of what we have to figure out is what can evolve with the platform and what is set in stone. It sounds like anything tied to Windows identity is going to be harder to move, and that's fine. Better to know now than to waste a few weeks bashing my head against a wall. Greenfield stuff is supposed to be going to k8s as much as possible.
|
# ? Mar 28, 2021 01:41 |
|
Plorkyeran posted:The biggest problem with the "throw it over the wall" approach is the one-way flow, not the fact that there's some division of labor. This is a better way to phrase where I was going. Dividing attention to problems along subject matter expertise and areas of focus is simply good development practice. The individuals best equipped to determine whether something should run on native, virtual, openshift, k8s, vendor managed k8s, or vendor managed abstracted k8s (EKS vs ECS, for example) are ops. They might not be called "ops" in whatever hypothetical org we're talking about, but I'm going to keep using "ops" as shorthand for it. Development's input on this should be requirements based -- the application needs access to object storage, application roles of this type need access to local disks for speed, we expect disk bandwidth to max out at around this rate, things of this nature.
|
# ? Mar 28, 2021 18:47 |
|
12 rats tied together posted:Development's input on this should be requirements based -- the application needs access to object storage, application roles of this type need access to local disks for speed, we expect disk bandwidth to max out at around this rate, things of this nature. Hard disagree. The dev team should own their infra and as such the choice should be theirs. Of course ops should lay out the decision so that dev can make an informed choice, but if ops makes that decision for them the operational burden is completely on ops rather than a collaboration and now we're back to "throw it over the wall to ops".
|
# ? Mar 28, 2021 21:41 |
|
Blinkz0rz posted:Hard disagree. The dev team should own their infra and as such the choice should be theirs. Of course ops should lay out the decision so that dev can make an informed choice, but if ops makes that decision for them the operational burden is completely on ops rather than a collaboration and now we're back to "throw it over the wall to ops". I spend a significant chunk of my time moving devs away from owning their own infra precisely because they don't have the time to do it properly themselves. Even stuff like using a standardized instance type so we can properly spec and plan for RI pricing and prepurchasing instances up front is a major cost savings that we are only achieving by consolidating that ownership over to ops. Spot instances, autoscaling, reducing cross AZ traffic, and using right-sized EBS volumes in terms of IOPS are other big ones too. What happens when you give devs the responsibility to self-service all these things in a fast-growing company? They just massively oversize everything so they don't need to think about it. Methanar fucked around with this message at 22:03 on Mar 28, 2021 |
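To put rough numbers on the RI point — the rates and fleet size below are made-up placeholders, not real AWS pricing, but the shape of the math is the same:

```python
# Hypothetical hourly rates -- illustrative only, not current AWS pricing
on_demand_hr = 0.096   # on-demand rate for some standardized instance type
reserved_hr = 0.060    # assumed 1-year reserved rate for the same type

fleet_size = 200       # instances running around the clock
hours_per_year = 8760

annual_savings = fleet_size * hours_per_year * (on_demand_hr - reserved_hr)
print(f"${annual_savings:,.0f}/yr")  # ~ $63,072/yr with these made-up numbers
```

The prerequisite is the standardization itself: you can only prepurchase against a predictable instance mix, which is exactly what per-team yolo provisioning destroys.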
# ? Mar 28, 2021 21:59 |
|
Methanar posted:I spend a significant chunk of my time moving devs away from owning their own infra precisely because they don't have the time to do it properly themselves. I'm willing to entertain the idea that the opposite is technically possible, but I have yet to experience an org where this isn't the case. You hire ops people because they have expertise, everything that could possibly benefit from that expertise should flow through it, even if that might superficially resemble a silo in some circumstances.
|
# ? Mar 28, 2021 22:13 |
|
Methanar posted:Spot instances, autoscaling, reducing cross AZ traffic, and using right-sized EBS volumes in terms of IOPS are other big ones too. What happens when you give devs the responsibility to self-service all these things in a fast-growing company? They just massively oversize everything so they don't need to think about it. This 100%. Am dev in a small formerly-on-prem software company that started getting into cloud services last year and doesn't have a dedicated ops team yet (we hired the first guy a few months ago). All our cloud resources are massively oversized because when the CEO asked me what sort of infra we needed, I basically told him that it would be a full time job to monitor loads, organize proper stress tests, compare vertical vs. horizontal scaling, etc., so all I could offer was a glorified guess. His response was 'Ok, let's just pick the options that allow us to scale to the maximum possible size if it is ever needed'. This is why we're running a Citus-enabled server even though all our postgres databases fit in RAM, and I sure as hell am not going to complain about padding my resume like that.
|
# ? Mar 28, 2021 23:08 |
|
Methanar posted:I spend a significant chunk of my time moving devs away from owning their own infra precisely because they don't have the time to do it properly themselves. Even stuff like using a standardized instance type so we can properly spec and plan for RI pricing and prepurchasing instances up front is a major cost savings that we are only achieving by consolidating that ownership over to ops. But now you're not scaling operations. You're spending so much time on the stuff that devs should be doing that you don't have the ability to look forward unless your team grows significantly. I don't disagree that infra primitives (i.e. instance types for RIs, IOPS provisioning, autoscaling patterns, VPC networking, etc.) should be codified and automated in whatever way possible, but mandated infra or "give me your app and I'll run it" is a huge step back in terms of process and culture and just leads to more ops burnout and misalignment with dev. We've had good luck with a continuous cost savings project that goes through every dev team's infra and figures out what's over provisioned or needs some extra attention and lets the team prioritize accordingly. It's helped dramatically in lowering costs as well as aligning with dev on a shared goal of making their work more cost conscious.
|
# ? Mar 29, 2021 00:09 |
|
Blinkz0rz posted:But now you're not scaling operations. You're spending so much time on the stuff that devs should be doing that you don't have the ability to look forward unless your team grows significantly. I don't know what scaling operations means if not to reduce the cost it takes to run the platform. I've easily reduced annual spend by my salary in each of the last 2 years. I don't own anybody's app. I'm just responsible for providing kubernetes, the ecosystem around kubernetes, and making sure everybody is using kubernetes properly. Kubernetes is the platform and mechanism through which ops is taking ownership of the infra that was previously yolo mode.
|
# ? Mar 29, 2021 01:22 |
|
Blinkz0rz posted:We've had good luck with a continuous cost savings project that goes through every dev team's infra and figures out what's over provisioned or needs some extra attention and lets the team prioritize accordingly. It's helped dramatically in lowering costs as well as aligning with dev on a shared goal of making their work more cost conscious. I really have no idea why you think that having devs make infra decisions is necessary or even useful for that. I’ve had ops come to me and say “hey we’re spending a fuckload on high durability storage for this thing you wrote”. We had a short meeting about it, came up with a feature that let us store most of the data in a cheaper way, and shipped it. At some point I’m sure they told me the names of the different storage types and why there was a big price difference, but it would have been a pretty stupid waste of time for me to go learn everything I needed to know to pick the appropriate ways to store things rather than just trusting my colleague who specializes in that.
|
# ? Mar 29, 2021 02:21 |
|
Plorkyeran posted:I really have no idea why you think that having devs make infra decisions is necessary or even useful for that. I’ve had ops come to me and say “hey we’re spending a fuckload on high durability storage for this thing you wrote”. We had a short meeting about it, came up with a feature that let us store most of the data in a cheaper way, and shipped it. At some point I’m sure they told me the names of the different storage types and why there was a big price difference, but it would have been a pretty stupid waste of time for me to go learn everything I needed to know to pick the appropriate ways to store things rather than just trusting my colleague who specializes in that. To be clear, what I wrote earlier is that devs should own their infra. I'm not suggesting that devs make decisions in a vacuum and expect ops to support it. That's insane.
|
# ? Mar 29, 2021 02:37 |
|
|
# ? Mar 29, 2021 18:51 |
|
Any pulumi touchers in the thread have experience with the automation api? I'm considering adopting it at currentjob for a couple of use cases, but mostly as a drop-in replacement for cloudformation/terraform/arm/etc inside of a more strictly orchestrated workflow system (tldr: there are no people running shell commands).
|
# ? Apr 7, 2021 00:13 |
|
Has anybody tried sharding a single kubernetes cluster across multiple physical datacenters? In reality there would be two k8s (for mutually exclusive sets of microservices), but each sharded across the DCs in the same manner. We have a bunch of datacenters, each 'region' is a group of 3 DCs with around 3-4ms of latency between each of them and reliable interconnects. They're basically direct analogues of AZs. Somebody tell me why it's a bad idea to have a single kubernetes cluster spread across 3 DCs. I can't find anything that would suggest 3-4ms rtt is a deal breaker for etcd. The alternative is one k8s per DC and then we do gross pseudo-federation at the Spinnaker layer or something that I don't like. We already have an internal federated service controller and we'd probably need more of it. https://kubernetes.io/docs/concepts...pdate%20access. https://kubernetes.io/docs/tasks/administer-cluster/developing-cloud-controller-manager/ In each DC we have a smallish VMware footprint. My current idea is the control plane is run on VMware to assist with bootstrapping, use of sharded local storage and general overall resiliency. Then the workers would be bare metal machines in the DCs (bootstrapped with MAAS). Mostly unrelated but we're looking at probably using Kube-router as our CNI with our own ASN that we peer with the main DC network. Methanar fucked around with this message at 01:18 on Apr 7, 2021 |
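For what it's worth, one crude way to sanity-check the inter-DC number before committing: measure connect latency between the would-be etcd peers and compare it against etcd's defaults (100ms heartbeat interval, 1s election timeout). A rough sketch — the host/port in the comment are placeholders, not anything real:

```python
import socket
import time

def tcp_rtt_ms(host, port, samples=5):
    """Rough RTT estimate from TCP connect time (includes handshake overhead,
    so it slightly overstates the raw network RTT)."""
    times = []
    for _ in range(samples):
        t0 = time.monotonic()
        s = socket.create_connection((host, port), timeout=2)
        s.close()
        times.append((time.monotonic() - t0) * 1000)
    return sum(times) / len(times)

# e.g. tcp_rtt_ms("etcd-dc2.internal", 2380) -- hypothetical peer address
```

3-4ms should sit comfortably inside the defaults; it's the 10ms+ range mentioned downthread where consensus systems start needing their timeouts tuned, or just falling over.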
# ? Apr 7, 2021 01:15 |
|
I think the search term you're looking for is a Multi-AZ or "stretch" cluster. Googling for that term brought up some OpenShift-related blog posts, maybe that's just Red Hat's term for it.
|
# ? Apr 7, 2021 02:46 |
|
Methanar posted:Mostly unrelated but we're looking at probably using Kube-router as our CNI with our own ASN that we peer with the main DC network. If your AZs correspond to e.g. different subnets, you will need to use Calico or something else that gives you control over the IPAM, if I'm assuming correctly that you're going to be advertising pod IPs. Do you have a specific reason for advertising them? In general advertising pod IPs is a PITA and the overhead of encapsulation + SNAT has been negligible for the size of clusters I've worked on. Another thing to consider is that if you ever want to grow your IP range it may be more difficult with kube-router than with Calico; with Calico you just add another pool and you can delete pools that are empty. No experience with kube-router, but for us Calico has been a pretty positive experience. For somewhat similar reasons I would advise not advertising service IPs and instead getting a loadbalancer implementation. As long as you have >= 3 AZs per DC though, having etcd span the AZs makes sense. Where you'd want a separate cluster per AZ is if you only had two or something.
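The per-AZ pool pattern being described looks roughly like this as a Calico IPPool — the name, CIDR, and zone label here are invented for illustration:

```yaml
# Hypothetical per-AZ pool -- one of these per zone; empty pools can be deleted
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: pods-zone-a
spec:
  cidr: 10.100.0.0/18
  ipipMode: Never        # no encapsulation; pod IPs advertised via BGP
  natOutgoing: false     # pod IPs are routable, so no SNAT on egress
  nodeSelector: zone == "a"
```

Growing the range later is then just adding another pool object rather than resizing an existing cluster-wide CIDR.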
|
# ? Apr 7, 2021 02:49 |
|
Huawei also did some work in this direction, so they might have some idea of what it looks like. 10ms is the magic barrier that zk/mesos used to break on me, but I never dug deep to figure out why. I can also say that the k3s Kubernetes API running on Postgres couldn't deal with 10ms of latency and would basically just die.
|
# ? Apr 7, 2021 03:10 |
|
Methanar posted:Has anybody tried sharding a single kubernetes cluster across multiple physical datacenters? In reality there would be two k8s (for mutually exclusive sets of microservices), but each sharded across the DCs in the same manner. I support a pretty large number of on-premises Kubernetes clusters (50+) and some of them are stretch clusters that span data centers. I don't like the pattern and I try to discourage its use because it misleads people into thinking they are now redundant when in fact they are not. You are protected against one of your 3 DCs going offline, which is probably not a common occurrence, but you are not protected against etcd crapping itself, Kubernetes crapping itself, Istio crapping itself if you are unfortunate enough to be playing the service mesh game, deploying broken software that you have to back out, or a multitude of other problems that are a lot more common in such an environment. Stretch clusters provide an insidious illusion of redundancy and a modicum of convenience, but it's not redundant in the same way that RAID is not a backup. I agree that federation is annoying to do, but my view is that I would rather endure the annoyance of having to deploy 2-3 mirror clusters in different DCs and maintain some load balancer strategy across them in exchange for not being caught out badly when the poo poo hits the fan and you're frantically scrambling to git revert and redeploy the old version of whatever piece of software just exploded or you've got a shoulder-length glove on and are reaching up etcd's butt to try to get your single golden Kubernetes cluster off the skids.
|
# ? Apr 7, 2021 03:30 |
|
Devops thread because people here probably have secrets management experience. Anyone know where I should be looking if I'm in the market for a HSM (hardware security module) that: - stores (locally) a private key - performs (locally) a keccak256 hash of input data using the PK as a salt (ie, signed hash) - returns the hash I can't seem to find the right search terms to get relevant results.
|
# ? Apr 9, 2021 13:20 |
|
Newf posted:Devops thread because people here probably have secrets management experience. I think if you're in paying-for-an-HSM territory, you're paying somebody who knows how to avoid insecure configurations to set it up for you, and they will know the answer to that question.
|
# ? Apr 9, 2021 13:59 |
|
PCjr sidecar posted:I think if you're in paying-for-an-HSM territory, you're paying somebody who knows how to avoid insecure configurations to set it up for you, and they will know the answer to that question. Be prepared to pay out of your rear end for this. Depending on what the keys are used for you might need to take care of a shitload of additional security controls. Things like "every change to the machines needs to be done under supervision of an (external) auditor" and "make sure nobody can physically access the machines", meaning you need to get a separate room/cage inside the DC with mantraps/biometric authentication, etc.
|
# ? Apr 9, 2021 14:44 |
|
Maybe I'll back up a little to avoid X/Y problems. What I want is to perform salted, signed hashes of some data, without my PK ever existing either in storage or in memory on my server. I guess plugging a HSM into a machine which is programmatically accessible by the machine then leads to the problem of anyone with remote access to the machine also, by extension, having remote access to the signing service, which is roughly as bad as having access to the PK itself. Man, computers!
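What's being described is essentially a keyed hash, where the HSM's whole job is to compute it without the key ever touching host memory. A software-only sketch of that interface using Python's stdlib — note that hashlib's sha3_256 is the standardized SHA-3, which pads differently from pre-standard keccak256, so this shows the shape of the operation rather than being a drop-in:

```python
import hmac
import hashlib

def keyed_digest(key: bytes, data: bytes) -> str:
    """HMAC over the data. With a real HSM, `key` would live inside the
    device and only `data` and the digest would cross this boundary."""
    return hmac.new(key, data, hashlib.sha3_256).hexdigest()

tag = keyed_digest(b"never-leaves-the-hsm", b"some input data")
print(tag)  # 64 hex chars; deterministic for a given (key, data) pair
```

The remote-access worry in the post still applies either way: an HSM converts "steal the key file" into "must be online and able to talk to the device", which narrows the attack but doesn't eliminate it.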
|
# ? Apr 9, 2021 16:13 |
|
Newf posted:Maybe I'll back up a little to avoid X/Y problems. Doing PKI securely is a lot of work, really expensive and insanely time consuming. It's pretty difficult to make any tradeoffs without sacrificing the requirements that lead you down the road of owning a physical HSM. But it seems you're already aware of the results of making such decisions so I'd say you're well ahead of the (non-elliptic :p) curve.
|
# ? Apr 9, 2021 20:09 |
|
Man, gently caress you if you generate a password/api key with a # in it and then send it to a paying customer
|
# ? Apr 9, 2021 21:36 |
|
Hadlock posted:Man, gently caress you if you generate a password/api key with a # in it and then send it to a paying customer What's wrong with #
|
# ? Apr 9, 2021 21:55 |
|
Methanar posted:What's wrong with # Yaml?
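To be precise, YAML only treats # as starting a comment when it's preceded by whitespace (or starts the line), which makes the failure mode sneakier than a blanket ban — most pasted keys survive, until one lands after a space. Demonstrated with PyYAML:

```python
import yaml

# '#' glued to the value is part of the scalar...
print(yaml.safe_load("api_key: pa#ss"))     # {'api_key': 'pa#ss'}
# ...but after a space it starts a comment and silently truncates the value
print(yaml.safe_load("api_key: pa #ss"))    # {'api_key': 'pa'}
# quoting makes it unambiguous either way
print(yaml.safe_load('api_key: "pa #ss"'))  # {'api_key': 'pa #ss'}
```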
|
# ? Apr 9, 2021 22:36 |
|
Why the gently caress are you putting secrets in yaml
|
# ? Apr 9, 2021 23:00 |
The Fool posted:Why the gently caress are you putting secrets in yaml
|
|
# ? Apr 9, 2021 23:28 |
|
I manage systems with several keys in yaml files. But they're encrypted with a private key that is totally not in a yaml file so it's absolutely okay and this will never backfire.
|
# ? Apr 9, 2021 23:37 |
|
eyaml, lol
|
# ? Apr 9, 2021 23:57 |
|
Clearly we should just go back to JSON and XML
|
# ? Apr 10, 2021 00:22 |
|
You are allowed to put quotes around your strings in yaml, you know. Just because it lets you do stupid things doesn't mean you have to.
|
# ? Apr 10, 2021 03:39 |
|
Man, gently caress you if you generate a password/api key with a " in it and then send it to a paying customer
|
# ? Apr 10, 2021 12:22 |
|
When I was newer at this job I ended up regenerating some AWS credentials repeatedly until I got a secret key that didn't have any punctuation that got mangled by whatever misconfigured AWS SDK this tool was using.
|
# ? Apr 10, 2021 12:23 |
|
gently caress you if your generated password isn't eight digits
|
# ? Apr 10, 2021 12:36 |
|
I have never encountered a password generated from aws or any other cloud vendor that has been even a little bit difficult to store in version control. yaml can contain spaces, every type of quote and bracket, the hyphen, pound signs, you can even shove emoji or some unicode bullshit in there. It also has type tags, including a type for an ordered map, which is not something I would expect from such a terse format. If you use it at work, you should spend some time reading the spec.
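For anyone who hasn't run into the ordered map tag mentioned above — PyYAML's safe loader constructs !!omap as a list of key/value pairs rather than a dict:

```python
import yaml

doc = """\
!!omap
- first: 1
- second: 2
"""
print(yaml.safe_load(doc))  # [('first', 1), ('second', 2)]
```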
|
# ? Apr 10, 2021 17:20 |
|
I don't need to read the spec or remember anything about it because the first google result for yaml multiline strings is really handy.
|
# ? Apr 10, 2021 19:05 |
|
Vanadium posted:I don't need to read the spec or remember anything about it because the first google result for yaml multiline strings is really handy.
|
# ? Apr 10, 2021 19:18 |
|
|
|
Just remember to always quote anything that could be interpreted as a hex encoded value because yaml gets really excited about converting it quietly!
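Concretely, with PyYAML (which follows YAML 1.1 resolution rules; YAML 1.2 loaders behave differently here):

```python
import yaml

# Unquoted hex-looking scalars resolve to integers under YAML 1.1 rules
print(yaml.safe_load("build_id: 0x1A2B"))    # {'build_id': 6699}
print(yaml.safe_load("build_id: '0x1A2B'"))  # {'build_id': '0x1A2B'}
# The same trap exists for boolean-ish bare words
print(yaml.safe_load("flag: no"))            # {'flag': False}
```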
|
# ? Apr 10, 2021 20:56 |