Methanar
Sep 26, 2013

by the sex ghost

New Yorp New Yorp posted:

That's a very anti-devops attitude that I disagree with in every way. We should be trying to eliminate silos, not saying "gently caress it, that's an ops problem. I threw my code over the wall to them, let them figure out what to do with it now".

Despite the phrasing, I don't think that's quite what was meant.

When you're rewriting your auth layer to use some modern mechanism, you're testing it locally. It just doesn't matter during development whether your binary ultimately gets copied around for release to prod by docker pull, apt-get, git clone, or baked into an AMI.

If your app actually cares whether or not it's running in a chroot on overlayfs, in a separate process namespace, with cfs quotas, I have some questions.

Methanar fucked around with this message at 21:53 on Mar 27, 2021


Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
The biggest problem with the "throw it over the wall" approach is the one-way flow, not the fact that there's some division of labor. I don't have any clue how the server-side stuff I sometimes touch is actually deployed, but that doesn't mean I just do whatever I feel like and make them deal with it. It's the exact opposite, in fact. They tell me what they need for the prod setup and I build that.

The NPC
Nov 21, 2010


Thanks for the replies everyone. Gonna try to not get too e/n here but there is history.

In the last couple of years, Dev has been given a lot of leeway to go out and try new things. This has led to some wins for our developers as they have been able to build greenfield services and show management what is possible/what they are capable of. On the other hand, we now have "proof of concept" environments running prod services in multiple clouds.

Ops is trying to get a handle on what is where and how we can support it. We have had big wins with modernizing build and release processes for Dev teams who deploy to existing Ops-managed infrastructure. The Dev-owned cloud stuff running k8s is all over the place and is getting too big for them to manage on their own. Also, they are now spending time janitoring instead of solving business problems. We are hoping to be able to come in, standardize their infrastructure, and support them so that when they get a call at 3 am it's for a legitimate application issue and not because they hit the quota on whatever personal subscription they are billed under.

Part of what we have to figure out is what can evolve with the platform and what is set in stone. It sounds like anything tied to Windows identity is going to be harder to move, and that's fine. Better to know now than to waste a few weeks bashing my head against a wall. Greenfield stuff is supposed to be going to k8s as much as possible.

12 rats tied together
Sep 7, 2006

Plorkyeran posted:

The biggest problem with the "throw it over the wall" approach is the one-way flow, not the fact that there's some division of labor.

This is a better way to phrase where I was going. Dividing attention to problems along subject matter expertise and areas of focus is simply good development practice. The individuals best equipped to determine whether something should run on native, virtual, openshift, k8s, vendor managed k8s, or vendor managed abstracted k8s (EKS vs ECS, for example) are ops. They might not be called "ops" in whatever hypothetical org we're talking about, but I'm going to keep using "ops" as shorthand for it.

Development's input on this should be requirements based -- the application needs access to object storage, application roles of this type need access to local disks for speed, we expect disk bandwidth to max out at around this rate, things of this nature.

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

12 rats tied together posted:

Development's input on this should be requirements based -- the application needs access to object storage, application roles of this type need access to local disks for speed, we expect disk bandwidth to max out at around this rate, things of this nature.

Hard disagree. The dev team should own their infra and as such the choice should be theirs. Of course ops should lay out the decision so that dev can make an informed choice, but if ops makes that decision for them the operational burden is completely on ops rather than a collaboration and now we're back to "throw it over the wall to ops".

Methanar
Sep 26, 2013

by the sex ghost

Blinkz0rz posted:

Hard disagree. The dev team should own their infra and as such the choice should be theirs. Of course ops should lay out the decision so that dev can make an informed choice, but if ops makes that decision for them the operational burden is completely on ops rather than a collaboration and now we're back to "throw it over the wall to ops".

I spend a significant chunk of my time moving devs away from owning their own infra precisely because they don't have the time to do it properly themselves. Even stuff like using a standardized instance type so we can properly spec and plan for RI pricing and prepurchasing instances up front is a major cost savings that we are only achieving by consolidating that ownership over to ops.

Spot instances, autoscaling, reducing cross-AZ traffic, and right-sizing EBS volumes in terms of IOPS are other big ones too. What happens when you give devs the responsibility to self-service all these things in a fast-growing company? They just massively oversize everything so they don't need to think about it.

Methanar fucked around with this message at 22:03 on Mar 28, 2021

12 rats tied together
Sep 7, 2006

Methanar posted:

I spend a significant chunk of my time moving devs away from owning their own infra precisely because they don't have the time to do it properly themselves.

I'm willing to entertain the idea that the opposite is technically possible, but I have yet to experience an org where this isn't the case. You hire ops people because they have expertise; everything that could possibly benefit from that expertise should flow through it, even if that might superficially resemble a silo in some circumstances.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Methanar posted:

Spot instances, autoscaling, reducing cross-AZ traffic, and right-sizing EBS volumes in terms of IOPS are other big ones too. What happens when you give devs the responsibility to self-service all these things in a fast-growing company? They just massively oversize everything so they don't need to think about it.

This 100%. Am dev in a small formerly-on-prem software company that started getting into cloud services last year and doesn't have a dedicated ops team yet (we hired the first guy a few months ago). All our cloud resources are massively oversized because when the CEO asked me what sort of infra we needed, I basically told him that it would be a full-time job to monitor loads, organize proper stress tests, compare vertical vs. horizontal scaling, etc., so all I could offer was a glorified guess. His response was "Ok, let's just pick the option that allows us to scale to the maximum possible size if it's ever needed". This is why we're running a Citus-enabled server even though all our postgres databases fit in RAM, and I sure as hell am not going to complain about padding my resume like that.

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

Methanar posted:

I spend a significant chunk of my time moving devs away from owning their own infra precisely because they don't have the time to do it properly themselves. Even stuff like using a standardized instance type so we can properly spec and plan for RI pricing and prepurchasing instances up front is a major cost savings that we are only achieving by consolidating that ownership over to ops.

Spot instances, autoscaling, reducing cross-AZ traffic, and right-sizing EBS volumes in terms of IOPS are other big ones too. What happens when you give devs the responsibility to self-service all these things in a fast-growing company? They just massively oversize everything so they don't need to think about it.

But now you're not scaling operations. You're spending so much time on the stuff that devs should be doing that you don't have the ability to look forward unless your team grows significantly.

I don't disagree that infra primitives (e.g. instance types for RIs, IOPS provisioning, autoscaling patterns, VPC networking, etc.) should be codified and automated in whatever way possible, but mandated infra or "give me your app and I'll run it" is a huge step back in terms of process and culture and just leads to more ops burnout and misalignment with dev.

We've had good luck with a continuous cost savings project that goes through every dev team's infra and figures out what's over provisioned or needs some extra attention and lets the team prioritize accordingly. It's helped dramatically in lowering costs as well as aligning with dev on a shared goal of making their work more cost conscious.

Methanar
Sep 26, 2013

by the sex ghost

Blinkz0rz posted:

But now you're not scaling operations. You're spending so much time on the stuff that devs should be doing that you don't have the ability to look forward unless your team grows significantly.

I don't know what scaling operations means if not to reduce the cost it takes to run the platform. I've easily reduced annual spend by my salary in each of the last 2 years.

I don't own anybody's app. I'm just responsible for providing kubernetes, the ecosystem around kubernetes, and making sure everybody is using kubernetes properly. Kubernetes is the platform and mechanism through which ops is taking ownership of the infra that was previously yolo mode.

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed

Blinkz0rz posted:

We've had good luck with a continuous cost savings project that goes through every dev team's infra and figures out what's over provisioned or needs some extra attention and lets the team prioritize accordingly. It's helped dramatically in lowering costs as well as aligning with dev on a shared goal of making their work more cost conscious.

I really have no idea why you think that having devs make infra decisions is necessary or even useful for that. I’ve had ops come to me and say “hey we’re spending a fuckload on high durability storage for this thing you wrote”. We had a short meeting about it, came up with a feature that let us store most of the data in a cheaper way, and shipped it. At some point I’m sure they told me the names of the different storage types and why there was a big price difference, but it would have been a pretty stupid waste of time for me to go learn everything I needed to know to pick the appropriate ways to store things rather than just trusting my colleague who specializes in that.

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS

Plorkyeran posted:

I really have no idea why you think that having devs make infra decisions is necessary or even useful for that. I’ve had ops come to me and say “hey we’re spending a fuckload on high durability storage for this thing you wrote”. We had a short meeting about it, came up with a feature that let us store most of the data in a cheaper way, and shipped it. At some point I’m sure they told me the names of the different storage types and why there was a big price difference, but it would have been a pretty stupid waste of time for me to go learn everything I needed to know to pick the appropriate ways to store things rather than just trusting my colleague who specializes in that.

To be clear, what I wrote earlier is that devs should own their infra. I'm not suggesting that devs make decisions in a vacuum and expect ops to support it. That's insane.


12 rats tied together
Sep 7, 2006

Any pulumi touchers in the thread have experience with the automation api? I'm considering adopting it at currentjob for a couple of use cases, but mostly as a drop-in replacement for cloudformation/terraform/arm/etc inside of a more strictly orchestrated workflow system (tldr: there are no people running shell commands).
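For context, the automation API is essentially Pulumi-as-a-library: you embed the deployment engine in your own process instead of shelling out to pulumi up, which is what makes it attractive inside a strictly orchestrated workflow. A minimal inline-program sketch in Python is below; the project/stack names, the AWS provider, and the bucket are placeholders of mine, not anything from this thread.

import pulumi
import pulumi_aws as aws
from pulumi import automation as auto

def pulumi_program():
    # Ordinary Pulumi resource code, run in-process by the engine.
    bucket = aws.s3.Bucket("example-bucket")
    pulumi.export("bucket_name", bucket.id)

stack = auto.create_or_select_stack(
    stack_name="dev",
    project_name="automation-demo",
    program=pulumi_program,
)
stack.set_config("aws:region", auto.ConfigValue(value="us-east-1"))

result = stack.up(on_output=print)   # stack.preview() and stack.destroy() work the same way
print(result.outputs["bucket_name"].value)

The payoff for a workflow system is that preview/up/destroy become function calls you can wrap in your own locking, approvals, and retries rather than CLI invocations.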

Methanar
Sep 26, 2013

by the sex ghost
Has anybody tried sharding a single kubernetes cluster across multiple physical datacenters? In reality there would be two k8s (for mutually exclusive sets of microservices), but each sharded across the DCs in the same manner.

We have a bunch of datacenters; each 'region' is a group of 3 DCs with around 3-4ms of latency between each of them and reliable interconnects. They're basically direct analogues of AZs. Somebody tell me why it's a bad idea to have a single kubernetes cluster spread across 3 DCs. I can't find anything that would suggest 3-4ms rtt is a deal breaker for etcd.

The alternative is one k8s per DC and then we do gross pseudo-federation at the Spinnaker layer or something that I don't like. We already have an internal federated service controller and we'd probably need more of it.
https://kubernetes.io/docs/concepts...pdate%20access.
https://kubernetes.io/docs/tasks/administer-cluster/developing-cloud-controller-manager/

In each DC we have a smallish VMware footprint. My current idea is that the control plane runs on VMware to assist with bootstrapping, use of sharded local storage, and general overall resiliency. Then the workers would be bare metal machines in the DCs (bootstrapped with MAAS).

Mostly unrelated but we're looking at probably using Kube-router as our CNI with our own ASN that we peer with the main DC network.

Methanar fucked around with this message at 01:18 on Apr 7, 2021

minato
Jun 7, 2004

cutty cain't hang, say 7-up.
Taco Defender
I think the search term you're looking for is a Multi-AZ or "stretch" cluster. Googling for that term brought up some OpenShift-related blog posts; maybe that's just Red Hat's term for it.

my homie dhall
Dec 9, 2010

honey, oh please, it's just a machine

Methanar posted:

Mostly unrelated but we're looking at probably using Kube-router as our CNI with our own ASN that we peer with the main DC network.

If your AZs correspond to e.g. different subnets, you will need to use Calico or something else that gives you control over the IPAM, assuming I'm right that you're going to be advertising pod IPs. Do you have a specific reason for advertising them? In general advertising pod IPs is a PITA, and the overhead of encapsulation + SNAT has been negligible for the size of clusters I've worked on. Another thing to consider is that if you ever want to grow your IP range, it may be more difficult with kube-router than with Calico; with Calico you just add another pool and you can delete pools that are empty. No experience with kube-router, but for us Calico has been a pretty positive experience.

For somewhat similar reasons I would advise not advertising service IPs and instead getting a loadbalancer implementation.

As long as you have >= 3 AZs per DC, though, having etcd span the AZs makes sense. Where you'd want a separate cluster per AZ is if you only had two or something.

freeasinbeer
Mar 26, 2015

by Fluffdaddy
Huawei also did some work in this direction, so they might have some idea of what it looks like.

10ms is the magic barrier that zk/mesos used to break on me, but I never dug deep to figure out why.

I can also say that the k3s Kubernetes API running on Postgres couldn't deal with 10ms of latency and would basically just die.

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

Methanar posted:

Has anybody tried sharding a single kubernetes cluster across multiple physical datacenters? In reality there would be two k8s (for mutually exclusive sets of microservices), but each sharded across the DCs in the same manner.

I support a pretty large number of on-premises Kubernetes clusters (50+) and some of them are stretch clusters that span data centers. I don't like the pattern and I try to discourage its use because it misleads people into thinking they are now redundant when in fact they are not. You are protected against one of your 3 DCs going offline, which is probably not a common occurrence, but you are not protected against etcd crapping itself, Kubernetes crapping itself, Istio crapping itself if you are unfortunate enough to be playing the service mesh game, deploying broken software that you have to back out, or a multitude of other problems that are a lot more common in such an environment. Stretch clusters provide an insidious illusion of redundancy and a modicum of convenience, but they're not redundant in the same way that RAID is not a backup. I agree that federation is annoying to do, but my view is that I would rather endure the annoyance of having to deploy 2-3 mirror clusters in different DCs and maintain some load balancer strategy across them, in exchange for not being caught out badly when the poo poo hits the fan and you're frantically scrambling to git revert and redeploy the old version of whatever piece of software just exploded, or you've got a shoulder-length glove on and are reaching up etcd's butt trying to get your single golden Kubernetes cluster off the skids.

Newf
Feb 14, 2006
I appreciate hacky sack on a much deeper level than you.
Devops thread because people here probably have secrets management experience.

Anyone know where I should be looking if I'm in the market for an HSM (hardware security module) that:

- stores (locally) a private key
- performs (locally) a keccak256 hash of input data using the PK as a salt (i.e., a signed hash)
- returns the hash

I can't seem to find the right search terms to get relevant results.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

Newf posted:

Devops thread because people here probably have secrets management experience.

Anyone know where I should be looking if I'm in the market for an HSM (hardware security module) that:

- stores (locally) a private key
- performs (locally) a keccak256 hash of input data using the PK as a salt (i.e., a signed hash)
- returns the hash

I can't seem to find the right search terms to get relevant results.

I think if you're in HSM territory, you're paying somebody who knows how to avoid insecure configurations to set it up for you, and they will know the answer to that question.

LochNessMonster
Feb 3, 2005

I need about three fitty


PCjr sidecar posted:

I think if you're in HSM territory, you're paying somebody who knows how to avoid insecure configurations to set it up for you, and they will know the answer to that question.

Be prepared to pay out of your rear end for this.

Depending on what the keys are used for, you might need to take care of a shitload of additional security controls. Things like: every change to the machines needs to be done under supervision of an (external) auditor; nobody can physically access the machines, meaning you need a separate room/cage inside the DC with mantraps/biometric authentication; etc.

Newf
Feb 14, 2006
I appreciate hacky sack on a much deeper level than you.
Maybe I'll back up a little to avoid X/Y problems.

What I want is to perform salted, signed hashes of some data, without my PK ever existing either in storage or in memory on my server.

I guess plugging an HSM into a machine, where it's programmatically accessible by that machine, then leads to the problem of anyone with remote access to the machine also, by extension, having remote access to the signing service, which is roughly as bad as having access to the PK itself.

Man, computers!
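For what it's worth, "a keccak256 hash using the PK as a salt" reads like a keyed hash (HMAC-style) with the private key as the key. A software-only sketch of that construction is below, purely to pin down the operation: it is not the HSM answer (the key sits in host memory, which is exactly what you're trying to avoid), and hashlib's sha3_256 is only a stand-in, since Ethereum-style keccak256 uses different padding than SHA3-256.

import hashlib
import hmac

# Illustration only: in the real design this key never leaves the HSM.
key = b"pretend-this-lives-inside-the-HSM"
data = b"message to be hashed"

# Keyed ("salted, signed") hash; sha3_256 standing in for keccak256.
tag = hmac.new(key, data, hashlib.sha3_256).hexdigest()
print(tag)

With an HSM the same operation gets delegated (e.g. over PKCS#11) so only the data and the digest cross the boundary, which is why the access-control problem described above becomes the real one.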

LochNessMonster
Feb 3, 2005

I need about three fitty


Newf posted:

Maybe I'll back up a little to avoid X/Y problems.

What I want is to perform salted, signed hashes of some data, without my PK ever existing either in storage or in memory on my server.

I guess plugging an HSM into a machine, where it's programmatically accessible by that machine, then leads to the problem of anyone with remote access to the machine also, by extension, having remote access to the signing service, which is roughly as bad as having access to the PK itself.

Man, computers!

Doing PKI securely is a lot of work, really expensive, and insanely time consuming. It's pretty difficult to make any tradeoffs without sacrificing the requirements that led you down the road of owning a physical HSM in the first place.

But it seems you're already aware of the results of making such decisions, so I'd say you're well ahead of the (non-elliptic :p) curve.

Hadlock
Nov 9, 2004

Man, gently caress you if you generate a password/api key with a # in it and then send it to a paying customer

Methanar
Sep 26, 2013

by the sex ghost

Hadlock posted:

Man, gently caress you if you generate a password/api key with a # in it and then send it to a paying customer

What's wrong with #

freeasinbeer
Mar 26, 2015

by Fluffdaddy

Methanar posted:

What's wrong with #

Yaml?

The Fool
Oct 16, 2003


Why the gently caress are you putting secrets in yaml

fletcher
Jun 27, 2003

ken park is my favorite movie

Cybernetic Crumb

The Fool posted:

Why the gently caress are you putting secrets in yaml

xzzy
Mar 5, 2009

I manage systems with several keys in yaml files. But they're encrypted with a private key that is totally not in a yaml file so it's absolutely okay and this will never backfire.

in a well actually
Jan 26, 2011

dude, you gotta end it on the rhyme

eyaml, lol

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
Clearly we should just go back to JSON and XML

Plorkyeran
Mar 22, 2007

To Escape The Shackles Of The Old Forums, We Must Reject The Tribal Negativity He Endorsed
You are allowed to put quotes around your strings in yaml, you know. Just because it lets you do stupid things doesn't mean you have to.
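Concretely, with PyYAML (assumed here purely for the demo), the unquoted-# failure mode and the quoted fix look like this:

import yaml  # PyYAML, for demonstration only

# '#' starts a comment when preceded by whitespace, which silently truncates the value:
print(yaml.safe_load("api_key: abc123 #XYZ"))    # {'api_key': 'abc123'}
# A value that starts with '#' disappears entirely:
print(yaml.safe_load("api_key: #abc123"))        # {'api_key': None}
# Quoted, the whole secret survives:
print(yaml.safe_load('api_key: "abc123 #XYZ"'))  # {'api_key': 'abc123 #XYZ'}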

Vanadium
Jan 8, 2005

Man, gently caress you if you generate a password/api key with a " in it and then send it to a paying customer

Vanadium
Jan 8, 2005

When I was newer at this job I ended up regenerating some AWS credentials repeatedly until I got a secret key that didn't have any punctuation that got mangled by whatever misconfigured AWS SDK this tool was using.

vanity slug
Jul 20, 2010

gently caress you if your generated password isn't eight digits

12 rats tied together
Sep 7, 2006

I have never encountered a password generated from aws or any other cloud vendor that has been even a little bit difficult to store in version control.

yaml can contain spaces, every type of quote and bracket, the hyphen, and pound signs; you can even shove emoji or some unicode bullshit in there. It also has type tags, including a type for an ordered map, which is not something I would expect from such a terse format.

If you use it at work, you should spend some time reading the spec.
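A quick illustration of two of those features, the !!omap type tag and arbitrary unicode, with PyYAML assumed only for the demo:

import yaml  # PyYAML, for demonstration only

doc = """
password: "spaces, 'quotes', [brackets], #hashes and a key emoji 🔑 all survive quoting"
ordered: !!omap
  - first: 1
  - second: 2
"""
data = yaml.safe_load(doc)
print(data["password"])
print(data["ordered"])   # [('first', 1), ('second', 2)] -- an order-preserving list of pairs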

Vanadium
Jan 8, 2005

I don't need to read the spec or remember anything about it because the first google result for yaml multiline strings is really handy.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Vanadium posted:

I don't need to read the spec or remember anything about it because the first google result for yaml multiline strings is really handy.
I ended up memorizing everything about this because it's made my rat's nest of Ansible Jinja2 an order of magnitude more readable.
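For anyone who hasn't memorized it, the two block-scalar styles that cheat sheet centers on behave like this (PyYAML assumed for the demo):

import yaml  # PyYAML, for demonstration only

doc = """
literal: |
  line one
  line two
folded: >
  line one
  line two
"""
data = yaml.safe_load(doc)
print(repr(data["literal"]))   # '|' keeps newlines:   'line one\nline two\n'
print(repr(data["folded"]))    # '>' folds to spaces:  'line one line two\n'

The trailing-newline behavior is what the |- and |+ chomping indicators control.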


Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
Just remember to always quote anything that could be interpreted as a hex-encoded value, because yaml gets really excited about quietly converting it!
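The hex case is YAML 1.1's implicit typing at work, and it has friends; with PyYAML (assumed for the demo, and a YAML 1.1-style parser):

import yaml  # PyYAML, for demonstration only

print(yaml.safe_load("build_id: 0x1A"))     # {'build_id': 26}      unquoted hex becomes an int
print(yaml.safe_load("country: NO"))        # {'country': False}    the classic "Norway problem"
print(yaml.safe_load("version: 1.10"))      # {'version': 1.1}      the trailing zero is gone
print(yaml.safe_load('build_id: "0x1A"'))   # {'build_id': '0x1A'}  quoting keeps the string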
