Continuous Integration/build engineering/devops thread

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Continuous Integration/build engineering/devops thread

«‹›158 »

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

The companies that are bad at devops are typically bad at anything else related to delivering software to begin with. Because if a company that's gotten to the point where they have dev vs. ops problems to the degree of dysfunction I've observed as normal and haven't imploded it implies they're still in business because delivering software well hasn't been necessary for the company to survive nor succeed (or in the more rare case, success has happened in spite of the execution issues). And frankly, most of the companies that are trying desperately to turn around their broken culture are in varying degrees of decay and few can tolerate taking years and years out of their careers to try to fix aircraft carriers of technical debt, poor morale, worker-manager distrust, and complacency baked into the organization.

Useless layers of management to try to manage risk hasn't really done much for actually fixing culturally bankrupt companies with few relationships, IP, nor market presence to not suck, yet that seems to be the norm. It's like every other large company thinks they can flip companies or something like every other person watched HGTV and thought they too could flip houses for a living.

# ? May 6, 2019 01:29

Adbot: ADBOT LOVES YOU

# ? Jun 3, 2024 21:45

Gyshall: Feb 24, 2009; Had a couple of drinks.
Saw a couple of things.

drat, you're talking about my current company to the dot. I've stuck it out for the past year but enough red flags are enough. We're big enough that software development isn't our main revenue stream, but years of poor decisions and just "sprinkling DevOps" as a silo ain't gonna fix it.

# ? May 6, 2019 01:39

Judge Schnoopy: Nov 2, 2005; dont even TRY it, pal

I work at a PaaS / SaaS that's been around for 6 or so years. My department mostly does firewall and microsegmentation work via API. Our one-liners turned to scripts and then to modules. I'm working on our recent push to containerize it all as an operational platform.

This put me in a bunch of meetings and learning sessions with the primary devops gurus (who maintain other primary roles while supporting the underlying kubernetes / Jenkins / git infrastructure) who slammed enough info into my head to get started.

So now I spend nearly all day building new tools into docker images, adding a helm front end, and deploying them on a schedule with defined input variables. Each tool takes care of a specific job aspect that I would have had to do manually.

Many of the people I work with consider me part of 'devops' now, though it's hard to see from my perspective. I'm still just automating tasks like I have from day one, through more sophisticated means. From reading this thread this may be a good outcome of devops. It's hard to quantify in a job search though.

# ? May 6, 2019 02:07

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

�Devops� to me has always just meant �care about more than just your specific wheelhouse� and look around and help make the whole system better. Good engineers already know where they can contribute best (I wouldn�t ask John Carmack to help me with a K8S cluster node going through a restart loop, right?) and try to help others do what they couldn�t do themselves. This is already antithetical to the typical Taylorism management structure because this is what management is supposed to be doing. The problem is that command and control leadership has failed to scale and be able to respond adequately to the demands of the market, and tech companies managed to figure out the limits of Taylorism management philosophies (it doesn�t work below most upper management). Servant leadership is more compatible with �everyone go out and get poo poo done� as a way to get things actually done.

# ? May 6, 2019 17:37

Helianthus Annuus: Feb 21, 2006; can i touch your hand; Grimey Drawer

to me, devops means that sysadmin types now need to have some programming chops, and devs need to know their way around a bash prompt. both good things imo

# ? May 6, 2019 17:45

minato: Jun 7, 2004; cutty cain't hang, say 7-up.; Taco Defender

At one place I worked, they did "every 8 weeks you're on-call, even if you're a dev".

What they thought would happen:
- Quality would go up as devs got an appreciation for how their poor coding practices affected production. More tests, etc.
- Devs would have a better understanding of the difference between dev & production scale.
- Ops wouldn't feel quite so burdened by frequent on-call cycles, as devs would share the load.

What actually happened:
- Devs learned that their bugs could take down production, but they rarely felt the impact from their own bugs, so they didn't feel much need to improve testing.
- Devs learned next to nothing about production. They didn't want to, seeing it as just more distracting stuff they didn't need to do their primary job, and not what they signed up for. When it came their turn, they just continued work on their features and hoped they'd dodge a bullet that week. And they often did, because production problems were relatively rare.
- When something did go wrong, Ops got dragged in anyway because the devs were never familiar enough with the processes or systems to fix it themselves. Again; why bother deep-learning something that's fairly reliable and you only have to babysit it every 8 weeks?
- When the managers noticed all this reluctance to learn, they tried to fix it with "When you're on-call, your job that week is to improve the pipeline". But the pipeline was so complex that it took about 3-4 days to understand it, by which point, you've only got a couple of days left to do anything useful before you're off the hook again, so why bother?

These are all solvable of course. But yeah, naively implementing "devops" is not gonna magically fix all your problems.

# ? May 6, 2019 18:27

12 rats tied together: Sep 7, 2006

In my experience "dev on call" only works when you put a lot of effort into it. The teams need to be responsible with deploying their own application on their own schedule with no oversight, they need an on call rotation dedicated to their team, and their phones should only ring when their own application breaks.

From an operational perspective you need to (rightly) consider your pagerduty (or whatever) configuration to be a production stack component. Schedules, escalation policies, group membership should be managed with automation and the production configuration of it should be the result of a deploy from some master branch always, so you can peer review proposed changes and roll back things that break. There needs to be clearly defined policies on how new people get added to the rotations, when they get added, what rotations they are on (if they are on multiple), etc.

The moment you slip on any of this and start to annoy developers needlessly you lose all of their trust in the system and they will quickly start to (rightly) ignore it or develop some awful operational practices around it that allows them to accomplish their job while giving it as little time as they can.

Umbreon posted:

If anyone here has some spare time to answer:

What's a day in this career field like? What do you do every day, and what are some of the more difficult parts of your job?(and how do you handle them?)

I've held a few of these jobs in the past 5-ish years. I would broadly describe what I do as providing reusable infrastructure abstractions to development teams. I almost always have a ticket queue of some sort with planned tasks in it, generally the tasks are related to allowing a feature development team to begin work on their team's project.

The goal is to unblock feature development teams as quickly as possible without generating a large amount of technical debt and while sticking to an agreed upon infrastructure philosophy as closely as possible. Personal judgement is a huge factor in deciding when to rigidly follow internal philosophy and when to bend the rules a little bit to get something done faster, more cleanly, or when to spend a little extra time on a project because the likelihood of repeating the project in the future again is very high. I spend a lot of time considering and executing on these tasks, and an additionally large amount of time providing peer review on other team member's proposals for their own tasks.

The difficult parts of the job generally come from 2 sources: getting accurate information from project/product management, and deciding what solutions best fit whatever the internal philosophy of the team that I'm on. Usually the difficulty is one or the other of these things, not both, because in my experience a schizophrenic project/product planning team will result in there being almost no time to consider philosophy; the team goal mutates into accomplishing all work as quickly as possible, and automation code becomes a means to that end instead of something that fits into a more holistic "devops" workflow.

# ? May 6, 2019 19:28

I HAVE GOUT: Nov 23, 2017

What storageclass options are available for baremetal kube?

So far I've tried:
- storageos - license fee after you hit 500gb. which is funny to me because avoiding dumb fees is the reason i built the server. currently waiting on them for how much its gonna cost.
- rook-ceph - it just hit v1 the other day, but its still too buggy and missing features, and their slack is just people asking how to get it working and one dude arbitrarily replying to 25% of them.

After dealing with this poo poo, I kinda just want to hostPath everything. Which would be fine (its one node right now), but a lot of helm charts dont give you enough configuration options to do hostPath.

What else is there?

# ? May 7, 2019 01:41

Methanar: Sep 26, 2013; by the sex ghost

Install ceph external to k8s, but on the same hardware :getin:

# ? May 7, 2019 02:27

Docjowles: Apr 9, 2009

12 rats tied together posted:

I've held a few of these jobs in the past 5-ish years. I would broadly describe what I do as providing reusable infrastructure abstractions to development teams. I almost always have a ticket queue of some sort with planned tasks in it, generally the tasks are related to allowing a feature development team to begin work on their team's project.

The goal is to unblock feature development teams as quickly as possible without generating a large amount of technical debt and while sticking to an agreed upon infrastructure philosophy as closely as possible. Personal judgement is a huge factor in deciding when to rigidly follow internal philosophy and when to bend the rules a little bit to get something done faster, more cleanly, or when to spend a little extra time on a project because the likelihood of repeating the project in the future again is very high. I spend a lot of time considering and executing on these tasks, and an additionally large amount of time providing peer review on other team member's proposals for their own tasks.

I like this summary a lot

# ? May 7, 2019 02:39

Nomnom Cookie: Aug 30, 2009

I HAVE GOUT posted:

What storageclass options are available for baremetal kube?

So far I've tried:
- storageos - license fee after you hit 500gb. which is funny to me because avoiding dumb fees is the reason i built the server. currently waiting on them for how much its gonna cost.
- rook-ceph - it just hit v1 the other day, but its still too buggy and missing features, and their slack is just people asking how to get it working and one dude arbitrarily replying to 25% of them.

After dealing with this poo poo, I kinda just want to hostPath everything. Which would be fine (its one node right now), but a lot of helm charts dont give you enough configuration options to do hostPath.

What else is there?

You're trying to run Kubernetes, baremetal, on a single node. Is it an option to reconsider the decisions that have led you to this point?

# ? May 7, 2019 03:24

freeasinbeer: Mar 26, 2015; by Fluffdaddy

Rook needs at least 3 nodes iirc. I run it for hobby stuff and it�s ok. It�s backup story isn�t the best.

# ? May 7, 2019 03:50

Methanar: Sep 26, 2013; by the sex ghost

I looked over how all of the rook stuff worked a while ago because somewhere cited it as a good example of helm chart design.

Its cool but lol if anyone actually trusts it.

# ? May 7, 2019 04:11

Mao Zedong Thot: Oct 16, 2008

I HAVE GOUT posted:

What storageclass options are available for baremetal kube?

So far I've tried:
- storageos - license fee after you hit 500gb. which is funny to me because avoiding dumb fees is the reason i built the server. currently waiting on them for how much its gonna cost.
- rook-ceph - it just hit v1 the other day, but its still too buggy and missing features, and their slack is just people asking how to get it working and one dude arbitrarily replying to 25% of them.

After dealing with this poo poo, I kinda just want to hostPath everything. Which would be fine (its one node right now), but a lot of helm charts dont give you enough configuration options to do hostPath.

What else is there?

have your storage team make a ceph for you and then hook it up

also rook, in theory

really, you're right -- hostPath is crazy underrated, if you build your applications right, using hostPath can be a good thing

# ? May 7, 2019 04:44

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Ceph is an unbelievably robust thing that is not at all difficult to install or administer, to the point that I simply don't understand what niche Rook fills

# ? May 7, 2019 05:14

Umbreon: May 21, 2011

12 rats tied together posted:

A well written and meaty description

Thanks a ton for this write up. If I'm reading you correctly, this can be summed up as you writing and managing tools/projects to make the jobs of your developers easier/more efficient?

# ? May 7, 2019 05:24

Mao Zedong Thot: Oct 16, 2008

Vulture Culture posted:

Ceph is an unbelievably robust thing that is not at all difficult to install or administer, to the point that I simply don't understand what niche Rook fills

Ceph for people that don't want to think too hard about running ceph

# ? May 7, 2019 05:25

Mao Zedong Thot: Oct 16, 2008

Umbreon posted:

Thanks a ton for this write up. If I'm reading you correctly, this can be summed up as you writing and managing tools/projects to make the jobs of your developers easier/more efficient?

literally that's exactly what modern SRE/devops/ops/whatthefuckever is IMHO

I write software that enables people to run other software (that makes money, that idgaf about, as long as they keep payin me, my goal is to make it run good)

# ? May 7, 2019 05:27

Kalenden: Oct 30, 2012

I am coming from a PhD in computer science that was focused on low-end IoT platforms and application orchestration (such as for wireless sensor networks).

In a new job, I'll be responsible or developing a SaaS platform that will help DevOps and is focused on decouple configuration data, infrastructure and application service definitions from tooling and platforms.

As such, it involves technologies such as Docker, Terraform, Puppet, Ansible, Kubernetes, ... as well as Databases (CouchDB), Message Queues (Redis), ... and Python development.

I have less experience with these things and DevOps in general.

Any guides or tutorials or advice you can recommend given this profile?

# ? May 7, 2019 08:12

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Mao Zedong Thot posted:

Ceph for people that don't want to think too hard about running ceph

Storage is not a good thing to not be thinking hard about, and in the cases where it's fine, a distributed storage system you have to manage yourself is probably not the best option

# ? May 7, 2019 13:01

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Kalenden posted:

... Message Queues (Redis) ...

Any guides or tutorials or advice you can recommend given this profile?

Here's a piece of advice, pal: don't use Redis as a message queue

# ? May 7, 2019 13:02

I HAVE GOUT: Nov 23, 2017

Mao Zedong Thot posted:

have your storage team make a ceph for you and then hook it up

also rook, in theory

really, you're right -- hostPath is crazy underrated, if you build your applications right, using hostPath can be a good thing

Vulture Culture posted:

Ceph is an unbelievably robust thing that is not at all difficult to install or administer, to the point that I simply don't understand what niche Rook fills

With just ceph, would the process change like this:
Previous: create the deployment with a pvc. the pv gets generated on its own
With Ceph: create the pv. create the deployment with a pvc that has matching storage requirement.

Is it a bad idea to run ceph on the host? Methanar makes it sound like its a hassle. (which still might be less annoying than using rook).
And will I be able to resize volumes? Would I just update the pv and pvc to have the new storage amount, and then theyll magically align back together on pod restart? (One of rooks failings is they wont have resize until v1.1)

# ? May 7, 2019 13:03

chutwig: May 28, 2001; BURLAP SATCHEL OF CRACKERJACKS

Vulture Culture posted:

Ceph is an unbelievably robust thing that is not at all difficult to install or administer, to the point that I simply don't understand what niche Rook fills

Because sometimes you need a little excitement in your life when the Rook operator decides to vaporize all your monitor nodes.

They�ve been working on expanding Rook to deal with other things as well including databases (?!), so now you can enjoy it handling many things unreliably, not just Ceph.

I HAVE GOUT posted:

What storageclass options are available for baremetal kube?

I have spent a lot of time on this and here are some options:

NFS. Seriously, if you can get a NetApp from somebody, try it.
Ceph. I spent years administering Ceph and it is fairly robust, but last I checked it still doesn�t have QoS as of Mimic and its security model doesn�t integrate well with Kubernetes if your intent is to do anything multitenant in Kube. For most people this is probably not a problem, but the nature of most of the clusters I run is that they�re large ML job processing clusters and by nature they�re going to be shared across many people. To secure Ceph effectively in Kubernetes, you probably need to wrap it in Cinder, and WELCOME TO OPENSTACK
OpenEBS. This is interesting if you want to keep storage and compute converged on the Kubernetes cluster, but we�ve only just started looking at it and I have no operational experience with it.
Portworx. This costs money and is somewhat similar to OpenEBS in concept.
Don�t do it. This is your best option and the one which I most strongly recommend. Shared storage in containerland is a trap.

# ? May 7, 2019 13:04

Mao Zedong Thot: Oct 16, 2008

Vulture Culture posted:

Storage is not a good thing to not be thinking hard about, and in the cases where it's fine, a distributed storage system you have to manage yourself is probably not the best option

Oh I definitely didn't mean it was a good thing. That's the niche though.

# ? May 7, 2019 15:31

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

chutwig posted:

[*]Ceph. I spent years administering Ceph and it is fairly robust, but last I checked it still doesn�t have QoS as of Mimic and its security model doesn�t integrate well with Kubernetes if your intent is to do anything multitenant in Kube. For most people this is probably not a problem, but the nature of most of the clusters I run is that they�re large ML job processing clusters and by nature they�re going to be shared across many people. To secure Ceph effectively in Kubernetes, you probably need to wrap it in Cinder, and WELCOME TO OPENSTACK

On the other hand, if you have a multi-tenant on-prem Kubernetes config that's just being set up for shared storage, there's a serious chance you're already running OpenStack

# ? May 7, 2019 16:54

Nomnom Cookie: Aug 30, 2009

chutwig posted:

Don�t do it. This is your best option and the one which I most strongly recommend. Shared storage in containerland is a trap.

Don't do it should be the default for anytime adding complexity is an option. Adding complexity is guaranteed to screw you over in ways you didn't anticipate at times when you can't afford to be screwed over. When your current architecture is so poorly fit to requirements that you're willing to sign up to get screwed in order to make the pain stop...then it's time to think about doing something.

# ? May 7, 2019 17:05

Hadlock: Nov 9, 2004

Kalenden posted:

I am coming from a PhD in computer science

In a new job, I'll be responsible or developing a SaaS platform that will help DevOps

Either your recruiter is an excellent salesman, or you're getting insane equity... I hope

# ? May 8, 2019 08:24

Umbreon: May 21, 2011

Hadlock posted:

Either your recruiter is an excellent salesman, or you're getting insane equity... I hope

What does equity mean in this context?

# ? May 8, 2019 08:50

sunaurus: Feb 13, 2012; Oh great, another bookah.

Kalenden posted:

I am coming from a PhD in computer science that was focused on low-end IoT platforms and application orchestration (such as for wireless sensor networks).

In a new job, I'll be responsible or developing a SaaS platform that will help DevOps and is focused on decouple configuration data, infrastructure and application service definitions from tooling and platforms.

As such, it involves technologies such as Docker, Terraform, Puppet, Ansible, Kubernetes, ... as well as Databases (CouchDB), Message Queues (Redis), ... and Python development.

I have less experience with these things and DevOps in general.

Any guides or tutorials or advice you can recommend given this profile?

Wow, your post reads exactly like messages I get on linkedin (all you're missing is agile blockchain machine learning technology).

That's a lot of technologies to not have experience with, I hope you'll be joining a cool team that can guide you as you get familiar with the tools you'll be using.

As far as guides and tutorials, my advice is to always prefer official documentation (which is generally very good and readable for all the software you mentioned) and try to stay away from random blog posts called something like "Python Docker Kubernetes with Redis Tutorial". It might seem tempting to just copy a setup you find from google, but blog tutorials are often out of date, don't do a good job of explaining the examples they share and sometimes offer just absolutely stupid solutions to easy problems.

# ? May 8, 2019 09:20

Docjowles: Apr 9, 2009

sunaurus posted:

stupid solutions to easy problems.

New thread title imo. And I take it back; whoever asked what you do all day in this field, it�s this.

# ? May 8, 2019 13:15

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

I glue crap together with YAML and a smidge of awkward Go and Python that is only better than not having anything there at all because God has died and meanwhile I get paid like a highly experienced software engineer.

# ? May 10, 2019 13:28

Judge Schnoopy: Nov 2, 2005; dont even TRY it, pal

necrobobsledder posted:

I glue crap together with YAML and a smidge of awkward Go and Python that is only better than not having anything there at all because God has died and meanwhile I get paid like a highly experienced software engineer.

DevOps!

# ? May 10, 2019 13:38

Comradephate: Feb 28, 2009; College Slice

necrobobsledder posted:

I glue crap together with YAML and a smidge of awkward Go and Python that is only better than not having anything there at all because God has died and meanwhile I get paid like a highly experienced software engineer.

because the highly experienced software engineers have absolutely no idea how to run their own code.

I used to feel second-class and not like a "real" software engineer, but I am writing better and better code all the time, and in the meantime the "real" software engineers still have no idea how to operate the things they create, nor do they understand the systems those things run on.

# ? May 11, 2019 05:11

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

This is, IMO, the real benefit of frameworks like Lambda where you write what you run and you run what you write

# ? May 11, 2019 13:24

spoon daddy: Aug 11, 2004; Who's your daddy?; College Slice

Vulture Culture posted:

Here's a piece of advice, pal: don't use Redis as a message queue

There are some days(not all) where I think Redis is the most misused software in modern stacks.

# ? May 13, 2019 06:13

minato: Jun 7, 2004; cutty cain't hang, say 7-up.; Taco Defender

is there any specific reason not to use Redis? I'm looking into using Celery which seems to be the most popular Python dist-task queue package, and all the official docs and tutorials seem to recommend Redis.

# ? May 13, 2019 06:28

Methanar: Sep 26, 2013; by the sex ghost

minato posted:

is there any specific reason not to use Redis? I'm looking into using Celery which seems to be the most popular Python dist-task queue package, and all the official docs and tutorials seem to recommend Redis.

redis is webscale

# ? May 13, 2019 06:56

fluppet: Feb 10, 2009

Methanar posted:

redis is webscale

I thought that qas mongo

# ? May 13, 2019 07:26

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

minato posted:

is there any specific reason not to use Redis? I'm looking into using Celery which seems to be the most popular Python dist-task queue package, and all the official docs and tutorials seem to recommend Redis.

For simple stacks where delivery guarantees and scale don't matter, Redis is the simplest possible option, and its simple protocol and trivial debuggability make it a great choice. However, Redis queues are at-most-once delivery, and when you find yourself needing at-least-once delivery, migration can be painful. These problems start when you begin dealing with distributed systems at massive scale (thousands of instances), where things are crashing and dying routinely and there's no way to ensure a task in your queue actually gets processed.

# ? May 13, 2019 11:30

Adbot: ADBOT LOVES YOU

# ? Jun 3, 2024 21:45

Gyshall: Feb 24, 2009; Had a couple of drinks.
Saw a couple of things.

Alternatively, elasticache or similar. We use it as a drop in replacement for Redis in our non-ephemeral environments.

# ? May 13, 2019 17:14

The Something Awful Forums > Discussion > Serious Hardware/Software Crap > The Cavern of COBOL > Continuous Integration/build engineering/devops thread

«‹›158 »