Continuous Integration/build engineering/devops thread

Super-NintendoUser: Jan 16, 2004; COWABUNGERDER COMPADRES; Soiled Meat

I've got an interesting question, and I was referred over here by a goon co-worker. We manage an application platform that has a bunch of tomcats all running a variety of different servlets/webapps. One of our challenges is monitoring the performance of the servlets specifically. We use the standard JMX commands to get details on the tomcats, but I'd like to know throughout the day the actual memory and threads used by each servlet (I suspect maybe that I can't get it per servlet, but maybe I can get it per java class, and since I know what class is in each servlet that could be equivalent). We have an ELK stack running that is tracking performance metrics already so if I could just get that data out of the tomcat in some format that'd be enough. If I can output it directly to logstash or beats or something it'd be a plus.

In each tomcat we deploy several webapps each in their own path:

http://tomcat/servlet1
http://tomcat/servlet2
http://tomcat/servlet3

My ideal wish would be some Grafana/kibana dashboards for each tomcat, and then a couple line graphs for each servlet in that tomcat showing the threads, heap usage, and whatever other metrics I can get. If I can just get the heap usage per context/webapp I'd be happy enough. All the servlets are developed in house, so I can even get R&D to add jars to the builds, and since we manage the devops packages, I can add jars to our tomcat bundles as well, so I can basically do whatever I want as long as we can do it in ansible.

I know the JVM doesn't natively expose this type of data, and that you need to use something else. I've come across glowroot. We've installed it in a couple tomcats and it appears to produce the data I want, but I need to figure out how to query it using an API or something and pipe that data into ELK. I'm running into a dead end trying to expose the data. It writes to an h2 database, so I could possibly just parse it every five minutes but that's a real hacky solution.

Does anyone have any suggestions on this or experience with glowroot? If there's a better product, I'd be more than happy to check it out.

# ? Dec 17, 2019 15:29

Vulture Culture: Jul 14, 2003; I was never enjoying it. I only eat it for the nutrients.

Jerk McJerkface posted:

I've got an interesting question, and I was referred over here by a goon co-worker. We manage an application platform that has a bunch of tomcats all running a variety of different servlets/webapps. One of our challenges is monitoring the performance of the servlets specifically. We use the standard JMX commands to get details on the tomcats, but I'd like to know throughout the day the actual memory and threads used by each servlet (I suspect maybe that I can't get it per servlet, but maybe I can get it per java class, and since I know what class is in each servlet that could be equivalent). We have an ELK stack running that is tracking performance metrics already so if I could just get that data out of the tomcat in some format that'd be enough. If I can output it directly to logstash or beats or something it'd be a plus.

In each tomcat we deploy several webapps each in their own path:

http://tomcat/servlet1
http://tomcat/servlet2
http://tomcat/servlet3

My ideal wish would be some Grafana/kibana dashboards for each tomcat, and then a couple line graphs for each servlet in that tomcat showing the threads, heap usage, and whatever other metrics I can get. If I can just get the heap usage per context/webapp I'd be happy enough. All the servlets are developed in house, so I can even get R&D to add jars to the builds, and since we manage the devops packages, I can add jars to our tomcat bundles as well, so I can basically do whatever I want as long as we can do it in ansible.

I know the JVM doesn't natively expose this type of data, and that you need to use something else. I've come across glowroot. We've installed it in a couple tomcats and it appears to produce the data I want, but I need to figure out how to query it using an API or something and pipe that data into ELK. I'm running into a dead end trying to expose the data. It writes to an h2 database, so I could possibly just parse it every five minutes but that's a real hacky solution.

Does anyone have any suggestions on this or experience with glowroot? If there's a better product, I'd be more than happy to check it out.

Logstash can do JMX natively:

https://www.baeldung.com/tomcat-jmx-elastic-stack

# ? Dec 17, 2019 18:42

Super-NintendoUser: Jan 16, 2004; COWABUNGERDER COMPADRES; Soiled Meat

Vulture Culture posted:

Logstash can do JMX natively:

https://www.baeldung.com/tomcat-jmx-elastic-stack

That's really interesting, and I'll check it out. However, I'm fairly certain that JMX doesn't expose the data on a level as granular as I'd like. All my research indicates it requires an additional jar in the tomcat container. Glowroot appears to provide what I want.

I've set up the Elastic APM Server (https://www.elastic.co/guide/en/apm/get-started/current/overview.html) but it doesn't provide the data that Glowroot does.

# ? Dec 17, 2019 18:59

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

If your application or its frameworks don't expose metrics you'll have to derive them by scraping logs, checking secondary footprint, etc. or you'll have to write them. There's no free lunch for metric exports. It's what I've been writing for far too long because devs had no time / planning allocated for them
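As a rough illustration of the "derive them by scraping logs" route, here is a minimal sketch that rolls Tomcat access log lines up into per-webapp request counts, 5xx counts, and total processing time, then emits one JSON document per webapp that Filebeat or Logstash could pick up. The log pattern (`%h %l %u %t "%r" %s %b %D`, with `%D` read as milliseconds), the sample lines, and the field names are all assumptions for the example, and it only covers request metrics, not heap per webapp.

```python
import json
import re
from collections import defaultdict

# Assumed AccessLogValve pattern: %h %l %u %t "%r" %s %b %D
LINE_RE = re.compile(
    r'^\S+ \S+ \S+ \[[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ (?P<millis>\d+)$'
)


def context_of(path):
    """Return the first path segment, e.g. '/servlet1/api/x' -> 'servlet1'."""
    segment = path.lstrip("/").split("/", 1)[0]
    segment = segment.split("?", 1)[0]
    return segment or "ROOT"


def summarize(lines):
    """Aggregate request count, 5xx count, and total time per webapp context."""
    stats = defaultdict(lambda: {"requests": 0, "errors": 0, "total_ms": 0})
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed pattern
        entry = stats[context_of(match.group("path"))]
        entry["requests"] += 1
        entry["total_ms"] += int(match.group("millis"))
        if match.group("status").startswith("5"):
            entry["errors"] += 1
    return dict(stats)


def to_json_lines(stats):
    """Render one JSON document per webapp, ready to be shipped as a log line."""
    return [
        json.dumps({"webapp": name, **values}, sort_keys=True)
        for name, values in sorted(stats.items())
    ]


# Made-up sample lines in the assumed format.
SAMPLE = [
    '10.0.0.5 - - [17/Dec/2019:15:29:01 +0000] "GET /servlet1/orders HTTP/1.1" 200 512 40',
    '10.0.0.5 - - [17/Dec/2019:15:29:02 +0000] "GET /servlet1/orders?id=7 HTTP/1.1" 500 0 160',
    '10.0.0.6 - - [17/Dec/2019:15:29:03 +0000] "POST /servlet2/login HTTP/1.1" 200 87 25',
]

if __name__ == "__main__":
    for document in to_json_lines(summarize(SAMPLE)):
        print(document)
```

Keying on the first path segment only works when each webapp is deployed under its own context path, as in the servlet1/servlet2/servlet3 layout described earlier in the thread.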

# ? Dec 17, 2019 20:23

Zorak of Michigan: Jun 10, 2006

Re container chat, my org is still in its infancy in containerizing workloads. I've been advocating Kubernetes because, when I tinkered with Swarm, I couldn't imagine it scaling up to the number of different teams I would hope would eventually be using our container environment. Is there something easier to live with for an on-prem deployment than Kubernetes that can still support multiple siloed teams deploying to it?

# ? Dec 18, 2019 00:54

Bhodi: Dec 9, 2007; Oh, it's just a cat.; Pillbug

Zorak of Michigan posted:

Re container chat, my org is still in its infancy in containerizing workloads. I've been advocating Kubernetes because, when I tinkered with Swarm, I couldn't imagine it scaling up to the number of different teams I would hope would eventually be using our container environment. Is there something easier to live with for an on-prem deployment than Kubernetes that can still support multiple siloed teams deploying to it?

Other than docker-compose? no. If you use docker swarm, you're going to regret it.

IMO docker-compose is good enough for a majority of stuff that doesn't aggressively autoscale. The last few pages talk a bit about this.

Bhodi fucked around with this message at 02:28 on Dec 18, 2019

# ? Dec 18, 2019 02:25

Methanar: Sep 26, 2013; by the sex ghost

Just use docker compose.

Don't do kubernetes on-prem unless you really know what you're doing. You will need to reinvent several wheels yourself before it's usable.

Like:
- Ingress
- Deploying the cluster in the first place.
- Monitoring stack (learn prometheus [ what do you mean prometheus doesn't scale ] )
- RBAC, are you going to just give everybody the root certificate for kubectl? You can.
- If you want a pleasant deploying experience you're going to have to build one.
- Developer education
- Are developers going to know how to self-serve when it comes to debugging. ( What do you mean I can't rsync cowboy my code into the container. Your pet project is interfering with my ability to work! )
- Keeping up with the whole god drat ecosystem and knowing what's changing, versions go EOL after like, 18 months.
- Do your own persistent storage stack
- The overlay network fiasco

Methanar fucked around with this message at 02:48 on Dec 18, 2019

# ? Dec 18, 2019 02:38

Mao Zedong Thot: Oct 16, 2008

Kubernetes is easy and good. Helm is a loving tire fire.

# ? Dec 18, 2019 02:40

Methanar: Sep 26, 2013; by the sex ghost

Kubernetes is easy and good if you use somebody's hosted stack and only ever acknowledge deployment definitions and ignore literally every other aspect of being responsible for operating the platform

# ? Dec 18, 2019 02:43

FISHMANPET: Mar 3, 2007; Sweet 'N Sour Can't Melt Steel Beams

Zorak of Michigan posted:

Re container chat, my org is still in its infancy in containerizing workloads. I've been advocating Kubernetes because, when I tinkered with Swarm, I couldn't imagine it scaling up to the number of different teams I would hope would eventually be using our container environment. Is there something easier to live with for an on-prem deployment than Kubernetes that can still support multiple siloed teams deploying to it?

Piggybacking on this and answers, what about products that offer k8s on prem like openshift, or some product VMware just bought whose name escapes me, or, I don't know, other vendors?

We're a big public University so there could legitimately be a lot of research applications that could use auto scaling and other features. But if we have it we'd also get a lot of simpler "line of business" apps that may not need those capabilities, but if they're there they'll get used, and then for no reason we'll be depending on them and then we'll be stuck with tooling more complicated than we need.

# ? Dec 18, 2019 03:03

Spring Heeled Jack: Feb 25, 2007; If you can read this you can read

As a vmware shop I'm super interested in whatever their plans are for the integrated k8s product, whenever it decides to actually surface.

But for now we use AKS and it's been pretty good due to the above (ignoring everything else about actually running k8s).

However we're getting to the point where we would like some clusters on prem because that's where our big boy DBs are, and our devs need a playground for modernizing our old rear end LOB apps. I've started looking at Rancher's offerings and they seem pretty turn key once you get an infrastructure deployment pipeline going.

# ? Dec 18, 2019 03:18

New Yorp New Yorp: Jul 18, 2003; Only in Kenya.; Pillbug

https://www.jetbrains.com/space/

What on earth possessed them to do this? This is like the Microsoft Windows Phone of devops -- they're way late to the party.

Also, I can't believe how blatantly they're copying Microsoft's Azure DevOps visual design.

# ? Dec 18, 2019 18:49

Potato Salad: Oct 23, 2014; nobody cares

FISHMANPET posted:

Piggybacking on this and answers, what about products that offer k8s on prem like openshift, or some product VMware just bought whose name escapes me, or, I don't know, other vendors?

We're a big public University so there could legitimately be a lot of research applications that could use auto scaling and other features. But if we have it we'd also get a lot of simpler "line of business" apps that may not need those capabilities, but if they're there they'll get used, and then for no reason we'll be depending on them and then we'll be stuck with tooling more complicated than we need.

One of my clients is a large university, and in the last two quarters I did a pretty okay job of pointing out to them that they don't really want containerization, they just want better self-service VM delivery, maintenance, and billing.

# ? Dec 18, 2019 19:21

Potato Salad: Oct 23, 2014; nobody cares

Like, don't containerize just because containers. What does your workload actually need

# ? Dec 18, 2019 19:21

Pie Colony: Dec 8, 2006; I AM SUCH A FUCKUP THAT I CAN'T EVEN POST IN AN E/N THREAD I STARTED

Alternatively, containerize just because containers. It'll make things easier later and looks good on your resume.

# ? Dec 18, 2019 19:35

taqueso: Mar 8, 2004

As a hobbyist that wants to be able to replicate installs for a few little things, is Ansible the tool I should look at? I'd like to be able to install a linux OS with latest updates, install some other software, copy in a few config files, run a couple commands, that kind of thing.

Potato Salad posted:

Like, don't containerize just because containers. What does your workload actually need

We've got a 3 gallon container and a 5 gallon container, but we really want 4 gallons!

# ? Dec 18, 2019 19:41

Matt Zerella: Oct 7, 2002; Norris'es are back baby. It's good again. Awoouu (fox Howl)

taqueso posted:

As a hobbyist that wants to be able to replicate installs for a few little things, is Ansible the tool I should look at? I'd like to be able to install a linux OS with latest updates, install some other software, copy in a few config files, run a couple commands, that kind of thing.

It won't install the OS for you but yes it'll do everything else. And it's pretty easy to pick up too.

# ? Dec 18, 2019 19:58

Volguus: Mar 3, 2009

New Yorp New Yorp posted:

https://www.jetbrains.com/space/

What on earth possessed them to do this? This is like the Microsoft Windows Phone of devops -- they're way late to the party.

Also, I can't believe how blatantly they're copying Microsoft's Azure DevOps visual design.

Just as there are people out there that containerize just because containers, there are people who JetBrains because IDEA. Given the rabid following they have in the IDE space, it was just a matter of time really.

# ? Dec 18, 2019 20:46

StabbinHobo: Oct 18, 2002; by Jeffrey of YOSPOS

Jerk McJerkface posted:

I've got an interesting question, and I was referred over here by a goon co-worker. We manage an application platform that has a bunch of tomcats all running a variety of different servlets/webapps. One of our challenges is monitoring the performance of the servlets specifically. We use the standard JMX commands to get details on the tomcats, but I'd like to know throughout the day the actual memory and threads used by each servlet (I suspect maybe that I can't get it per servlet, but maybe I can get it per java class, and since I know what class is in each servlet that could be equivalent). We have an ELK stack running that is tracking performance metrics already so if I could just get that data out of the tomcat in some format that'd be enough. If I can output it directly to logstash or beats or something it'd be a plus.

In each tomcat we deploy several webapps each in their own path:

http://tomcat/servlet1
http://tomcat/servlet2
http://tomcat/servlet3

My ideal wish would be some Grafana/kibana dashboards for each tomcat, and then a couple line graphs for each servlet in that tomcat showing the threads, heap usage, and whatever other metrics I can get. If I can just get the heap usage per context/webapp I'd be happy enough. All the servlets are developed in house, so I can even get R&D to add jars to the builds, and since we manage the devops packages, I can add jars to our tomcat bundles as well, so I can basically do whatever I want as long as we can do it in ansible.

I know the JVM doesn't natively expose this type of data, and that you need to use something else. I've come across glowroot. We've installed it in a couple tomcats and it appears to produce the data I want, but I need to figure out how to query it using an API or something and pipe that data into ELK. I'm running into a dead end trying to expose the data. It writes to an h2 database, so I could possibly just parse it every five minutes but that's a real hacky solution.

Does anyone have any suggestions on this or experience with glowroot? If there's a better product, I'd be more than happy to check it out.

unfortunately i have nothing to offer solution wise, just a fearful warning

most metrics collection on jvms is waaaay too coarse resolution for the numbers on the heap to mean anything, specifically in terms of young-gen and the latency impact of gc pauses. I used to have to connect in with the VisualGC plugin on VisualVM and set the refresh rate to 100ms (so 10 datapoints per second). that's when you can actually see what's going on.
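To make the resolution point concrete, here is a toy model with entirely synthetic numbers: a made-up young-gen sawtooth that grows 10 MB per 100 ms and is collected every 5 seconds, sampled once at 100 ms and once at 15 s. None of this is measured from a real JVM; it only sketches why a coarse scrape interval can make the heap look flat.

```python
def heap_mb(tick):
    """Synthetic young-gen sawtooth: +10 MB per 100 ms tick, collected every 5 s."""
    return 100 + 10 * (tick % 50)


def sample(total_ticks, every):
    """Read the synthetic heap once every `every` ticks (1 tick = 100 ms)."""
    return [heap_mb(tick) for tick in range(0, total_ticks, every)]


def collections_seen(series):
    """Count drops between consecutive samples, i.e. collections visible in the data."""
    return sum(1 for previous, current in zip(series, series[1:]) if current < previous)


fine = sample(600, 1)      # 60 seconds at 100 ms resolution
coarse = sample(600, 150)  # the same 60 seconds at 15 s resolution

if __name__ == "__main__":
    print("100 ms:", collections_seen(fine), "collections visible, peak", max(fine), "MB")
    print("15 s:  ", collections_seen(coarse), "collections visible, peak", max(coarse), "MB")
```

In this contrived case the 15 s interval is an exact multiple of the collection period, so every coarse sample lands at the same point in the cycle. Real workloads are messier, but the coarse series still misses most of the peaks and drops.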

# ? Dec 19, 2019 04:26

Doc Hawkins: Jun 15, 2010; Dashing? But I'm not even moving!

i discovered skaffold a few days ago and goddamn is it good

assuming you've already fallen to the kubernetes side of the force

# ? Dec 19, 2019 05:01

Qtotonibudinibudet: Nov 7, 2011; Omich poluyobok, skazhi ty narkoman? ya prosto tozhe gde to tam zhivu, mogli by vmeste uyobyvat' narkotiki

i need to check back to see if skaffold has post-push hooks yet. our poo poo is largely interpreted and easily hot-reloaded, but skaffold didn't have a way to send a SIGHUP to make that reload happen last time i checked. rebuilding the container image and respawning is a fuckton more overhead so i've still been testing in a VM instead

# ? Dec 19, 2019 06:52

Nomnom Cookie: Aug 30, 2009

taqueso posted:

As a hobbyist that wants to be able to replicate installs for a few little things, is Ansible the tool I should look at? I'd like to be able to install a linux OS with latest updates, install some other software, copy in a few config files, run a couple commands, that kind of thing.

We've got a 3 gallon container and a 5 gallon container, but we really want 4 gallons!

Uh if you want what you say you want, then for centos you want kickstart. If you don't want centos then you are wrong and should choose a different distro

# ? Dec 19, 2019 08:14

Doc Hawkins: Jun 15, 2010; Dashing? But I'm not even moving!

CMYK BLYAT! posted:

i need to check back to see if skaffold has post-push hooks yet. our poo poo is largely interpreted and easily hot-reloaded, but skaffold didn't have a way to send a SIGHUP to make that reload happen last time i checked. rebuilding the container image and respawning is a fuckton more overhead so i've still been testing in a VM instead

we use node (yes, i know), so i give dev containers a "nodemon" cmd that watches the source files which skaffold syncs in and restarts the process

the repo has an example of hot reloading for, i think, next.js

# ? Dec 19, 2019 08:25

taqueso: Mar 8, 2004

Nomnom Cookie posted:

Uh if you want what you say you want, then for centos you want kickstart. If you don't want centos then you are wrong and should choose a different distro

I was looking at kickstart/fedora. It doesn't get much fanfare, unlike ansible, kubernetes, etc. TBH I don't feel like I quite understand what is capable of what / what is better at what yet, they all seem to be capable of a lot of things that seem to overlap. I was guessing that newer tools had superseded or wrapped around stuff like kickstart so they could be somewhat platform agnostic. It's likely I don't quite know what I want to do or I'm saying it poorly.

I am looking for something approximating a smart pseudo-reimaging system, with some key management and configuration control. I'd like to be confident that I could recreate a machine if needed, or quickly make 3 machines that run the same service. And be able to modify a stored config so it produces a similar machine with an extra service. And have them all be given unique credentials unless I actually am replacing a machine.

# ? Dec 19, 2019 19:28

Matt Zerella: Oct 7, 2002; Norris'es are back baby. It's good again. Awoouu (fox Howl)

taqueso posted:

I was looking at kickstart/fedora. It doesn't get much fanfare, unlike ansible, kubernetes, etc. TBH I don't feel like I quite understand what is capable of what / what is better at what yet, they all seem to be capable of a lot of things that seem to overlap. I was guessing that newer tools had superseded or wrapped around stuff like kickstart so they could be somewhat platform agnostic. It's likely I don't quite know what I want to do or I'm saying it poorly.

I am looking for something approximating a smart pseudo-reimaging system, with some key management and configuration control. I'd like to be confident that I could recreate a machine if needed, or quickly make 3 machines that run the same service. And be able to modify a stored config so it produces a similar machine with an extra service. And have them all be given unique credentials unless I actually am replacing a machine.

Maybe packer + ansible?

# ? Dec 19, 2019 19:32

Docjowles: Apr 9, 2009

Kickstart is automated OS install plus whatever custom scripts you want to run afterward. It's built into RHEL/CentOS/etc. It's basically an answer file for the installer to fill in all the prompts so it can run unattended. If all you want is to reinstall a machine from scratch and customize a few files it should be fine. Tools like Ansible or Chef or whatever are for later in the management lifecycle.

Packer can be a nice add on for automating the flow. We use it to build VM templates by laying down a minimal OS install via kickstart and then bootstrapping Chef onto it which does the rest of the configuration.

# ? Dec 19, 2019 22:09

taqueso: Mar 8, 2004

The packer webpage makes it sound very promising. :coal:

# ? Dec 19, 2019 22:19

Methanar: Sep 26, 2013; by the sex ghost

So there is just no good solution at all for non americans needing to deal with govcloud right.

This absolutely sucks, this schism fundamentally introduces drift.

# ? Jan 6, 2020 23:34

12 rats tied together: Sep 7, 2006

12 rats tied together posted:

One thing I want to do that I'm having trouble finding mention of in marketing materials is something akin to complex event processing. Basically I don't really care about average CPU utilization across a cluster of compute nodes, but while operating this cluster of compute nodes, and the application running on them, we've noticed a number of events that occur throughout the application lifecycle.

It would be really sick if I could ring my phone when we see one type of event happen and then we don't see any followup events of a different type inside the next 24 hours, for example.

This is from a long time ago but I found that there is actually a tool for this that isn't just "run flink" and it is here: http://riemann.io/ It seems kind of bad, but also, it seems like it's the only attempt at approaching this problem in an operational way (ex: not running kafka and a stream processing engine).
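The "one type of event happened and no followup event arrived within 24 hours" check can be sketched without any stream processing engine, assuming the events are already available as a list of (timestamp, type) pairs. The event names `job_started` and `job_finished` and the `unanswered` helper are hypothetical, and this is a batch check over stored events rather than anything Riemann-specific.

```python
from datetime import datetime, timedelta


def unanswered(events, trigger, followup, window, now):
    """Return timestamps of `trigger` events with no `followup` inside `window`.

    events: list of (timestamp, event_type) tuples.
    A trigger whose window has not expired yet at `now` is not reported.
    """
    followups = sorted(ts for ts, kind in events if kind == followup)
    missing = []
    for ts, kind in sorted(events):
        if kind != trigger:
            continue
        deadline = ts + window
        if deadline > now:
            continue  # still inside the grace period, too early to alert
        if not any(ts <= seen <= deadline for seen in followups):
            missing.append(ts)
    return missing


if __name__ == "__main__":
    start = datetime(2020, 1, 6, 0, 0)
    events = [
        (start, "job_started"),
        (start + timedelta(hours=2), "job_finished"),
        (start + timedelta(days=1), "job_started"),   # never followed up
        (start + timedelta(days=3), "job_started"),   # window still open
    ]
    now = start + timedelta(days=3, hours=1)
    for ts in unanswered(events, "job_started", "job_finished", timedelta(hours=24), now):
        print("no followup within 24h for event at", ts.isoformat())
```

Run on a schedule, the returned list is what would drive the phone call. The part that makes this hard operationally is keeping the event history queryable, which this sketch simply takes as given.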

edit to avoid double post:

taqueso posted:

I was looking at kickstart/fedora. It doesn't get much fanfare, unlike ansible, kubernetes, etc. TBH I don't feel like I quite understand what is capable of what / what is better at what yet, they all seem to be capable of a lot of things that seem to overlap. I was guessing that newer tools had superseded or wrapped around stuff like kickstart so they could be somewhat platform agnostic. It's likely I don't quite know what I want to do or I'm saying it poorly.

I am looking for something approximating a smart pseudo-reimaging system, with some key management and configuration control. I'd like to be confident that I could recreate a machine if needed, or quickly make 3 machines that run the same service. And be able to modify a stored config so it produces a similar machine with an extra service. And have them all be given unique credentials unless I actually am replacing a machine.

Ansible is usually "and kickstart" and not "instead of kickstart". Ansible-playbook is a command that you run from your laptop, and kickstart scripts execute on machines when they boot. You can do everything, in each, from the other, and you should pick which one you want to use based on how you want it to be executed.

My personal favorite project in this space is ubuntu MaaS which is not at all perfect, but it lets you write cloud-init scripts for your physical servers, which is dope especially if you spend a lot of time in AWS and you don't want to pick up the entire ansible and awx tech stack from scratch.

12 rats tied together fucked around with this message at 21:51 on Jan 8, 2020

# ? Jan 8, 2020 00:20

Methanar: Sep 26, 2013; by the sex ghost

https://kubernetes.io/blog/2020/01/22/kubeinvaders-gamified-chaos-engineering-tool-for-kubernetes/

quote:

KubeInvaders - Gamified Chaos Engineering Tool for Kubernetes

It is like space invaders but the aliens are PODs.

# ? Jan 22, 2020 19:50

NihilCredo: Jun 6, 2011; iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Methanar posted:

https://kubernetes.io/blog/2020/01/22/kubeinvaders-gamified-chaos-engineering-tool-for-kubernetes/

This dude's company lists the following open positions:

Java Senior Craftsman
Big Data Enthusiast
Javascript Funambulist
DevOps Ninja
Atlassian Magician
Alfresco Specialist
Machine/Deep Learning Clairvoyant

I want to make fun of them but on second thought "funambulist" is the most fitting noun for JS development I have ever heard.

And of course "clairvoyant" is a brutally honest description of how ML makes money.

# ? Jan 24, 2020 21:18

Pile Of Garbage: May 28, 2007

NihilCredo posted:

This dude's company lists the following open positions:

Java Senior Craftsman

Big Data Enthusiast

Javascript Funambulist

DevOps Ninja

Atlassian Magician

Alfresco Specialist

Machine/Deep Learning Clairvoyant

Has this "haha we're so weird monkey cheese" poo poo ever worked in recruitment? It's almost wholly unique to tech companies as I've never once seen it used in recruitment for other industries. Maybe I'm just jaded and cynical but whenever I see poo poo like that it's an immediate red-flag. Also I'm pretty sure it's just been a ploy to trick people into accepting under-paying positions on the bullshit premise of "It's not work, it's fun!"

# ? Jan 25, 2020 11:36

Votlook: Aug 20, 2005

Pile Of Garbage posted:

Has this "haha we're so weird monkey cheese" poo poo ever worked in recruitment? It's almost wholly unique to tech companies as I've never once seen it used in recruitment for other industries. Maybe I'm just jaded and cynical but whenever I see poo poo like that it's an immediate red-flag. Also I'm pretty sure it's just been a ploy to trick people into accepting under-paying positions on the bullshit premise of "It's not work, it's fun!"

Indeed, whenever I see this childish poo poo I just assume the company is run by immature idiots.
These type of positions also often advertise with stuff like 'YOU GET TO WORK ON A BRAND NEW LAPTOP!'.
Oh great, I don't have to bring my own device to work for you?

# ? Jan 26, 2020 14:02

Boz0r: Sep 7, 2006; The Rocketship in action.

I get an error when I use a Visual Studio Test task in my build pipeline in ADO. I have a unit test project using xUnit and targeting .NET Core 3.0.

code:

[xUnit.net 00:00:00.00] xUnit.net VSTest Adapter v2.4.0 (64-bit .NET Core 3.0.2)
[xUnit.net 00:00:00.90]   Discovering: Unittest.xUnit
[xUnit.net 00:00:00.95]   Discovered:  Unittest.xUnit
[xUnit.net 00:00:00.95]   Starting:    Unittest.xUnit
[xUnit.net 00:00:01.14]   Finished:    Unittest.xUnit
  √ Unittest.Test1 [114ms]
  √ Unittest.Test2 [< 1ms]
  √ Unittest.Test3 [< 1ms]
  √ Unittest.Test3 [< 1ms]
  √ Unittest.Test4 [< 1ms]
  √ Unittest.Test5 [< 1ms]
##[error]Unable to find d:\a\1\s\Project\Test\Unittest.xUnit\obj\Release\netcoreapp3.0\Unittest.xUnit.deps.json. Make sure test project has a nuget reference of package "Microsoft.NET.Test.Sdk".

The project already has a nuget reference to that package. I saw some posts where people got it working by changing the target framework from .NET Framework to .NET Core, but my project is already .NET Core.

# ? Jan 27, 2020 08:59

Doc Hawkins: Jun 15, 2010; Dashing? But I'm not even moving!

Doc Hawkins posted:

i discovered skaffold a few days ago and goddamn is it good

assuming you've already fallen to the kubernetes side of the force

I'm going to re-iterate this for 2020: if you're deploying things to kubernetes, give skaffold a close look. The local development continuous build-and-deploy loop has utterly spoiled me.

# ? Jan 27, 2020 18:20

Methanar: Sep 26, 2013; by the sex ghost

I really can't say enough bad things about Prometheus, the Prometheus operator, and Helm.

This trash is just usable enough that you can get yourself into a sunk cost situation.

# ? Feb 4, 2020 01:36

necrobobsledder: Mar 21, 2005; Lay down your soul to the gods rock 'n roll; Nap Ghost

Haven't had any problems with InfluxDB honestly. Using ELK for metrics and logging isn't that bad if you just keep throwing more money honestly and don't mind not having monitoring with much better resolution than 15s in practice.

# ? Feb 4, 2020 12:20

12 rats tied together: Sep 7, 2006

Depending on your requirements you might end up running prometheus and ELK together anyway since prometheus is very explicitly not an event logging system, and there are warnings all over the documentation and the creator's consulting firm website to this effect. You can coerce any event log into a metric but you can't really go the other way around -- at least not without instantly OOMing your prometheus servers and entering entirely self inflicted dist sys failure spiral hell.
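A small sketch of why the coercion only goes one way, using made-up events and field names (`service`, `level`, `request_id`, `message` are all assumptions): collapsing events into counts keyed by a few fixed labels is easy, but the per-event detail is gone afterwards, and trying to keep it by promoting a unique field to a label produces one series per event.

```python
from collections import Counter

# Hypothetical log events; the field names are illustrative only.
EVENTS = [
    {"service": "api", "level": "error", "request_id": "r-1", "message": "timeout"},
    {"service": "api", "level": "error", "request_id": "r-2", "message": "bad gateway"},
    {"service": "api", "level": "info", "request_id": "r-3", "message": "ok"},
]


def to_counters(events):
    """Collapse events into counts keyed by a small, fixed label set."""
    return Counter((event["service"], event["level"]) for event in events)


def series_count(events, labels):
    """Number of distinct label combinations, i.e. time series the events would create."""
    return len({tuple(event[name] for name in labels) for event in events})


if __name__ == "__main__":
    print(to_counters(EVENTS))
    print("series with bounded labels:", series_count(EVENTS, ["service", "level"]))
    print("series with request_id:", series_count(EVENTS, ["service", "level", "request_id"]))
```

With bounded labels the series count stays small no matter how many events arrive. With `request_id` as a label it grows with every event, which is the memory blowup the post warns about.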

It's an ok tool if you have a ton of things to monitor and one of your main struggles is simply getting everything monitored. Once you "have everything" though, or if you've never struggled in that dimension, it's really hard to stay excited about it.

Helm is definitely garbage though, no argument from me there. Kustomize is bad too but at least it is trying to follow an established RFC (6902 and friends), at least it is built into kubectl already, and at least you don't need to commit 30+ instances of calling toYaml to deploy an application.
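For anyone who hasn't looked at RFC 6902, the idea is a list of small operations applied to a JSON document by path. Below is a minimal sketch of the add/replace/remove subset in plain Python; it is not Kustomize's implementation, it skips `move`, `copy`, `test` and whole-document paths, and the Deployment-shaped dict and image names are invented for the example.

```python
import copy


def pointer_tokens(pointer):
    """Split an RFC 6901 JSON Pointer into unescaped reference tokens."""
    if not pointer.startswith("/"):
        raise ValueError(f"unsupported JSON Pointer: {pointer!r}")
    # Per RFC 6901, "~1" is decoded before "~0".
    return [t.replace("~1", "/").replace("~0", "~") for t in pointer[1:].split("/")]


def apply_patch(document, operations):
    """Apply add/replace/remove operations and return a patched copy."""
    patched = copy.deepcopy(document)
    for operation in operations:
        kind = operation["op"]
        if kind not in ("add", "replace", "remove"):
            raise ValueError(f"unsupported op: {kind}")
        *parents, last = pointer_tokens(operation["path"])
        target = patched
        for token in parents:
            target = target[int(token)] if isinstance(target, list) else target[token]
        if isinstance(target, list):
            # "-" means "append" and is only valid for add.
            index = len(target) if (kind == "add" and last == "-") else int(last)
            if kind == "add":
                target.insert(index, operation["value"])
            elif kind == "replace":
                target[index] = operation["value"]
            else:
                del target[index]
        elif kind == "remove":
            del target[last]
        else:
            if kind == "replace" and last not in target:
                raise KeyError(last)  # replace requires an existing member
            target[last] = operation["value"]
    return patched


# Invented, Deployment-shaped document and patch for illustration.
DEPLOYMENT = {
    "spec": {
        "replicas": 1,
        "template": {"spec": {"containers": [{"name": "app", "image": "example/app:1.0"}]}},
    }
}
PATCH = [
    {"op": "replace", "path": "/spec/replicas", "value": 3},
    {"op": "replace", "path": "/spec/template/spec/containers/0/image", "value": "example/app:1.1"},
    {"op": "add", "path": "/spec/template/spec/containers/0/env", "value": [{"name": "MODE", "value": "prod"}]},
]

if __name__ == "__main__":
    print(apply_patch(DEPLOYMENT, PATCH))
```

The appeal over templating is that the base document stays valid on its own and each environment only carries a short list of operations against it.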

12 rats tied together fucked around with this message at 16:45 on Feb 4, 2020

# ? Feb 4, 2020 15:43

xzzy: Mar 5, 2009

I've had nothing but good experiences with Prometheus, but like 12 rats suggested I only put numbers in to it. I got 2300 servers each exporting 1700 metrics and it's been bulletproof.

My only real critique is long term metrics storage, people around here love having graphs since the dawn of time but Prometheus is very explicit that it's not intended to fill that role so I can't be too fussy about it. At least they provide methods to export to databases that do perform that job.

# ? Feb 4, 2020 16:33

12 rats tied together: Sep 7, 2006

A key constraint is that you can put hell of numbers in it, but if you're putting a timestamp in there, the timestamp is only allowed to be "right now". You can't, like, run a weekly batch process and backfill data into last week's time period and then do some time series graphing.

It's a great tool for putting the kind of numbers in that occur consistently and in defined, knowable schemas (prometheus is not very good at handling numbers that only sometimes exist or sometimes exist in different formats), and keeping track of that kind of number is a huge part of a successful monitoring stack. It can't be the only tool in your monitoring stack though, unless you're willing to accept the inability to do certain kinds of stuff like backfilling data or accurately charting things that you only know happen after they happen.

Which is fine. You can't reasonably apply DRY/single responsibility principle to infrastructure stacks. It's okay to use locally optimal tools as long as you do some kind of cost benefit analysis on them and your team agrees that they're worth the cost.

# ? Feb 4, 2020 21:06
