|
'durrr kafka works fine on my laptop in docker'
|
# ? Mar 9, 2019 16:57 |
|
|
# ? May 28, 2024 16:13 |
Arcsech posted:gotcha what the hell is a minimum_master_node?
|
|
# ? Mar 9, 2019 17:00 |
pram posted:lol no. it isnt. youve never used it for anything serious stfu. for example some of these were fixed and substantially improved with future releases
|
|
# ? Mar 9, 2019 17:03 |
lancemantis posted:tbh a lot of software people consider magical scaling wizardry is a nightmare and I’m convinced the people bringing it in flee before the consequences hit or have never used it beyond toy projects what are the alternatives to elastic scaling?
|
|
# ? Mar 9, 2019 17:04 |
|
jeffery posted:what the hell is a minimum_master_node? you have to tell elasticsearch how many master-eligible nodes it needs to have a quorum to elect a master, typically 50%+1 of master-eligible nodes. that setting is called minimum_master_nodes, and if you set it too low your cluster can get split brain. if you set it too high, then your cluster won’t be able to tolerate as many node failures as it should. as of Soon it will figure this out for itself instead of you having to tell it (e: to be clear, you still have to tell it what nodes are in the cluster at startup, but it will keep itself up to date after that when you add/decommission nodes and you don’t have to remember to update the quorum in addition to the initial nodes list). this is good because that setting is the biggest pain in the rear end and having an incorrect minimum_master_nodes is a great way to make your cluster take a dump big time Arcsech fucked around with this message at 05:13 on Mar 11, 2019 |
# ? Mar 11, 2019 05:02 |
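for anyone who wants the "50%+1" rule spelled out, here is a rough sketch of the arithmetic only. the function names are made up for illustration and none of this talks to elasticsearch; it just shows why too low a value allows split brain and too high a value costs you failure tolerance.

```python
def minimum_master_nodes(master_eligible: int) -> int:
    """Strict majority of master-eligible nodes (the "50%+1" rule)."""
    if master_eligible < 1:
        raise ValueError("need at least one master-eligible node")
    return master_eligible // 2 + 1


def tolerated_failures(master_eligible: int, configured: int) -> int:
    """Master-eligible nodes that can be lost while a quorum is still reachable."""
    return max(master_eligible - configured, 0)


def can_split_brain(master_eligible: int, configured: int) -> bool:
    """True if two disjoint groups of nodes could each satisfy the quorum."""
    return 2 * configured <= master_eligible


if __name__ == "__main__":
    for n in (3, 4, 5):
        q = minimum_master_nodes(n)
        print(n, "nodes -> quorum", q, "tolerates", tolerated_failures(n, q))
```

with 3 master-eligible nodes the majority is 2, so one node can die. setting it to 1 lets each side of a partition elect its own master, and setting it to 3 means a single failure stops elections.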
|
I still don’t know what any of this bull poo poo is
|
# ? Mar 12, 2019 17:51 |
|
the prom / grafana guys are making a log thing now https://grafana.com/loki no full text search though, also it only works with k8s atm
|
# ? Mar 13, 2019 08:42 |
|
lol @ elastic https://aws.amazon.com/blogs/opensource/keeping-open-source-open-open-distro-for-elasticsearch/
|
# ? Mar 13, 2019 13:37 |
|
my stepdads beer posted:the prom / grafana guys are making a log thing now lol y tho Blinkz0rz posted:lol @ elastic lol y tho
|
# ? Mar 13, 2019 14:01 |
|
that aws piece is some top notch concern trolling
|
# ? Mar 13, 2019 14:04 |
|
Blinkz0rz posted:lol @ elastic lmao
|
# ? Mar 13, 2019 15:10 |
|
Blinkz0rz posted:lol @ elastic uncurable mlady posted:that aws piece is some top notch concern trolling yep whats especially funny is that this is mostly repackaging/forks of existing oss projects: security=searchguard, sql=NLPchina/elasticsearch-sql, performance analyzer=perf, but they're playing it up like they did all the work instead of just slapping a new logo on and maybe some light modifications in fairness i think the alerting thing might be new amazon-written code, or maybe i just haven't found where they took it from yet lol
|
# ? Mar 13, 2019 16:43 |
|
didn't searchguard also have that open core, enterprise features thing? i think amazon implemented the enterprise features atop the open core too. they got owned too if that's the case.
|
# ? Mar 13, 2019 22:15 |
|
crazysim posted:didn't searchguard also have that open core, enterprise features thing? i think amazon implemented the enterprise features atop the open core too. they got owned too if that's the case. amazons security "advanced modules" are literally just searchguard's "enterprise modules" with the license changed right down to the TODOs and commented out code ex: FieldReadCallback.java from the amazon repo w/ apache license header FieldReadCallback.java from the searchguard enterprise repo w/proprietary license header i dunno if they did some kinda deal with searchguard or what but it's definitely the same code e: lmao https://github.com/floragunncom/search-guard-enterprise-modules/issues/35 Arcsech fucked around with this message at 22:42 on Mar 13, 2019 |
# ? Mar 13, 2019 22:36 |
|
uncurable mlady posted:lol y tho i assume they got sick of waking up to CLUSTER: RED
|
# ? Mar 14, 2019 02:54 |
|
software just wants to be free - jeff bezos, free software advocate
|
# ? Mar 14, 2019 03:11 |
|
current monitoring job status: i have 15 different unsee dashboards open and there's anything from 10 to 700 warnings/alerts bouncing around in each of them. i'm somehow supposed to keep an eye on all of these with only the two monitors i have. these are monitoring a bunch of datacenters that are a hellish mix of mesos-marathon microservice container stuff and poo poo running straight on the metal, located around the world. i'm connected to them through a collection of ssh-tunnels over connections and vpns that sometimes just decide to stop working. there's a lot of gently caress A CONTAINER IS DOWN!!! alerts and then go check it and everything is fine. bunch of alerts that once fired will hang around in the dashboard for 6 hours because ??? i'm never confident i'll catch the real problems with all this useless noise. there's grafana too but a lot of the stuff isnt configured right so you have to go massage some prometheus queries by hand to get the graphs you need. documentation is of course nonexistent and/or poo poo. configuring the monitoring is some other teams' job and a lot of the time all i can do is open a ticket and hope. when poo poo hits the fan i have only a vague idea who's responsible for what and who the hell i'm supposed to call so i get to wake up my team leader at four in the morning so he can figure it out. welp thats my story, back to lurking
|
# ? Mar 16, 2019 21:18 |
|
please watch these broken dashboards/alerts, and absolutely never under any circumstances fix any problems you find. maybe ask manager why they aren't empowering you to maintain the things you depend on to do your job
|
# ? Mar 16, 2019 21:48 |
|
breaking my vow of lurking to share my cool monitoring story i work at one of your favorite tech monoliths and we're doing this big new enterprise service with like 20 different components frankensteined together. our monitoring was supposedly very good though. All 20 of these pos components were logging everything effectively, dashboards were all set up, etc. for an entire 2 weeks, we had a live site issue where component A was DoSing component B for a customer with a shitload of data. wasn't obvious to the customer that anything was amiss, but our service was functionally useless to them for those two weeks, and we didn't have a loving clue moral of the story is that none of your lovely monitoring does anything if nobody bothers to set up alerts (and that nobody looks at monitoring dashboards unprompted once the demo is over)
|
# ? Mar 16, 2019 22:43 |
|
supabump posted:breaking my vow of lurking to share my cool monitoring story this is why hystrix loving owns
|
# ? Mar 16, 2019 22:50 |
|
supabump posted:breaking my vow of lurking to share my cool monitoring story counterpoint: too many alerts Elos posted:current monitoring job status: i have 15 different unsee dashboards open and there's anything from 10 to 700 warnings/alerts bouncing around in each of them. i'm somehow supposed to keep an eye on all of these with only the two monitors i have Is all monitoring useless? The answer may surprise you!! (yes). The only workflow I've found to work in the monitoring world is to only alert on things you won't otherwise be alerted to naturally - an example would be a core router being down, I don't need an email or SMS to tell me that my network is hosed. Instead, focus your alerts on malicious smaller issues you may not notice for a very long time, like errors on an interface or high IO wait on a database server. Then, you need at least a once a week housecleaning of all the alerts. We do it at our team meeting and it takes about 10 minutes because it's a religious affair - and if an alert is red AND unacknowledged for more than a week, it gets removed from monitoring as it's clearly not important.
|
# ? Mar 16, 2019 23:20 |
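The weekly housecleaning rule above (red AND unacknowledged for more than a week gets removed) is easy to sketch. This is only an illustration under assumptions: the `Alert` record and its fields are hypothetical and not tied to any real monitoring tool's API, so you would have to fill it from whatever your system exports.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Alert:
    # Hypothetical record; field names are assumptions for this sketch.
    name: str
    firing_since: Optional[datetime]  # None means the alert is not red
    acknowledged: bool


def removal_candidates(alerts: List[Alert], now: datetime,
                       max_age: timedelta = timedelta(days=7)) -> List[str]:
    """Names of alerts that are red, unacknowledged, and older than max_age."""
    return [
        a.name
        for a in alerts
        if a.firing_since is not None
        and not a.acknowledged
        and now - a.firing_since > max_age
    ]


if __name__ == "__main__":
    now = datetime(2019, 3, 16, 23, 20)
    alerts = [
        Alert("iface-errors", now - timedelta(days=10), False),
        Alert("db-iowait", now - timedelta(days=10), True),
        Alert("disk-full", now - timedelta(days=2), False),
    ]
    print(removal_candidates(alerts, now))
```

Only the first alert is listed here: the second was acknowledged and the third has not been red for a full week yet.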
|
anyway here's some poo poo i've been working on behind the scenes for the past several months https://twitter.com/opentracing/status/1111389502889574400?s=20
|
# ? Mar 28, 2019 23:14 |
|
Prometheus owns, if anyone has questions we use it all the time.
|
# ? Apr 28, 2019 22:43 |
|
opentracing is cool
|
# ? Apr 29, 2019 00:56 |
|
CRIP EATIN BREAD posted:opentracing is cool thanks, I hope we don’t completely gently caress it up with the merger!!
|
# ? Apr 29, 2019 01:15 |
|
Sylink posted:Prometheus owns, if anyone has questions we use it all the time. what do you find useful to monitor do you install node exporter on every vm use it for alerting?
|
# ? Apr 29, 2019 08:51 |
|
Mostly node exporter for basic system metrics. Disable all the collectors you dont need, one thing to do in prometheus is to not collect metrics you dont use or want, saves on space and tedium. By default a ton of poo poo is on the node exporter, and it will catch all kinds of useless disk device metrics depending on your setup (looking at u kubernetes). We also use autodiscovery and we run kubernetes clusters, so there are a ton of useful kubernetes prometheus tie ins, prometheus operators is great. Anyway, you get all your collectors and exporters up, then the fun begins. Now we build dashboards in grafana and they cover each level of our application, so we have a cluster view, a node view, and an application view to drill down levels and into problems. The dashboards are also committed to code if they are important. The real skill is creating dashboards that are useful and understanding what each metric really is. A lot of prom stuff is counters which will trick you as they just go up over time, so you have to remember to take irates and so on. Depending on your app, its very easy to make custom Prometheus exporters for scraping custom metrics. Do this where possible. Every thing you can turn into a simple number metric is less digging through bullshit logs. Logs loving suck and should be your last resort as much as possible imo. To the point where you go to the log just to confirm what you already know from the metrics in explicit text form. And alerts are so easy to make and tweak, since they are just PromQL and the same queries you are using in the dashboards but with some condition attached, so you can play around in the Prom dash looking at data, find the right metrics, and copy paste that into an alert pretty much.
|
# ? Apr 30, 2019 01:29 |
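To make the counter point concrete, here is a small sketch of why you graph a rate instead of the raw counter, and how an alert is just the same number with a condition attached. It is plain Python, not Prometheus code. The reset handling follows my understanding of how `irate` behaves (last two samples, a drop is treated as a counter restart), so treat that as an assumption. `should_alert` and the threshold are made up for the example.

```python
from typing import List, Optional, Tuple

Sample = Tuple[float, float]  # (unix timestamp in seconds, counter value)


def irate(samples: List[Sample]) -> Optional[float]:
    """Per-second rate from the last two samples of a monotonically increasing counter."""
    if len(samples) < 2:
        return None  # not enough data to compute a rate
    (t0, v0), (t1, v1) = samples[-2], samples[-1]
    if t1 <= t0:
        return None  # samples out of order or duplicated
    increase = v1 - v0
    if increase < 0:
        # Counter went down, so assume the process restarted and counted from zero.
        increase = v1
    return increase / (t1 - t0)


def should_alert(samples: List[Sample], threshold_per_second: float) -> bool:
    """Same query as the dashboard, plus a condition (hypothetical helper)."""
    rate = irate(samples)
    return rate is not None and rate > threshold_per_second


if __name__ == "__main__":
    scrapes = [(0.0, 100.0), (15.0, 130.0), (30.0, 190.0)]
    print(irate(scrapes))            # rate over the last 15 seconds
    print(should_alert(scrapes, 3))  # compare against a made-up threshold
```

The raw counter here only ever climbs (100, 130, 190), which tells you nothing at a glance. The rate over the last scrape interval is the number you actually want on a graph or in an alert condition.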
|
so does OpenTracing do everything prometheus does, or should i be running some combination of OpenTracing + logging + metric collection? like is there some tracer i can plug into OpenTracing to make it pretend to be prometheus, or do i need to do that separately also, how long does e.g. Jaeger keep traces around? are they created on-demand, or built up from some sort of in-memory thing, or... whatever basically i'm a little confused about the practical difference between logs vs metrics vs spans/tags vs traces keep in mind i don't really know what i'm talking about animist fucked around with this message at 03:41 on Apr 30, 2019 |
# ? Apr 30, 2019 03:36 |
|
animist posted:so does OpenTracing do everything prometheus does, or should i be running some combination of OpenTracing + logging + metric collection? like is there some tracer i can plug into OpenTracing to make it pretend to be prometheus, or do i need to do that separately they’re different things although we’d like to condense it to one library for instrumentation (this is the point of the OpenTracing/opencensus merger). I can post some more about what it looks like today tomorrow
|
# ? Apr 30, 2019 05:52 |
|
Sylink posted:prom ty that was very helpful
|
# ? Apr 30, 2019 10:21 |
|
No problem, I could talk for days about metrics and data collection. My only complaint about prometheus is that its documented poorly with regards to actual usage, so you have to dig around. But this site has pretty much every link you'll find on Google https://www.robustperception.io
|
# ? Apr 30, 2019 10:37 |
|
Does opentracing have anything for PHP that doesn't require editing your code project like the way New Relic works? I work mainly with performance on PHP apps (they suck) and New Relic traces/transaction data is extremely useful, and easy to use since we dont control the codebases. I.e. the PHP module comes up and does magic so we dont have to deal with editing anyone's code. I've seen alternatives, but none are as easy and painless. But New Relic pricing sucks if you have a lot of services you want to use it on.
|
# ? Apr 30, 2019 13:49 |
|
love to waste time on workarounds because prometheus devs have a stick up their rear end about some kind of ideological purity that requires hobbling the __name__ label
|
# ? May 2, 2019 21:05 |
|
We do suricata at work. It’s poo poo, mlmp?
|
# ? May 3, 2019 03:21 |
|
all I know about monitoring is I still maintain an out of date horrible thing that is now called "SMArTS/ViPR SRM" but we can't get the updates for some reason so I'm running "watch4net" Also I just spent 3 weeks dealing with using TL1 vs SNMP cause reasons.
|
# ? May 3, 2019 03:59 |
|
Gotta agree with pram that kafka sucks poo poo and breaks constantly in frustrating ways. The tooling is also garbage. ELK also seems to crash and lose data a lot but that might just be our infrastructure team, idk.
|
# ? May 11, 2019 17:27 |
|
kafka is so very bad i cant even
|
# ? May 11, 2019 22:02 |
|
im reminded of this beauty of an error, for example. kafka partitions would literally just get corrupted and the broken log file would prevent the broker from STARTING https://issues.apache.org/jira/browse/KAFKA-3919 the only way you could fix it was to go delete the actual file sitting on the disk. if it affected multiple brokers (because of unclean leader election) hope you like data loss and of course during all this you're totally offline so its a huge outage. epic and ftw
|
# ? May 11, 2019 22:06 |
|
kafka works fine if you don't touch it but lol that pulsar and logdevice were open sourced but kafka is good enough so no one even bothers
|
# ? May 11, 2019 23:30 |
|
|
Prometheus is cool and good. What do people do about their historical metrics? I have my current deployment set to retain metrics for 3 months, and throw away anything older. Any recommendations for storing the older metrics? They would be very rarely queried. Is Thanos the way to go or are there other better solutions out there?
|
# ? May 15, 2019 03:20 |