|
lol we just replaced our JMS-driven event bus with a kafka/zookeeper solution. the entire enterprise publisher/subscription config process is still an email to the one guy who knows how it works.
|
# ? May 15, 2019 15:19 |
|
yard salad posted: "lol we just replaced our JMS-driven event bus with a kafka/zookeeper solution. the entire enterprise publisher/subscription config process is still an email to the one guy who knows how it works."

you send a message and eventually you get a response. sounds like any other distributed system to me imo
|
# ? May 25, 2019 22:32 |
|
elasticsearch changed their data model between major versions so that you can't have multiple types per index anymore. making breaking changes between major versions is fair enough but it doesn't stop it being frustrating when you have to go make changes to your logging code just for this.
|
# ? Jun 14, 2019 16:16 |
|
just use grep
|
# ? Jun 16, 2019 15:29 |
|
animist posted: "just use grep"

there's a reason we always log to a local file as well as ELK
|
# ? Jun 17, 2019 08:44 |
|
I've updated a thing to use prometheus-cpp to export metrics to DataDog. It's a bit poo poo but just about works (tm). I'd like the Civetweb HTTP engine replaced with a Boost/Beast wrapper sitting on an IO context, and I'd like not absolutely everything running in the constructor, so that it's easier to create a non-centralised metric deployment, like Id Software's idCVar for configuration. I have no idea what DataDog does, I just see it running in ECS.
|
# ? Jul 7, 2019 21:28 |
|
datadog is a metrics/alerting service you pay $$$ for. its nice
|
# ? Jul 8, 2019 04:38 |
|
datadog gets very sad if you have lots of tags/cardinality
|
# ? Jul 8, 2019 09:43 |
|
nothing more money won't fix
|
# ? Jul 8, 2019 09:49 |
|
Progressive JPEG posted: "datadog gets very sad if you have lots of tags/cardinality"

they had a bug for a long time where their unique identifier was the hostname, which is fine except we reuse ip addresses and consequently hostnames as instances come up and down, so we'd get all sorts of misattributed events and metrics until we figured out what was going on
|
# ? Jul 8, 2019 11:19 |
|
Blinkz0rz posted: "they had a bug for a long time where their unique identifier was the hostname"

lol what the gently caress
|
# ? Jul 8, 2019 13:09 |
|
i think there were other conditions that caused it to become the identifier but it was a big old pain in the butt let me tell you
|
# ? Jul 8, 2019 13:47 |
|
it's amazing how many junior devs make the mistake of using name as a unique id instead of treating it as the display field it is.
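the boring fix: mint a stable id at creation time and treat the name as presentation only. a minimal sketch (the `Host` type and fields are made up for illustration):

```python
import uuid
from dataclasses import dataclass, field

@dataclass
class Host:
    # stable identity: generated once, never reused, never shown to users
    id: str = field(default_factory=lambda: str(uuid.uuid4()))
    # display only: free to change, or to collide when instances are recycled
    name: str = ""

a = Host(name="web-01")
b = Host(name="web-01")  # same hostname, reused after a re-deploy
assert a.id != b.id      # still distinguishable by the stable id
```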
|
# ? Jul 8, 2019 14:36 |
|
We still write local text logs for everything as backup because all the modern solutions either crash and lose data occasionally or have broken auth
|
# ? Jul 8, 2019 21:33 |
|
Kafka in particular crashes more frequently than most of the services which write to it
|
# ? Jul 8, 2019 21:34 |
|
yes kafka is loving garbage software
|
# ? Jul 8, 2019 21:41 |
|
is there a good way to identify bottlenecks in a bunch of short-lived cloud spot instances? prometheus has pushgateway which seems ok but i haven't tried to actually set it up yet
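for reference, the pushgateway flow is roughly: the short-lived job formats its metrics in the prometheus text exposition format and PUTs them to the gateway before exiting, and prometheus scrapes the gateway on its normal schedule. a minimal stdlib sketch (the gateway address and metric names are made up):

```python
import urllib.request

def format_metric(name: str, value: float, help_text: str) -> str:
    # prometheus text exposition format; the trailing newline is required
    return f"# HELP {name} {help_text}\n# TYPE {name} gauge\n{name} {value}\n"

def push_metric(gateway: str, job: str, name: str, value: float, help_text: str) -> None:
    """push one gauge to a pushgateway before the spot instance dies
    (PUT /metrics/job/<job> replaces all metrics for that job)."""
    body = format_metric(name, value, help_text).encode()
    req = urllib.request.Request(
        f"http://{gateway}/metrics/job/{job}", data=body, method="PUT",
        headers={"Content-Type": "text/plain"})
    urllib.request.urlopen(req)

# e.g. push_metric("pushgateway:9091", "spot_batch", "job_duration_seconds", 42.3, "wall time")
```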
|
# ? Jul 9, 2019 07:24 |
|
you want tracing, not just metrics.
|
# ? Jul 9, 2019 17:09 |
|
re: tracing, this was a post i enjoyed about stack overflow's monitoring setup https://nickcraver.com/blog/2018/11/29/stack-overflow-how-we-do-monitoring/
|
# ? Jul 9, 2019 20:06 |
|
we use the opentracing api for everything and use jaeger as our backend, which feeds it into elasticsearch. it's cool. when someone used a constant sampler (samples every trace) we were producing 10gb of traces each day in our test environment. pro-tip: use a probabilistic sampler or something that says "sample 5% of traces" instead.
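a minimal sketch of the idea behind a probabilistic sampler: make the keep/drop decision deterministically from the trace id, so every span in a trace gets the same verdict (exact details vary by client library; the numbers here are illustrative):

```python
import random

SAMPLE_RATE = 0.05  # sample roughly 5% of traces

def should_sample(trace_id: int, rate: float = SAMPLE_RATE) -> bool:
    # treat the 64-bit trace id as a uniform random number and keep
    # the trace when it falls in the bottom `rate` fraction of the range
    max_id = (1 << 64) - 1
    return trace_id < rate * max_id

# rough check: with random trace ids, about 5% of traces survive
sampled = sum(should_sample(random.getrandbits(64)) for _ in range(100_000))
```

at 5% the 10gb/day of traces above drops to roughly half a gigabyte, while still giving all spans of a kept trace.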
|
# ? Jul 9, 2019 23:33 |
|
i know there's the #monitoringlove hashtag but this is ridiculous!! anyway that sounds pretty cool, crip, first i'd heard of people taking jaeger seriously but i haven't had to deal with monitoring backends in a while so hey
|
# ? Jul 10, 2019 00:29 |
|
wow that's a big iguana
|
# ? Jul 10, 2019 01:44 |
|
CRIP EATIN BREAD posted: "you want tracing, not just metrics."

hm, yeah probably. well i don't need anything right away anyway
|
# ? Jul 10, 2019 06:59 |
|
i want to set up graylog or elk again but i hate elasticsearch
|
# ? Jul 10, 2019 11:49 |
|
CRIP EATIN BREAD posted: "we use the opentracing api for everything and use jaeger as our backend which feeds it into elasticsearch"

alternately consider a fine saas tracing product that performs tail sampling
|
# ? Jul 10, 2019 13:49 |
|
CRIP EATIN BREAD posted: "we use the opentracing api for everything and use jaeger as our backend which feeds it into elasticsearch"

Yeah, I set up tracing in our app, and the dev team was like "Can we have 100% sample rate in staging?" and also the dev team: "We want to do some load tests in staging."
|
# ? Jul 10, 2019 20:31 |
|
a good idea is to make it configurable at run-time so you can adjust it in cases like that
|
# ? Jul 10, 2019 22:20 |
|
Yeah, it can be set by env var, but I don't trust them to not gently caress up. We do have some stuff set to 100%, and some stuff set to 1/1000000, while most is at 10% or so.
|
# ? Jul 10, 2019 22:39 |
|
no i mean, that it can be adjusted. like you have some sort of external way of updating the values. that way you can just send a command to reduce the sample rate
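a minimal sketch of that idea, assuming a threaded service with some external admin channel (class and method names are made up):

```python
import threading
import random

class AdjustableSampler:
    """trace sampler whose rate can be changed while the service runs,
    e.g. from an admin endpoint or a config-watcher thread."""

    def __init__(self, rate: float):
        self._lock = threading.Lock()
        self._rate = rate

    def set_rate(self, rate: float) -> None:
        # clamp to [0, 1] so a fat-fingered command can't break sampling
        with self._lock:
            self._rate = max(0.0, min(1.0, rate))

    def sampled(self) -> bool:
        with self._lock:
            rate = self._rate
        return random.random() < rate

sampler = AdjustableSampler(1.0)  # staging default: sample everything
sampler.set_rate(0.05)            # dial it down before a load test, no redeploy
```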
|
# ? Jul 10, 2019 22:41 |
|
Working out why a log message isn't making it from my app into a kibana dashboard is one of my least favourite things.
|
# ? Jul 11, 2019 15:44 |
|
its because you've lived a life of sin
|
# ? Jul 12, 2019 00:23 |
|
Anyone done anything with Sysmon 10.0? Looks like it can pull DNS queries now. Currently doing that with splunk/streamstats from the DCs.
|
# ? Jul 24, 2019 08:53 |
|
vector.dev looks nice
|
# ? Jul 24, 2019 10:50 |
|
my stepdads beer posted: "vector.dev looks nice"

it looks neat but also they have these performance comparisons and they aren't even close to feature complete yet.
|
# ? Jul 24, 2019 14:32 |
|
yeah I gave it a go yesterday, very early days for it.
|
# ? Jul 25, 2019 00:07 |
|
So if you're sending logs to a managed provider like Datadog, but some fraction of the logs contain transactions that you want to hang on to for billing questions, do you just have a cron job pull out the info from Datadog and put it into your own holdings? I see most providers center around 15-day retention, so it feels like I'm thinking about it wrong if I want to hold on to some things for 30-90 days.
|
# ? Jun 9, 2020 17:04 |
|
I used Papertrail at one point, they had an option to save a copy of everything you sent them to s3. Maybe datadog has something similar? If you're using this for billing, though, it seems like you need something more transactional. What if the log never makes it to dd?
|
# ? Jun 9, 2020 18:43 |
|
Hed posted: "So if you're sending logs to a managed provider like Datadog, but some fraction of the logs contain transactions that you want to hang on to for billing questions, do you just have a cron job pull out the info from Datadog and put it into your own holdings?"

don't rely on logs for core business functions
|
# ? Jun 9, 2020 19:23 |
|
Hed posted: "So if you're sending logs to a managed provider like Datadog, but some fraction of the logs contain transactions that you want to hang on to for billing questions, do you just have a cron job pull out the info from Datadog and put it into your own holdings?"

definitely don't rely on logs for core business functions, but furthermore definitely don't have datadog in the loop for anything important. We used it at my last job and ddog was great for general stuff (looking to see if a thing was done restarting, counting the lower bound of errors, etc) but it would routinely have outages/delays/lost logs, to the point that if something was missing in datadog we just assumed it was fine
|
# ? Jun 9, 2020 20:05 |
|
Ok I hear you all on that, but then if the log is essentially the audit trail of what happened, I'm struggling to figure out how to do it differently without it looking like a log... Are people running local logstash then forwarding to a log MSP, and then running extraction jobs locally for business records? Or are you saying something like I should install a callback to an ERP or something for every transaction? I appreciate the info.
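one shape that answers this: write the billing record to its own durable store synchronously in the request path, and keep the log line as a diagnostic copy rather than the system of record. a minimal sketch with sqlite standing in for whatever durable store you'd actually use (table and column names are invented for illustration):

```python
import json
import sqlite3
import time

def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """open the billing-event store, creating the table if needed."""
    db = sqlite3.connect(path)
    db.execute("""CREATE TABLE IF NOT EXISTS billing_events (
        id INTEGER PRIMARY KEY,
        ts REAL NOT NULL,
        payload TEXT NOT NULL)""")
    return db

def record_transaction(db: sqlite3.Connection, event: dict) -> None:
    # committed synchronously when the transaction happens, so a dropped
    # log line (or a datadog outage) can't lose the billing record
    with db:
        db.execute("INSERT INTO billing_events (ts, payload) VALUES (?, ?)",
                   (time.time(), json.dumps(event)))

db = open_store()
record_transaction(db, {"customer": "acme", "amount_cents": 1999})
```

retention then becomes your own problem (and your own policy), instead of fighting a log provider's 15-day window.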
|
# ? Jun 11, 2020 23:19 |