|
I have strong opinions on logging and monitoring and metrics. Here's one: if your log line is a free-form string, you hosed up. Get some structured logging going.
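To make that concrete, here's a minimal sketch (not anyone's production setup; the formatter and field names are invented for illustration) of structured logging with Python's stdlib: each record comes out as one JSON object with named fields instead of an interpolated sentence.

```python
import json
import logging
import time


class JSONFormatter(logging.Formatter):
    """Render each record as a single JSON object instead of a free-form string."""

    def format(self, record):
        payload = {
            "ts": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime(record.created)),
            "level": record.levelname,
            "logger": record.name,
            "msg": record.getMessage(),
        }
        # Carry structured fields passed via `extra=` through to the output.
        for key, value in getattr(record, "fields", {}).items():
            payload[key] = value
        return json.dumps(payload)


logger = logging.getLogger("app")
handler = logging.StreamHandler()
handler.setFormatter(JSONFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Key/value pairs instead of prose: queryable downstream without regexes.
logger.info("user load failed", extra={"fields": {"user_id": 42, "error": "timeout"}})
```

The point is that `user_id` stays a field the whole way through, so whatever you shove the logs into can filter on it directly.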
|
# ¿ Feb 22, 2019 20:09 |
|
|
# ¿ May 14, 2024 01:35 |
|
uncurable mlady posted: i wouldn't say it's just in your web app - our entire application is instrumented from the webapp all the way down to the db. that said, it's kind of a massive pain in the rear end right now to get trace data from resources you don't directly manage, because not everyone uses opentracing, and even if they did, wire formats are very tracer-dependent.

I have big opinions about what to trace and log, and where to put the probes -- about 15 pages' worth of opinions -- which I put in one place at https://ferd.ca/operable-software.html. If I have to TL;DR my views, it's this:
And so the idea is to think in terms of "operator experience" the same way we would "user experience", and figure out patterns in which to lay out information that speaks to the different types and levels of expertise of users and operators. If you don't have that, you have a lot of data, but it's not necessarily going to be useful at all.
|
# ¿ Feb 24, 2019 15:20 |
|
abigserve posted: if he has any idea how to do it i'm all ears, i'll send it straight to our infosec team who are currently building a hadoop stack to try and deal with it

Captain Foo mentioned this because I used to work on the routing team at Heroku and maintained part of their logging stack for a while, and I now work at a security company helping them set up some IoT stuff for data acquisition, so it does make for a funny overlap. I have not, however, worked in infosec directly.

I don't know exactly what your team is doing, but reaching for hadoop for infosec and networking makes me think they're trying to do straight-up analytics on network traces, or at least network metadata (connection endpoints, protocols/certs, payload sizes, etc.) -- so it'd be interesting to figure out what they're actually trying to accomplish. If it's a dragnet thing, it's different from actual logging, since you'd probably have the ability to control some stuff there? Most network software logs at least tend to have a semblance of structure, so they're not as bad a case as <Timestamp> could not load user <id> because an exception happened, which essentially requires a full-text search to do anything with.
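To show why that kind of line is painful, here's a hypothetical sketch (the timestamp, field names, and event name are all invented for illustration): recovering structure from the free-text form takes a hand-written regex per log shape, whereas the structured version never flattened the fields into prose in the first place.

```python
import json
import re

# Free-text line: only recoverable with a per-format regex, one per log shape.
LINE = "2019-02-24T15:20:01Z could not load user 42 because an exception happened"
PATTERN = re.compile(
    r"^(?P<ts>\S+) could not load user (?P<user_id>\d+) because (?P<reason>.+)$"
)

match = PATTERN.match(LINE)
parsed = match.groupdict() if match else None
# parsed == {'ts': '2019-02-24T15:20:01Z', 'user_id': '42',
#            'reason': 'an exception happened'}

# Structured equivalent: the same information emitted as fields from the start,
# with no regex needed downstream.
structured = json.dumps({
    "ts": "2019-02-24T15:20:01Z",
    "event": "user_load_failed",
    "user_id": 42,
    "reason": "exception",
})
```

And that regex only covers this one message; every other string-formatted line in the codebase needs its own.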
|
# ¿ Feb 25, 2019 01:59 |
|
Right. So the two patterns there are essentially what they're doing -- just hadooping the hell out of it -- or otherwise treating individual logs as a stream that you process into a more standard format. Currently they're likely forwarding logs from specific servers to remote instances by reading them off disk first and then shoving them over a socket (or via some syslog-like agent); the cheapest way to go would be to do that stream processing at the source, as the agent reads the logs off disk and before it forwards them. This requires deploying the agent to all instances (or as a sidecar), but from that point on, all the data is under a more standardized format, and you can shove it into hadoop or splunk or whatever.

E: do forward unknown/unformattable logs, but annotated as such within a structure, so that they can iteratively clean poo poo up
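The agent-side normalization plus the edit-note idea can be sketched like this (a toy, not any real agent; the sshd regex and source names are assumptions for illustration): parse what you recognize into a standard envelope, and forward everything else anyway, flagged as unparsed so it can be cleaned up iteratively.

```python
import json
import re

# Hypothetical per-source parser; a real agent would ship many of these.
SSHD = re.compile(r"^(?P<ts>\w+ +\d+ [\d:]+) \S+ sshd\[\d+\]: (?P<msg>.+)$")


def normalize(raw_line, source):
    """Turn a raw log line into a standard envelope at the forwarding agent.

    Lines we can't parse are still forwarded, but flagged so downstream
    consumers can find them and iteratively improve the parsers.
    """
    match = SSHD.match(raw_line) if source == "sshd" else None
    if match:
        return {"source": source, "parsed": True, **match.groupdict()}
    return {"source": source, "parsed": False, "raw": raw_line}


# Parsed and unparsed lines both leave the agent in the same JSON envelope.
ok = normalize("Feb 25 01:59:00 host1 sshd[311]: Accepted publickey for deploy", "sshd")
unknown = normalize("???: totally freeform vendor output", "acme-fw")
print(json.dumps(ok))
print(json.dumps(unknown))
```

Downstream, a query for `parsed: false` grouped by `source` tells you exactly which log formats still need a parser, which is the iterative-cleanup loop the edit note describes.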
|
# ¿ Feb 25, 2019 04:20 |