Hadlock posted:Kind of sounds like eng as a whole needs the eye roll kind of training? Talk to your boss about scheduling a series of (mandatory) 3 or 4 x 20 minute training sessions over the course of a month and then offer an open question session for the following 40 minutes. Note down any grievances aired and either correct them or better document that stuff where necessary

it's cruel to deceive the youth, the only mandatory training anyone needs is mandatory training to stop mandatory training. talking good, mandatory talking bad.

drunk mutt posted:Which after posting that, had me actually realize y'all would be a good group of folks to ask this question to;

there are only two ways to get people to do poo poo and follow your example. love, or leverage. both are ideal, but one or the other on their own probably will work. there's a good book from '32 by this italian writer guy that covers the specifics

The Iron Rose fucked around with this message at 05:51 on Jan 5, 2024 |
# ? Jan 5, 2024 05:47 |
|
|
Hadlock posted:Kind of sounds like eng as a whole needs the eye roll kind of training? Talk to your boss about scheduling a series of (mandatory) 3 or 4 x 20 minute training sessions over the course of a month and then offer an open question session for the following 40 minutes. Note down any grievances aired and either correct them or better document that stuff where necessary

This is going to depend on the company size and culture and the strength of your boss and her/his management chain. I’ve been in a position where my team was literally hired as AWS experts to put together a plan for CI/CD, networking, infrastructure, golden path, sample apps etc. But then our boss got loving steamrolled politically by some long time employees who didn’t like being told what to do, and the result was dev teams could do whatever the hell they wanted and we had to support it. Unsurprisingly, it sucked rear end!

I’ve also worked places where domain experts do some lunch and learns and everyone else gets with the program. But that is very much not all companies
|
# ? Jan 5, 2024 06:19 |
|
Yeah if you work in ops and your boss has no sway it's time to switch jobs, yesterday

The lesson I learned after my first (second?) "real" ops job was that if you don't have 110% buy in from upper management, you're not going to be successful in your agenda. I've used this talking point to get numerous job offers, and rightly so; a lot of organizations have struggling ops groups and don't know how to solve the problem

And yeah if you come in at the wrong time, it might be 18 months or more before upper management finally has their come to Jesus moment and decides to throw out the baby with the bath water and try again (and you with it)

I never really understood what kind of questions you should ask in an interview but I guess that's what makes you not a junior. Basically you want to know "how hosed is your ops org right now, are you on the down or on the up and trying to make things right again?"
|
# ? Jan 5, 2024 08:50 |
|
It occurred to me late last night that you could render and display your entire Kubernetes cluster state using Windows registry editor and now I'm wondering what kind of shim you'd need to write to make that happen /shower thoughts
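Toying with that shower thought for a second: the core of any regedit-style shim is flattening nested Kubernetes objects into registry key paths. A minimal sketch (all names illustrative, no real cluster or registry involved):

```python
# Flatten a nested dict/list (e.g. a decoded Kubernetes object) into
# registry-style (key path, value) pairs, which is roughly what a
# regedit view of cluster state would need.

def to_registry_keys(obj, prefix="HKEY_CLUSTER"):
    """Recursively flatten nested dicts/lists into (path, value) pairs."""
    items = []
    if isinstance(obj, dict):
        for k, v in obj.items():
            items += to_registry_keys(v, f"{prefix}\\{k}")
    elif isinstance(obj, list):
        for i, v in enumerate(obj):
            items += to_registry_keys(v, f"{prefix}\\{i}")
    else:
        items.append((prefix, obj))
    return items

# A toy pod object standing in for what you'd get from the API server
pod = {"metadata": {"name": "web-1", "labels": {"app": "web"}},
       "status": {"phase": "Running"}}
keys = to_registry_keys(pod)
```

From there the shim would just be a watch loop writing these pairs somewhere regedit can see them, which is left as an exercise for a braver person.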
|
# ? Jan 5, 2024 19:41 |
|
I think within the mentality of building trust through the team, the "love/leverage" ideal is an interesting step. Not to dismiss the brown bag, or deep dives; I just wouldn't be down making them mandatory. These are both good in their own way, and helps my dumb rear end see a good path forward; it's probably gonna fail but gently caress it worth a shot. The idea I'm on is building trust through having "lunch and learns" or "drunk and drivel" type sessions that are geared towards a subject, no expected outcomes but gauge where people are.
|
# ? Jan 6, 2024 02:48 |
|
I have a best practices question: What information should be shared between logs and traces, and how?

Context: I have developers that use lots of custom fields using python's structlog library, which basically turns stdout into a JSON object with keys and values. The original stdout is stored as event. We also collect interesting info like the client/user/vendor/internal GUID/environment etc. We're using elasticsearch as our backend for both logs and traces. All logs and traces have service.name and service.environment appropriately set. We can go from log->trace and sort of vice versa, but not in a single pane of glass.

Right now, my devs mostly use the structlog and only rarely use the span attributes. We include the span and trace ID in the structlog, but don't store info from the structlog into span attributes by default. should we:

1. by default, synchronize all structlog fields into span attributes?
a. we obviously generate more spans than we do log messages, but could certainly include things like last_recorded_event as a span attribute
b. this could get very expensive storage wise and some of these attributes would be infinitely cardinal

2. synchronize only some fields into span attributes? If so, what fields would you include and why?
a. e.g. persistent fields like client/vendor/request ID, but not things like event
b. should we generally be dropping super highly cardinal fields? isn't the whole point of OTEL capturing those high cardinality events?

3. by default, synchronize all span attributes back to the structured log?
a. i.e. all attributes -> structlog but not all structlog entries -> attributes
b. I don't see any reason why I wouldn't want to do this regardless; elastic tolerates stupid huge fields well
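For context on the "span and trace ID in the structlog" part, the injection is just a structlog processor. A rough sketch of the shape, with a stand-in for opentelemetry.trace.get_current_span() so it runs without the OTel SDK (the accessor and field names here are illustrative, not our actual code):

```python
# Sketch of a structlog-style processor that copies the current trace
# context into every log event, which is what lets you go log -> trace.

def inject_trace_context(get_current_span):
    """Return a structlog processor bound to a span accessor."""
    def processor(logger, method_name, event_dict):
        span = get_current_span()
        if span is not None:
            ctx = span["context"]
            # OTel convention: 128-bit trace id, 64-bit span id, hex-encoded
            event_dict["trace.id"] = format(ctx["trace_id"], "032x")
            event_dict["span.id"] = format(ctx["span_id"], "016x")
        return event_dict
    return processor

# Fake span accessor standing in for the OTel API for this sketch
def fake_current_span():
    return {"context": {"trace_id": 0x1234, "span_id": 0x56}}

proc = inject_trace_context(fake_current_span)
out = proc(None, "info", {"event": "user logged in"})
```

The open question above is the reverse direction: which of those event_dict fields deserve to be copied onto the span.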
|
# ? Jan 9, 2024 18:54 |
|
Fun fact - nearly every SaaS log aggregation vendor besides Splunk basically runs or has run Elasticsearch for the bulk of the work, so a lot of what people run into on ES as a problem will either get you charged like crazy or still be an issue. Oftentimes people would rather just pay 5x for a vendor to make things easier to use and maintain because yeah, ES is kind of like the Slackware or LFS of options if we’re going to use analogies to Linux distros.

The first question I usually have wrt ES o11y is which version of ES? The second is whether you’re using OTEL or will in the future. Since you’ve answered the second I guess I can conjecture a bit that at least you’re not on some horrific version like 7.2

ES has pathological tuning issues when you have a high number of fields, rather than being based around cardinality of a field (see: default maximum field limit being only 1000, although it can be pushed up easily https://www.elastic.co/guide/en/elasticsearch/reference/current/mapping-settings-limit.html ). This is part of why a lot of people throw in the towel for metrics use cases in particular and go with Prometheus for time series data, given high dimensionality data is something it snacks on in between small meals of high resolution queries requiring less than 100 ms response time.

I’d recommend posting on their community forums given the subreddit is a drat nuclear wasteland and the devs are watching the forums as part of their responsibilities that can help them get promoted.
|
# ? Jan 9, 2024 20:15 |
|
We’re on the latest version of elastic - I literally updated it today - and have it deployed on k8s using their operator. Services send telemetry to OTEL collectors running on their compute node, which send to a gateway, which sends across the network to our elasticsearch APM agent. It works surprisingly well. Most of our services are in k8s, but we’ve also got serverless to think about. Standard logging to elastic is robust and comprehensive at this point after about a year’s worth of build out.

I hear you on the max number of fields per index; we’ve run into that before and the solution was flattened data types. You get an arbitrary number of subfields in exchange for only keyword based queries. Works quite well, but we have only a few dozen fields per APM index right now so I’m less worried about that.

My biggest concern is avoiding the excessive storage of redundant data while also getting enough data from logs into spans so devs start looking at APM first instead of at the log first. We’ve got a bunch of services instrumented kinda crappily by our devs, so I’m building an SRE-managed OTEL library with useful things like automatic span attribute propagation via baggage and a custom span processor, plus automatic functions to add things like code.lineno, code.filename, code.function, even commit links if I can figure out how to get the metadata in. Working on fixing up some of their context propagation too while I’m at it. Right now they’ve got context propagation partially done, a handful of manually instrumented spans, a bevy of 3rd party instrumentation libraries for requests et al., and a structlog function that injects the span and trace IDs into the log object. Since I’m writing code in their repos regularly, this is why I’m asking about structlog and what information we should be sharing between logs and spans.
I think this means I would want to store mostly low cardinality fields like the client or vendor, some small high cardinality fields like the internal request ID (which lets you find the related logs), but not things like the “event” field, which can in some cases be 8000 characters of raw SQL depending on what devs shove in there.

Would a reasonable rule of thumb be:

Store in both structlog and span attributes if:
- a field is low cardinality
- a field is highly cardinal but relatively small (GUIDs, not stdout)

Store in only the log (stdout to structlog is indexed as both text and a flattened object field):
- large, highly cardinal fields like stdout and stack traces

Store only in the span attribute:
- third party autoinstrumented attributes, maybe? I feel like this is safe to toss in a flattened field (which all structured logs are)

If I was using honeycomb maybe I could get away with more highly cardinal fields, but I’m not, so I need to watch my data.

We are *not* storing metrics in elastic currently - most infrastructure metrics go to Datadog. It’s on the long term horizon but it’s just way too much data for us to manage on prem. All traces and spans have to stay within our self managed infrastructure, but metrics can go to Datadog. We have a very small handful of custom metrics we’re sending to Datadog now that could fairly easily be sent to elastic - no more than 10-20k a day.

The Iron Rose fucked around with this message at 22:43 on Jan 9, 2024 |
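That rule of thumb is simple enough to sketch as a routing function. A hedged, stdlib-only illustration (the cutoff and the log-only field names are assumptions to tune, not anything from a real library):

```python
# Hypothetical helper applying the rule of thumb above: small fields,
# low or high cardinality, go to both the log and the span attributes;
# large blobs like raw SQL, stdout, or stack traces stay log-only.

MAX_SPAN_ATTR_LEN = 256                      # assumed size cutoff, tune to taste
LOG_ONLY_FIELDS = {"event", "stdout", "stack_trace"}   # assumed names

def route_field(key, value):
    """Return 'both' or 'log_only' for a single structlog field."""
    if key in LOG_ONLY_FIELDS:
        return "log_only"
    if isinstance(value, str) and len(value) > MAX_SPAN_ATTR_LEN:
        return "log_only"
    return "both"

def span_attributes(event_dict):
    """Subset of a structlog event safe to copy onto a span."""
    return {k: v for k, v in event_dict.items()
            if route_field(k, v) == "both"}

sample = {"client": "acme", "request_id": "abc-123",
          "event": "SELECT * FROM orders WHERE ...",
          "stack_trace": "Traceback (most recent call last): ..."}
attrs = span_attributes(sample)
```

A custom span processor could call something like span_attributes() on every log event before setting attributes, which keeps the storage blowup bounded by the cutoff rather than by whatever devs shove into event.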
# ? Jan 9, 2024 22:23 |
|
help I'm seriously thinking about writing a service discovery tool
|
# ? Jan 10, 2024 21:49 |
|
The Iron Rose posted:Would a reasonable rule of thumb be:
|
# ? Jan 10, 2024 22:40 |
|
The Fool posted:help I'm seriously thinking about writing a service discovery tool If you have a novel idea of what a service actually is, this sounds like a good problem for you to solve
|
# ? Jan 10, 2024 23:24 |
|
it's a using a forklift to lift a feather type situation
|
# ? Jan 11, 2024 00:35 |
|
The Fool posted:it's a using a forklift to lift a feather type situation
|
# ? Jan 11, 2024 01:25 |
|
The Fool posted:help I'm seriously thinking about writing a service discovery tool Post username combo
|
# ? Jan 11, 2024 07:10 |
|
Has anyone tried using gpt to generate test db table data?

Part of what our app (and 95% of all business apps) does is, at a very high level, show delta in value and rate of change, day by day. If the data is more than 2 days stale it renders poorly/not at all. But it's sensitive info so we can't drop that prod data into the dev DB that gets shipped around (and probably lost) on developer laptops. Classic problem.

I'm thinking I could generate 3 rows of data by hand, and then send it to gpt as a csv file, and ask for 100 more similar rows with datetime stamps in a certain range? Then insert that data into the dev DB table/s daily

Maybe there's an SDET thread

Edit: ChatGPT is down, but even wish.com grade Mistral Instruct can do this with ease

Double edit: fine for tiny csvs, choked on our 102 column "contacts" table with 4 rows of data. Output code:
Maybe the better solution is to ask gpt to write me a data fuzzing function given column names and data types

More edit: ChatGPT wrote me what looks like a mostly correct script using Python and faker with an adjustable maximum date range. Neat.

More edit: ChatGPT is really good at writing slack bots that interface with the database

Hadlock fucked around with this message at 09:54 on Jan 11, 2024 |
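For anyone following along, the "data fuzzing function given column names and data types" idea fits in a few lines even without faker. A stdlib-only sketch (column names, the type tags, and the fixed "now" are all illustrative; the key property is timestamps within the freshness window so the delta/rate-of-change views render):

```python
import random
from datetime import datetime, timedelta

def fuzz_rows(columns, n, days_back=2, seed=0):
    """Generate n fake rows. columns: list of (name, type) where
    type is 'int', 'float', 'str', or 'ts' (recent ISO timestamp)."""
    rng = random.Random(seed)                 # seeded so runs are reproducible
    now = datetime(2024, 1, 11, 12, 0, 0)    # fixed "now" for the sketch
    rows = []
    for _ in range(n):
        row = {}
        for name, typ in columns:
            if typ == "int":
                row[name] = rng.randint(0, 1000)
            elif typ == "float":
                row[name] = round(rng.uniform(0, 100), 2)
            elif typ == "ts":
                # keep every timestamp inside the staleness window
                delta = timedelta(minutes=rng.randint(0, days_back * 24 * 60))
                row[name] = (now - delta).isoformat()
            else:
                row[name] = f"val_{rng.randint(0, 99)}"
        rows.append(row)
    return rows

rows = fuzz_rows([("id", "int"), ("value", "float"), ("updated_at", "ts")], 100)
```

Run daily with now = datetime.now() and the dev DB never goes stale, no LLM in the loop. The 102-column case is just a longer columns list.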
# ? Jan 11, 2024 07:57 |
|
The Fool posted:help I'm seriously thinking about writing a service discovery tool

On what platform? Custom annotations on resources have been what everyone has been doing on Kubernetes for drat near ten years now, and it seems to work great. At one job I even saw a guy storing app state in there. I guess ArgoCD does too, but they use proper CRDs instead of jamming it into annotations

bgreman posted:Post username combo
|
# ? Jan 11, 2024 08:01 |
|
Hadlock posted:On what platform

it would be in support of azure services that don't support vnet connections and as a result need to have network access policies configured

the real solution is to just use private endpoints, which occurred to me while taking my dogs out shortly after VC's second reply

bgreman posted:Post username combo
|
# ? Jan 11, 2024 15:30 |
|
Hadlock posted:Has anyone tried using gpt to generate test db table data
|
# ? Jan 11, 2024 17:56 |
|
Hadlock posted:Has anyone tried using gpt to generate test db table data Sorry, this might be 100% incorrect for your usage Hadlock, but given all of the fun I had with BF goons in the past I wanted to chime in. Have you looked at https://mockaroo.com/? We use it manually (without the api) for all sorts of data testing and mocking of live data.
|
# ? Jan 11, 2024 22:49 |
|
bgreman posted:Post username combo lmao
|
# ? Jan 11, 2024 23:03 |
|
You’re looking for TDM. If you have a budget check out Delphix.
|
# ? Jan 12, 2024 04:31 |
|
Ever worked with someone who refuses to check the terraform plan output before applying, since if it worked locally it’ll surely work in production so our prod pipelines should just autoapprove? And I’m not talking about providing a plan as input for the prod pipeline.
|
# ? Jan 12, 2024 08:51 |
|
LochNessMonster posted:Ever worked with someone who refuses to check the terraform plan output before applying, since if it worked locally it’ll surely work in production so our prod pipelines should just autoapprove?

Yes many times over. Has anyone not worked with this person? This is why we mandate using Atlantis to apply all terraform changes even if it is a pain in the rear end. Because some dipshit applying from a stale branch on their laptop and destroying important resources is a much bigger pain in the rear end.
|
# ? Jan 12, 2024 09:39 |
|
Docjowles posted:Yes many times over. Has anyone not worked with this person? This is why we mandate using Atlantis to apply all terraform changes even if it is a pain in the rear end. Because some dipshit applying from a stale branch on their laptop and destroying important resources is a much bigger pain in the rear end.

Has never worked with tf before and claims looking at tf plan is a manual step and an antipattern for ci/cd. “This is why you review the tf code in a pull request, you don’t do another manual review in the pipeline”. I cannot even.
|
# ? Jan 12, 2024 09:48 |
|
Hadlock posted:Has anyone tried using gpt to generate test db table data Dbeaver Pro has this built-in: https://dbeaver.com/docs/dbeaver/Mock-Data-Generation-in-DBeaver/
|
# ? Jan 12, 2024 10:39 |
|
The Fool posted:it would be in support of azure services that don't support vnet connections and as a result need to have network access policies configured
|
# ? Jan 12, 2024 17:23 |
|
LochNessMonster posted:Has never worked with tf before and claims looking at tf plan is a manual step and an antipattern for ci/cd. “This is why you review the tf code in a pull request, you don’t do another manual review in the pipeline”.

he shouldn't even be able to merge to main without a plan running

with a little bit of effort you can check for destroys or even specific high impact changes and require additional approvals if they're detected
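The "check for destroys" part is easy to automate because `terraform show -json <planfile>` emits a machine-readable plan whose `resource_changes[].change.actions` lists create/update/delete per resource. A minimal sketch of the gate (the sample plan document is fabricated for illustration):

```python
# Scan a terraform plan's JSON representation and flag any resource
# the plan would delete, so CI can require extra approval.

import json

def destructive_changes(plan_json):
    """Return addresses of resources the plan would delete or replace."""
    plan = json.loads(plan_json)
    return [rc["address"]
            for rc in plan.get("resource_changes", [])
            if "delete" in rc["change"]["actions"]]

# Toy stand-in for `terraform show -json tfplan` output
sample = json.dumps({"resource_changes": [
    {"address": "aws_db_instance.main",
     "change": {"actions": ["delete", "create"]}},   # a replace
    {"address": "aws_s3_bucket.logs",
     "change": {"actions": ["update"]}},
]})
```

In a pipeline this would be something like `terraform plan -out=tfplan && terraform show -json tfplan | python check_plan.py`, failing the job (or requiring a second approver) when the list is non-empty.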
|
# ? Jan 12, 2024 17:27 |
|
Vulture Culture posted:Glad to see you've moved on from reimplementing service discovery to only reimplementing all the non-discovery parts of a service mesh working with what I got, and what I got are azure services and vnets
|
# ? Jan 12, 2024 17:33 |
|
The Fool posted:he shouldn't even be able to merge to main without a plan running I agree, but this person has very strong opinions on a product they have 0 experience with. They have no idea what terraform does or how it works besides iac. Yet they strongly push for running plan locally and run autoapprove in the pipeline. No need for fmt or validate either as you can do all of that locally, so why do it in a pipeline. I tried to explain this nicely but since nobody in our team besides me has ever worked with tf before, we are apparently going to entertain this idea and see if it’s a feasible strategy. I now understand why the coworker that left wanted to strangle them after their first week on the team. I’m strongly considering just building the foundation under the radar and do a “oh yeah I kind of already build it, so now it’s here and it’s already in use so let’s just use this instead of refactoring something that works perfectly fine”. I just need to come up with a way to prevent them reading state locally as to not circumvent the pipeline entirely.
|
# ? Jan 12, 2024 21:35 |
|
LochNessMonster posted:I agree, but this person has very strong opinions on a product they have 0 experience with.

i love these people, and by love i mean i want to bash their heads in with my keyboard

LochNessMonster posted:I just need to come up with a way to prevent them reading state locally as to not circumvent the pipeline entirely.

if you're hosting it in s3, set up a bucket policy which only allows access from the ip range of your ci runners.
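For reference, that bucket policy is a deny with a NotIpAddress condition. A sketch (bucket name and CIDR are placeholders; test carefully, because a blanket Deny that's misconfigured locks out your CI and admins too):

```json
{
  "Version": "2012-10-17",
  "Statement": [{
    "Sid": "DenyStateAccessOutsideCI",
    "Effect": "Deny",
    "Principal": "*",
    "Action": "s3:*",
    "Resource": [
      "arn:aws:s3:::my-tf-state",
      "arn:aws:s3:::my-tf-state/*"
    ],
    "Condition": {
      "NotIpAddress": { "aws:SourceIp": ["203.0.113.0/24"] }
    }
  }]
}
```

You'd probably also want a carve-out for a break-glass admin role (e.g. an `aws:PrincipalArn` condition alongside the IP check) so one NAT gateway change doesn't brick your state access.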
|
# ? Jan 12, 2024 21:48 |
|
auto-approving or auto-applying anything in any pipeline is a big yellow flag, IMO. it's a time bomb set to two different clocks: complexity and inexperience.
|
# ? Jan 12, 2024 22:11 |
|
vanity slug posted:i love these people, and by love i mean i want to bash their heads in with my keyboard I’m all for training new folks but dang, it’d be nice if they’d be open to listen to others who have some experience with things. Thanks for the suggestion on the bucket policy based on IP range. I’ll definitely keep that in mind.
|
# ? Jan 12, 2024 22:44 |
|
So, what's the ideal Terraform workflow with CI/CD? Do you have it do a terraform plan and then use your CI tools approval ability to wait for someone else to review the plan output and approve it, and then the next step in the pipeline is to apply the previously generated plan?
|
# ? Jan 13, 2024 05:14 |
|
the ideal workflow would be that you have a mirror infrastructure and that your ci/cd tool plans and deploys to it fully before you merge and apply to production. you could also merge to a release branch or something and then periodically merge the release into production. the main thing is that terraform plan lies, so there's no good CI/CD plan because you can't ensure that it's actually going to work. if you don't have a mirror infrastructure you can actually validate every aspect of your change with, you just have to pick how you deal with the stuff that's wrong, which is a "pick your poison" sort of scenario. it's still poison.
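To make the plan-then-gated-apply shape concrete, here's a hypothetical GitHub Actions sketch: plan runs on every PR and on main, and the apply job sits behind an environment approval so a human sees the plan output first. All names are illustrative, and note the thread's caveat still applies: a saved plan can go stale between approval and apply.

```yaml
name: terraform
on:
  pull_request:
  push:
    branches: [main]

jobs:
  plan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - run: terraform init -input=false
      - run: terraform plan -input=false -out=tfplan
      - uses: actions/upload-artifact@v4
        with:
          name: tfplan
          path: tfplan

  apply:
    if: github.ref == 'refs/heads/main'
    needs: plan
    environment: production   # the manual approval gate lives here
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: hashicorp/setup-terraform@v3
      - uses: actions/download-artifact@v4
        with:
          name: tfplan
      - run: terraform init -input=false
      - run: terraform apply -input=false tfplan
```

Swap the environment gate for automated policy checks (destroy detection, cost estimation) as the team matures; the structure stays the same.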
|
# ? Jan 13, 2024 05:21 |
|
12 rats tied together posted:the ideal workflow would be that you have a mirror infrastructure and that your ci/cd tool plans and deploys to it fully before you merge and apply to production. you could also merge to a release branch or something and then periodically merge the release into production.

Not that I'm disagreeing with this, just to expand and shift a little bit; in a "trunk based" world, never ever ever ever ever let divergence occur through even short lived branches (in context of TF). There should TOTALLY be some kind of gate which approves the planned changes into production; be it manual, or in more matured environments gated behind some form of testing.

Dude complaining about TF plan being a manual step that shouldn't be needed, needs to grow up and realize that is the goal for them and not the current expectation. It's not one of those things you just get out of the box, especially if you have people like this altering the shared state. Why no, never have I ever worked with this type of person.
|
# ? Jan 13, 2024 06:55 |
|
People having zero concept of incremental changes and intermediate steps to implementing processes in software systems probably shouldn't be working in software. Just a hunch.
|
# ? Jan 13, 2024 08:00 |
|
LochNessMonster posted:I agree, but this person has very strong opinions on a product they have 0 experience with. This was also the case with my old coworker who wanted to yolo TF apply from his laptop with no oversight. He may have never used Terraform in his life, but man, did he have opinions on the best way to implement it. Firing off hot takes (mostly "this thing sucks we should write our own
|
# ? Jan 13, 2024 16:48 |
|
Docjowles posted:This was also the case with my old coworker who wanted to yolo TF apply from his laptop with no oversight. He may have never used Terraform in his life, but man, did he have opinions on the best way to implement it. Firing off hot takes (mostly "this thing sucks we should write our own

lol did you work with an old coworker of mine who wrote a ruby dsl to generate and apply cloudformation templates?
|
# ? Jan 13, 2024 16:52 |
|
Blinkz0rz posted:lol did you work with an old coworker of mine who wrote a ruby dsl to generate and applycloudformation templates? lol not that I know of but it absolutely sounds like the kind of poo poo he'd do
|
# ? Jan 13, 2024 16:53 |
|
|
I'm setting up ephemeral dev environments with containers

Current pattern is django for the backend, and some kind of yarn dev node js thing where they render and export the front end to an S3 bucket, where it's served from there

In "dev mode" they run "yarn dev", which serves the front end locally, and then they point it either at their local laptop django or at a shared staging Django

I'm 99% sure I'm going to leave the "upload the front end to S3" pattern for production alone; that's about as reliable and low maintenance as you can get. I've dealt with pushing the front door to S3 but never the entire front end, so this is new territory for me

For dev environments/ephemeral stuff, would you
1) compile a giant mono container with both Django and "yarn dev" running, identical to how it runs on the developer laptop
2) separate containers for Django and yarn dev? Keep in mind we would rarely if ever scale up beyond 1 in these environments; seems like we're doubling the complexity of the system
3) publish dev code to a dev S3 bucket and run Django as a container

Hadlock fucked around with this message at 19:07 on Jan 13, 2024 |
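For option 2, a compose-style sketch of what the two-container version could look like (service names, ports, and the API_URL variable are all assumptions; the point is the frontend reaches the backend by service name, mirroring the laptop setup without a mono container):

```yaml
services:
  backend:
    build: ./backend            # assumes a Dockerfile for the django app
    command: python manage.py runserver 0.0.0.0:8000
    ports:
      - "8000:8000"

  frontend:
    build: ./frontend           # assumes a node image with yarn installed
    command: yarn dev --host 0.0.0.0
    ports:
      - "3000:3000"
    environment:
      - API_URL=http://backend:8000   # illustrative; match however the app reads its API base
    depends_on:
      - backend
```

The "doubling the complexity" worry mostly goes away if this one file is the whole definition; it also keeps the images independently cacheable, so a frontend-only change doesn't rebuild django.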
# ? Jan 13, 2024 19:04 |