|
I think ELK/EFK is pretty widespread, no?
|
# ¿ Jul 27, 2018 20:56 |
|
the idea that everyone on a team should be involved in all three of writing, releasing, and deploying code is a bad one
|
# ¿ Aug 1, 2018 21:23 |
|
crazypenguin posted: "I've got a medium-sized (50 project/repos) jenkins build thing going with the newish multibranch pipeline approach. It works mostly great, but I have a few issues."

For the first issue: we keep our automation code separate from the actual code and it works pretty well. You still keep your pom or whatever is required to build with the code in the code repo, but you keep all the automation and coordination between separate jobs in another repo. The pattern where the Jenkinsfile lives with the code leads only to headaches.
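A minimal sketch of what that split can look like, using a Jenkins shared library: the Jenkinsfile next to the code stays trivial and delegates everything to the automation repo. The library name `build-automation` and the `standardBuild` step are hypothetical.

```groovy
// Jenkinsfile in the code repo: nothing but a pointer to the automation repo.
// 'build-automation' is a shared library configured in Jenkins, backed by the
// automation repo (hypothetical name).
@Library('build-automation') _

// standardBuild would be a global var defined in the shared library's vars/
// directory (hypothetical); all real pipeline logic lives over there.
standardBuild(project: 'my-service')
```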
|
# ¿ Aug 17, 2018 01:46 |
|
Is there anyone else doing non-image artifact-based CM, e.g. chef-zero or ansible bundles? Where/when do you drop your artifacts?

We have a bunch of terraform for setting up instances of our infrastructure and currently it lives separately from the provisioning code, but we want to merge the repos so we can release them as one. Currently our hosts pull provisioning code from git (I know, I know), but continuing to do this is for sure going to gently caress us up for any sort of branching when we merge the two repos (here come the self-dependencies), and it's also just bad practice.

My thought right now is to generate an instance-specific CM bundle with Terraform that's pushed onto S3, but I'm wondering if anyone has hit any issues with something like that. We also have artifactory available, but we wouldn't be able to have per-instance artifacts, which are important for us for testing new infra or CM code. I've handled this in the past for smaller projects by delivering a hex-encoded tar of the provisioning code in user data, but unfortunately we're just past the 16k limit 😉
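For reference, a minimal sketch of what the S3 drop could look like, assuming an existing artifacts bucket and a bundle already built on disk; in recent AWS provider versions the resource is `aws_s3_object` (older ones call it `aws_s3_bucket_object`). All names and paths here are placeholders.

```hcl
# Push an instance-specific CM bundle to S3; the instance pulls it at boot.
resource "aws_s3_object" "cm_bundle" {
  bucket = aws_s3_bucket.artifacts.id
  key    = "cm-bundles/${var.instance_name}.tar.gz"
  source = "${path.module}/dist/cm-bundle.tar.gz"

  # Re-upload whenever the bundle content changes.
  etag = filemd5("${path.module}/dist/cm-bundle.tar.gz")
}
```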
|
# ¿ Sep 11, 2018 04:08 |
|
Hadlock posted: "Our monolith takes about 15 minutes to build, other services are in the 3-10 minute range."

Do you guys do load tests? Any integration with internal services outside your control?
|
# ¿ Sep 28, 2018 01:19 |
|
Let’s say you want to have a version endpoint or string somewhere in your app. Isn’t that something that’s going to be static in your artifacts? If so, how do you keep the pattern of using the same artifact in QA/Staging/Prod if you’ll need to modify this as it gets promoted?
|
# ¿ Nov 24, 2018 15:27 |
|
Kevin Mitnick P.E. posted: "If it changes based on deployment environment then it's not a version."

It doesn’t change based on deployment environment, but when it’s in QA it’s still just a release candidate for us, not a real release yet. We promote it to being a real release once it goes through a bunch of integration tests with other systems. (Having a “QA process” instead of real CI/CD is non-ideal, but it’s unfortunately not my decision to make.) Maybe our endpoint should just return a build number?
|
# ¿ Nov 25, 2018 03:06 |
|
Kevin Mitnick P.E. posted: "Build a thing, do a bunch of QA on it, then build a different (but hopefully close enough) thing and release it is not a process I would be comfortable with."

Right, that’s something I don’t think I want to do. Here’s what I think my options are, but I’m wondering if there’s maybe another:
- rebuilding/repackaging the artifact at promote time, after QA (bad for the reasons stated)
- not having release candidates and only ever sending real releases to QA (would result in a ton of versions for us because we always find stuff in QA)
- not having version information available in the artifact
|
# ¿ Nov 25, 2018 05:06 |
|
ThePeavstenator posted: "I'm not sure if I've missed something but why can't you just do the old <Major>.<Minor>.<Release>.<Build>?"

actually I think this would totally work and I don’t know why I didn’t consider doing this. we have been sticking “RC” in our version strings for stuff that hasn’t gone through QA, but dropping that would alleviate all the pain. 🤦♂️ thanks!
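For anyone wondering what that looks like in practice, here's a minimal sketch of a version endpoint stamped at build time, so the exact same artifact carries its version through QA/Staging/Prod. Package layout, port, and version string are illustrative.

```go
package main

import (
	"fmt"
	"net/http"
)

// version is overridden at build time, e.g.:
//   go build -ldflags "-X main.version=2.7.0.1432"
// The build number comes from CI, so the artifact never changes on promotion.
var version = "0.0.0.0-dev"

func main() {
	http.HandleFunc("/version", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintln(w, version)
	})
	http.ListenAndServe(":8080", nil)
}
```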
|
# ¿ Nov 25, 2018 05:46 |
|
terraform is certainly an imperfect and frequently buggy tool, but also the least bad one available for certain types of problems
|
# ¿ Dec 20, 2018 02:09 |
|
New Yorp New Yorp posted: "I've been translating my ARM template for this project into Terraform. Here are my Terraform impressions:"

The state file is like the entire point of the software!! Terraform bills itself as infrastructure provisioning software, which it does, but so do lots of other solutions. Its killer feature is keeping track of provisioned resources so it can determine the operations required to go from the current state to the new desired state. And it does this for a ton of “providers”, which can be things like cloud resources, a deployed helm application, or TLS resources. It’s great for creating and keeping track of long-lived, relatively static things in a repeatable manner, and it falls down when you need custom or complicated transitions between states. If you don’t care about keeping track of the state (or don’t understand why you need to), then use another tool. And you should drop the state in whatever azure’s equivalent of s3 is.
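Azure's equivalent is Blob Storage, and terraform has a first-class backend for it (you also get state locking via blob leases). A minimal sketch, with every name a placeholder:

```hcl
terraform {
  backend "azurerm" {
    resource_group_name  = "tfstate-rg"
    storage_account_name = "tfstateaccount"
    container_name       = "tfstate"
    key                  = "myproject.terraform.tfstate"
  }
}
```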
|
# ¿ Jan 9, 2019 18:17 |
|
New Yorp New Yorp posted: "Why does it need an external tracking mechanism? All I care about is the current state (which it can look at by querying the platform) and the desired state (which is defined in my configuration)."

If you store the state in azure’s s3, it is querying the platform. To respond to what you mean, though: “querying the platform” is not a good way to figure out what terraform is responsible for. What is used to uniquely identify a resource varies from resource to resource. Resource names are not guaranteed to be unique for every resource, and tags are not available on all types of resources Terraform wants to provision. And, as mentioned, there are many providers other than the cloud ones which may have no means of being queried.

e: Also, resource deletion.
|
# ¿ Jan 9, 2019 18:44 |
|
Spring Heeled Jack posted: "Are there any good online resources regarding networking patterns in container orchestration? We're very much coming from a place of VLAN network-segmentation with ACLs and we're wondering how this translates into the wonderful world of containers."

K8s will run everything on the same overlay. If you want to restrict connectivity within that overlay, you would use something like a service mesh or NetworkPolicy resources.
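As a rough translation of an ACL into k8s terms, here's a minimal NetworkPolicy sketch: only pods labeled app=frontend may reach pods labeled app=backend in the namespace. All names are illustrative, and note your CNI has to actually enforce NetworkPolicy for this to do anything:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: backend-allow-frontend
spec:
  podSelector:
    matchLabels:
      app: backend        # the policy applies to backend pods
  policyTypes:
    - Ingress
  ingress:
    - from:
        - podSelector:
            matchLabels:
              app: frontend   # only frontend pods may connect
```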
|
# ¿ Jan 31, 2019 14:51 |
|
redis is a fine place to store things you don’t care about. if you don’t care about what you’re putting in the queue, then yes it’s good for that. If your application’s correctness relies on a message being received then redis is a bad choice
|
# ¿ May 13, 2019 19:26 |
|
Volguus posted: "What would be a good and reliable message queue then?"

Persistent message queues at the center of your system — your “enterprise message bus” — add an unnecessary level of indirection and a centralized point of failure. Use service discovery for sending messages and/or a reliable persistent store if you have any need to handle your messages transactionally.
|
# ¿ May 14, 2019 05:07 |
|
Kevin Mitnick P.E. posted: "Something has gone badly wrong if calling a persistent store 'reliable' isn't trivially true."

and yet that’s the operational reality of running rabbit/activemq/redis in my experience. something bad happens in the queue - it fills up, slows down, or starts losing messages somehow - so you bounce or purge it to fix it. I hope your devs took into account that their “reliable” queue could drop everything on the floor at any time.
|
# ¿ May 14, 2019 12:38 |
|
Admission controller is the way to go.
|
# ¿ Jun 4, 2019 01:06 |
|
rt4 posted: "I use Puppet at work but I just don't like Ruby. What alternatives might I consider that aren't so drat slow?"

They’re all bad in their own unique ways, sorry
|
# ¿ Aug 14, 2019 22:22 |
|
something about the name "data dog" rubs me the wrong way and I can't get over it
|
# ¿ Nov 14, 2020 04:48 |
|
your nginx controller is probably running as a daemonset in hostNetwork mode or with hostPorts, which means all you need to do is get the traffic to the nodes and you’re set (for ingress resources). needing non-http services is when you’d want something like metallb (although nodeports can do in a pinch)
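if you do end up wanting metallb, here's a minimal sketch of the ConfigMap-based layer-2 config it used around this era (newer releases moved to CRDs instead); the address range is a placeholder for unused IPs on the node subnet:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.1.240-192.168.1.250   # handed out to LoadBalancer services
```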
|
# ¿ Nov 30, 2020 23:42 |
|
the reason I think that config management tools all suck is because they only ever solve one of the two problems they’re supposed to. one problem is orchestration: you need control over how and when something happens in infrastructure that is not going to be immutable. this is something that ansible provides. the other problem is immutable/stateless/unorchestrated configuration, which is provided by tools like chef. as an operator I sometimes need one or the other, but all the tools I know of will only give me one and, reasonably, most groups only want to use one tool. i think it’s easier to try and pretend you have immutability in ansible than it is to shoehorn orchestration into chef, but I would love a tool that could give me both!
|
# ¿ Dec 12, 2020 23:19 |
|
I use alembic for a very basic data model and it suits my needs
|
# ¿ Jan 27, 2021 06:38 |
|
docker is totally fine to use and likely you won’t notice any difference between runtimes except that docker will be easier to find resources online for
|
# ¿ Mar 20, 2021 02:27 |
|
container runtime is like the least interesting thing to spend time thinking about unless you are executing arbitrary untrusted code
|
# ¿ Mar 20, 2021 02:32 |
|
Methanar posted: "a gross fragile single point of failure hack that works by MITMing traffic with iptables dnat rules."

so, uh, just like kube-proxy?
|
# ¿ Mar 24, 2021 14:12 |
|
Methanar posted: "kube-proxy isn't quite the same as it's just a control plane component. If kube-proxy starts crashing, traffic still flows properly since the real datapath is through iptables rules, which persist."

ah, true
|
# ¿ Mar 25, 2021 10:43 |
|
Methanar posted: "Mostly unrelated but we're looking at probably using Kube-router as our CNI with our own ASN that we peer with the main DC network."

If your AZs correspond to e.g. different subnets, you will need to use Calico or something else that gives you control over the IPAM, if I'm assuming correctly that you're going to be advertising pod IPs. Do you have a specific reason for advertising them? In general advertising pod IPs is a PITA, and the overhead of encapsulation + SNAT has been negligible for the size of clusters I've worked on.

Another thing to consider is that if you ever want to grow your IP range it may be more difficult with kube-router than with Calico: with Calico you just add another pool, and you can delete pools that are empty. No experience with kube-router, but for us Calico has been a pretty positive experience. For somewhat similar reasons I would advise not advertising service IPs either, and instead getting a loadbalancer implementation.

As long as you have >= 3 AZs per DC, having etcd span the AZs makes sense. Where you'd want a separate cluster per AZ is if you only had two or something.
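to illustrate the "just add another pool" bit, a hedged sketch of a second Calico IPPool; the CIDR and encapsulation mode are placeholders:

```yaml
apiVersion: projectcalico.org/v3
kind: IPPool
metadata:
  name: pool-2
spec:
  cidr: 10.245.0.0/16   # new, non-overlapping pod range
  ipipMode: Always      # match whatever encapsulation the existing pool uses
  natOutgoing: true
```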
|
# ¿ Apr 7, 2021 02:49 |
|
Methanar posted: "I've had 2 separate etcd fires two days in a row."

just curious, are the etcds hosting just a single kube cluster? and I'm assuming they have local storage, not something over a network?

only somewhat related, but I watched a talk a few months back from a guy who seemed to know what he was on about, who said that freeable memory is basically a lie. IIRC, and e.g. in this case, everything in the page cache, including etcd pages, would be considered freeable but is so hot the kernel would never (and should never) actually free that memory. also, any program paged into memory is technically considered freeable, but again, if it's hot the kernel won't do it because it would spend all its time freeing and then reloading the page. his conclusion was that there isn't (or wasn't at the time of his talk) a single good heuristic to tell how much memory is truly available for use at a given time on a system, which was not reassuring.
|
# ¿ Apr 23, 2021 00:11 |
|
imo having your clusters dependent on broadcast domains is a bad idea. use an overlay or l3 routing if you can. is this a requirement for certain public clouds?
|
# ¿ Sep 22, 2021 05:37 |
|
luminalflux posted: "Hi hello I do the release engineering among all other stuff I deal with, and we do CI/CD where if a merge to main passes tests, bucko it's getting deployed whether you want to or not."

how long does this process take end to end?
|
# ¿ Oct 11, 2021 12:54 |
|
necrobobsledder posted: "Historically for myself the most lax customers that are furthest behind in technology tend to be super important clients that are like 30% of the total company revenue so I have to do horrible things like scanning for their IP ranges advertised from their networks and make an nginx rule that offers certain ciphers only for them while everyone else gets what I meant to do. Freakin' enterprise I tell ya."

cripes
|
# ¿ Oct 28, 2021 00:40 |
|
Your concern from the load perspective is how many time series you are creating. If the problems you're trying to solve with prometheus involve slicing and dicing at the customer level, there's really no way around adding a unique per-customer label to your metrics. For every metric you push with those labels, you just need to be aware that you're creating (customer count) times the number of time series. As long as you are judicious with which metrics you're labeling per-customer (i.e. not pushing the 1000s of metrics that might come from something like node exporter), my guess is that you'll probably be fine.
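here's a minimal sketch of a per-customer label with the Go client; metric and label names are illustrative. the thing to keep in mind is that each distinct customer value mints one new time series per metric:

```go
package main

import (
	"github.com/prometheus/client_golang/prometheus"
	"github.com/prometheus/client_golang/prometheus/promauto"
)

// One counter, partitioned by customer: total series = number of customers.
var apiRequests = promauto.NewCounterVec(
	prometheus.CounterOpts{
		Name: "api_requests_total",
		Help: "API requests partitioned by customer.",
	},
	[]string{"customer"},
)

func recordRequest(customer string) {
	apiRequests.WithLabelValues(customer).Inc()
}
```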
|
# ¿ Nov 8, 2021 12:26 |
|
zokie posted: "Maybe not the right thread but I hope it fits: we have another team managing an ES cluster we use for regular logging stuff and real user metrics. They started with just the one, but now they have set up a test cluster for their testing and a pre cluster that 'is a prod environment for your non prod environments', and they want us to just have our prod environment logging &c go to the prod cluster."

It sounds like what you're asking is whether your infrastructure team should have N > 1 instances of critical infrastructure, which to me seems in your best interest. In doing so, they should be taking steps to make sure this transition is as transparent as possible, meaning they should be providing a way for you to replicate and update whatever existing tooling you have across any new instances they decide to bring up.
|
# ¿ Nov 20, 2021 03:52 |
|
duck monster posted: "So recently I started a new job that partially involves inheriting a giant Kubernetes cluster on DigitalOcean. I've never used Kubernetes so it's all a massive learning curve."

The process you should be looking at is the kubelet. Looks like you can modify the kubelet config to have the kubelet come up with whatever node labels you want.
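concretely, a hedged sketch: on most installs labels are applied at registration via the kubelet's --node-labels flag (kubeadm-style distros usually expose it through KUBELET_EXTRA_ARGS; DigitalOcean's managed flavor may differ). label key/values below are examples, and note the kubelet can only self-assign labels outside the restricted kubernetes.io namespaces:

```sh
# /etc/default/kubelet (path varies by distro)
KUBELET_EXTRA_ARGS="--node-labels=workload=batch,disk=ssd"

# after restarting the kubelet, verify the labels took:
kubectl get nodes --show-labels
```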
|
# ¿ Nov 22, 2021 08:33 |
|
they call them taints because they t'aint here and they t'aint there
|
# ¿ Nov 24, 2021 13:03 |
|
Methanar posted:
this was causing us strange issues for months before we realized it was enabled by default on ubuntu
|
# ¿ Dec 30, 2021 02:04 |
|
I loving wish it was 2015 where I work
|
# ¿ Jan 1, 2022 13:19 |
|
Dukes Mayo Clinic posted: "'k8s without all the poo poo we don't need' was all the pitch we needed to go hard on k3s in production. Time will tell."

isn't it the same API, but the binary is just smaller and it supports SQL backends? lol c'mon man
|
# ¿ Jan 14, 2022 00:52 |
|
does faang interview SREs like they interview SWEs? like, do I need to start reviewing stuff like red black trees?
|
# ¿ Jan 26, 2022 13:52 |
|
dehumanize yourself and face to Jenkins
|
# ¿ May 6, 2022 13:28 |