|
Had a client that was doing a datacenter move and IBM insisted they move the NetApp SAN that was bought through them or they wouldn't support it. They installed the SAN backwards in the rack at the data center.
|
# ? Apr 4, 2019 14:34 |
|
siggy2021 posted:Sometimes you guys start talking and I have no idea what the gently caress is going on and at this point I'm convinced it's all one big joke I don't understand. There's an idea in devops that servers should all have configuration managed by one central source, rather than each box having a bespoke configuration that some developer or team of developers has tweaked for years, and cannot be replicated. This is sometimes referred to as cattle vs pets, with the idea being your servers should be disposable cattle you can just use interchangeably and not pets which you feel some affection for and can't be replaced. This idea is especially important when you start working at scale, on the cloud or in highly virtualized environments. Rancher is a tool that works on these principles to let you manage containerized infrastructure. I have no idea what STEERing is.
|
# ? Apr 4, 2019 14:55 |
|
Steering committees are a meat space management concept. It's basically a group of people getting together to make bad decisions about things they don't understand.
|
# ? Apr 4, 2019 14:59 |
|
The Fool posted:Steering committees are a meat space management concept. Yes. 'We want containers. What do we need, how do we do it, who's in charge of what, and how do we handle hiccups along the way? Let's meet every week and yell at each other for an hour until the project is finished (read: in perpetuity) Treating servers like cattle is also great when you need a thing to run once a day at a defined time. Instead of having my server on all the time doing nothing, waiting to be compromised or corrupted, I have a file. The file defines how to set up the server, how to install my poo poo, how to run the job, and where to post the results. When that container has served its purpose it gets murdered and all stateful info is discarded. This allows a brand new container tomorrow from the same source file, which ensures the job runs identically every time with no weird quirks accumulated over time.
|
# ? Apr 4, 2019 15:35 |
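Judge Schnoopy's run-once pattern can be sketched without any container runtime at all. Here's a rough Python analogy (the job fields and step names are invented for illustration) in which a temporary directory stands in for the container's disposable filesystem: everything is built fresh from the spec, and nothing survives the run except the posted results.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical job definition: everything the run needs, declared up front.
JOB_SPEC = {
    "name": "nightly-report",
    "steps": ["collect", "summarize"],
}

def run_job(spec):
    """Run a job inside a throwaway workspace and return only the results.

    The workspace plays the role of the container's filesystem: created
    fresh from the spec, destroyed when the job finishes, so no state
    survives between runs.
    """
    with tempfile.TemporaryDirectory() as workdir:
        scratch = Path(workdir)
        # "Install my stuff": materialize the job definition into the workspace.
        (scratch / "job.json").write_text(json.dumps(spec))
        # "Run the job": each step writes its output.
        outputs = {step: f"{spec['name']}:{step}:done" for step in spec["steps"]}
        # "Post the results": only this return value escapes the workspace.
        return outputs
        # workdir is deleted on exit -- tomorrow's run starts from the same spec.

results = run_job(JOB_SPEC)
```

In a real setup the same shape falls out of something like `docker run --rm`: the image is the file, the `--rm` is the murder.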
|
The Fool posted:Steering committees are a meat space management concept. And in the end, everybody's balls are cut off?
|
# ? Apr 4, 2019 15:35 |
|
Judge Schnoopy posted:Yes. 'We want containers. What do we need, how do we do it, who's in charge of what, and how do we handle hiccups along the way? Let's meet every week and yell at each other for an hour until the project is finished (read: in perpetuity)
|
# ? Apr 4, 2019 15:36 |
|
Vulture Culture posted:until someone changes the configuration on your orchestration system and the job mysteriously stops being scheduled for reasons that take you four days to figure out well, this is still IT, no getting around that. but it's better IT.
|
# ? Apr 4, 2019 15:38 |
|
Judge Schnoopy posted:Treating servers like cattle is also great when you need a thing to run once a day at a defined time. Instead of having my server on all the time doing nothing, waiting to be compromised or corrupted, I have a file. The file defines how to set up the server, how to install my poo poo, how to run the job, and where to post the results. This is a better strategy than the one my predecessor used, where the scheduled job lives on the desktop machine they remote desktop into. I don't even know where this machine physically is. I'm pretty sure nothing is monitoring its heartbeat, though!
|
# ? Apr 4, 2019 15:39 |
|
Antioch posted:We got hit by LockerGoga a couple weeks ago. Took us entirely by surprise, had most of our poo poo cryptoed to the nines before we even started getting alerts. Couple days of 12-14 hours, some tense restores and all is mostly well again. Plus a week of uninterrupted downtime was a great opportunity to upgrade almost everything, from server 2016 to the wiring in the switch closet to replacing an old firewall and cleaning up rules. We are currently on the 3rd straight day of recovering from ryuk. I’m on the network team so there’s luckily not a whole lot for me to do on this, but yeah day 1 was about 16 hours for me, day 2 was 12, we’ll see about today. Critical services are mostly restored so we’re getting there, but this has been wild. We paid out the rear end to a security contractor to help with this and they’ve been worth every penny to help plan and keep things on track. My manager put in like 36 hours straight from when it started lol No talk of how we will be compensated for the extra time yet but I’m sure it’ll just be some extra time off
|
# ? Apr 4, 2019 17:47 |
|
GreenNight posted:Yup. What bullshit you run into? SPSS. The process is just super vague and cryptic, and there's no document explaining how it works. A lot of their support pages just 404 since they've changed processes a few times, or sometimes they give you information that is not valid and send you on a wild goose chase. Once you finally have the license code applied to the server, you got to change environment variables to set up commuter licenses, logging, that sort of thing. This particular case, I couldn't find out why one module wasn't included in the license even though the base product and other extra modules were present. Turns out this authorization code has to be generated from an entirely different portion of the website! I ended up making a Sev 1 ticket and getting them to list step by step instructions because gently caress poking around on a garbage website for hours trying to divine out why Regression is so special and coveted that it can't be included with the main package. The server side of this, Sentinel RMS sucks too but at least there's documentation out there for it. Antioch posted:I'm going to make "I survived Crypto Crisis 2019" stickers and t-shirts. don't jinx yourself!
|
# ? Apr 4, 2019 18:16 |
|
TheCog posted:There's an idea in devops that servers should all have configuration managed by one central source, rather than each box having a bespoke configuration that some developer or team of developers has tweaked for years, and cannot be replicated. This is sometimes referred to as cattle vs pets, with the idea being your servers should be disposable cattle you can just use interchangeably and not pets which you feel some affection for and can't be replaced. This idea is especially important when you start working at scale, on the cloud or in highly virtualized environments. That is a good explanation and I understand more now. As someone that doesn't touch development stuff, and has never had a good reason to look into containers and whatnot, it all just sounds hilarious when you are throwing around terms like rancher and cattle. Really I just wanted to make a bad Steer pun.
|
# ? Apr 4, 2019 19:54 |
|
Judge Schnoopy posted:Yes. 'We want containers. What do we need, how do we do it, who's in charge of what, and how do we handle hiccups along the way? Let's meet every week and yell at each other for an hour until the project is finished (read: in perpetuity) What if my poo poo isn't stateless and also isn't http and also needs to receive traffic from the internet
|
# ? Apr 4, 2019 20:03 |
|
Methanar posted:What if my poo poo isn't stateless and also isn't http and also needs to receive traffic from the internet "is also great when you need a thing to run once a day at a defined time" clearly your use case isn't what I was talking about in that scenario. BUT! If your poo poo isn't stateless, it should publish state to a database so the server itself doesn't have to track anything. And your cluster of servers should sit behind a load balancer so any of them can die and be reborn without any interruption to receiving traffic from the internet. At that point you'd want them on a rolling cycle of being killed while maintaining enough nodes in the cluster to handle the workload, and no container should be over a day or two old.
|
# ? Apr 4, 2019 20:11 |
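A minimal sketch of the rolling-recycle idea from the post above, in Python. The names, ages, and thresholds are all invented, and a real orchestrator would also wait for replacements to pass health checks before the next pass; this only shows the scheduling decision.

```python
def plan_recycles(containers, max_age_days=2, min_healthy=3):
    """Pick which containers to kill this cycle.

    containers: dict of name -> age in days (illustrative shape).
    Kill the oldest over-age containers, but never drop the cluster
    below min_healthy members in a single pass; the rest wait for
    the next cycle, after replacements have come up.
    """
    over_age = sorted(
        (name for name, age in containers.items() if age > max_age_days),
        key=lambda name: -containers[name],  # oldest first
    )
    headroom = len(containers) - min_healthy
    return over_age[:max(headroom, 0)]

fleet = {"web-a": 5, "web-b": 1, "web-c": 4, "web-d": 3}
to_kill = plan_recycles(fleet)  # -> ["web-a"]: oldest goes first, cluster stays at 3
```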
|
Judge Schnoopy posted:"is also great when you need a thing to run once a day at a defined time" How do I use a load balancer on-prem with non-http UDP traffic and also it matters which backend the internet originating traffic goes to a lot and also any of the several hundred sessions assigned to a backend are extremely long-lived and may take 6+ hours to naturally terminate. Methanar fucked around with this message at 20:14 on Apr 4, 2019 |
# ? Apr 4, 2019 20:12 |
|
Methanar posted:How do I use a load balancer on-prem with non-http UDP traffic and also it matters which backend the internet originating traffic goes to a lot. UDP that requires hitting a unique endpoint as if it was stateful even though it's not stateful? That's a huge design flaw in whatever you're running and yeah, it's going to greatly limit what you can do with your infrastructure. That doesn't make containers bad, though.
|
# ? Apr 4, 2019 20:14 |
|
I think most people have just had "containers" pushed at them like every other word published in CTO magazine or whatever, to the point that hearing it sets them off. Like I have no issue with “cloud” but every time someone asks why we aren’t doing it I twitch a little.
|
# ? Apr 4, 2019 20:35 |
|
Methanar posted:How do I use a load balancer on-prem with non-http UDP traffic and also it matters which backend the internet originating traffic goes to a lot and also any of the several hundred sessions assigned to a backend are extremely long-lived and may take 6+ hours to naturally terminate. what is a udp "session"?
|
# ? Apr 4, 2019 21:38 |
|
12 rats tied together posted:what is a udp "session"? webrtc
|
# ? Apr 4, 2019 22:16 |
|
The idea of a single configuration entity is bad imo because as mentioned it becomes monolithic. The ideal model is one in which each application has the ability to bootstrap itself onto the infrastructure based on the same codebase as the application itself. The way I've moved at work is toward small deploy scripts that are kicked off via CD, but the docker model fits this as well. Methanar posted:How do I use a load balancer on-prem with non-http UDP traffic and also it matters which backend the internet originating traffic goes to a lot and also any of the several hundred sessions assigned to a backend are extremely long-lived and may take 6+ hours to naturally terminate. You want GSLB.
|
# ? Apr 4, 2019 22:18 |
|
Home network question: Is there a better way to set up wake-on-lan than activating whatever setting it is on my computer and then memorizing the MAC address, then running start-computer from my other computer? I'm doing some resource-intensive stuff on my desktop but would like to be able to work on it from my laptop out of the room. Works fine through Chrome Remote Desktop until I stop for a while, the computer falls asleep, and I have to get up and go into the other room to wake it back up. I'm open to using traditional RDP, it's just that Chrome is super convenient.
|
# ? Apr 4, 2019 22:29 |
|
Get a cordless mouse and move it into the room you're in, wiggle it when you need to wake the PC.
|
# ? Apr 4, 2019 22:43 |
|
Don't turn your PC off or let it sleep?
|
# ? Apr 4, 2019 22:49 |
|
12 rats tied together posted:what is a udp "session"? A miserable little pile of packets E: missed this but abigserve posted:You want GSLB. No no no no no no. Nooooo nullfunction fucked around with this message at 23:11 on Apr 4, 2019 |
# ? Apr 4, 2019 23:09 |
|
nullfunction posted:A miserable little pile of packets ? Expand? GSLB is extremely widely deployed and it works very well as long as you aren't looking for highly granular load balancing OR if your clients are extremely geo-diverse (I think this would be Methanar's use case)
|
# ? Apr 4, 2019 23:29 |
|
abigserve posted:The idea of a single configuration entity is bad imo because as mentioned it becomes monolithic. Docker has this concept of images building on some image that's already there. So if I have a web app that will run a specific config, I start my Dockerfile saying "FROM lamp.stack.image" and then tell it how to download and install my web app configs. Somebody else can also say "FROM lamp.stack.image" and apply their own config, or they can say "FROM new.app.config" and bootstrap from where I left off. This provides a ton of flexible forking of images to not only get exactly what you want, but to borrow as much code as possible. It definitely keeps you from getting monolithic images that everybody uses as the 'gold standard', containing poo poo-tons of bloat your project won't ever touch.
|
# ? Apr 5, 2019 01:14 |
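As a toy model of the layering idea above (image contents as plain dicts; all names invented), the child image borrows everything from its base and only overrides what it declares — which is why forks stay cheap and nobody needs one bloated gold-standard image:

```python
def build_image(base, layer):
    """Toy model of image layering: the child is the base's files
    plus its own, with the child winning on conflicts."""
    image = dict(base)
    image.update(layer)
    return image

lamp = {"apache": "2.4", "mysql": "8.0", "php": "7.2"}
# "FROM lamp" plus my app's own config; php gets overridden, the rest is borrowed.
my_app = build_image(lamp, {"app": "my-web-app", "php": "7.3"})
# Somebody else bootstraps from where I left off.
their_fork = build_image(my_app, {"app": "their-fork"})
```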
|
nullfunction posted:A miserable little pile of packets perhaps the same could be said of ALL protocols
|
# ? Apr 5, 2019 01:26 |
|
abigserve posted:? Expand? There's something a little more subtle that we ran into that caused a lot of problems. Keep in mind that GSLB is really just DNS at its core. Your users' ISP probably handles resolving their DNS requests, but if their ISP has a geodistributed DNS infrastructure, or relies on a third party to resolve their requests who has DNS servers in many locations, your DNS requests can appear to be coming from all over the place. GSLB sees this and responds with a different IP address -- but this often happens in the middle of your session! Did you know that background DNS requests fire off while you're connected to a website, every few seconds? They nuked chrome://net-internals in more recent versions, but it was extremely plain to see a few versions ago. Sometimes DNS traffic will just start coming out of a server across the country, and if your application can't accommodate that, you're going to have a bad time. If you have long-running sessions (6 hours definitely counts) your load balancers have to maintain that state in memory. That's a lot of hardware if you have a high-volume service. If your application is completely stateless (unlikely, state is pretty much unavoidable for nontrivial use cases) or you've found a way to share state across datacenters (can be done, but is often incredibly expensive and/or slow) then GSLB might be a solution for you. I spent a good chunk of 2017 and 2018 making our application work with GSLB without randomly dropping sessions due to cross-DC chatter, and it's still not perfect. It probably won't ever be, but we got it down to about a 0.04% failure rate. "Granular" in our case was US-East vs. US-West. Our application is performance-sensitive and sharing state between DCs isn't possible within our performance constraints at the scale we operate at. We even tried implementing edns-client-subnet extensions without much improvement.
I know a little bit about Methanar's use case from this thread and PMs and I can say pretty confidently that GSLB is not the right tool for that job. GSLB works for some scenarios, but it's far from a silver bullet.
|
# ? Apr 5, 2019 01:35 |
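To make "GSLB is really just DNS" concrete, here's a toy resolver-topology lookup in Python showing the mid-session flip nullfunction describes. Subnets and VIPs are documentation addresses, not anyone's real topology.

```python
import ipaddress

# Toy GSLB: answer DNS queries with the datacenter VIP "closest" to the
# resolver that asked. All subnets and addresses here are invented.
TOPOLOGY = {
    ipaddress.ip_network("203.0.113.0/24"): "198.51.100.10",  # us-east VIP
    ipaddress.ip_network("192.0.2.0/24"): "198.51.100.20",    # us-west VIP
}
DEFAULT = "198.51.100.10"

def resolve(resolver_ip):
    """Return the VIP a GSLB would hand to this resolver."""
    addr = ipaddress.ip_address(resolver_ip)
    for net, vip in TOPOLOGY.items():
        if addr in net:
            return vip
    return DEFAULT

# The failure mode: one client, whose ISP answers its queries from
# geo-distributed resolvers, re-resolves mid-session and is suddenly
# pointed at the other datacenter.
first = resolve("203.0.113.7")  # query served by an east-coast resolver
later = resolve("192.0.2.9")    # background re-resolution lands out west
assert first != later           # same user, different DC mid-session
```

The client never moved; only the resolver asking on its behalf did, which is why edns-client-subnet exists (and why, per the post above, it doesn't always save you).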
|
Methanar posted:lol how do you feel about getting internet traffic into the cluster I don't think we'll ever be getting direct traffic, there will always be some perimeter. Ultimately customer facing products will be running in the environments we build.
|
# ? Apr 5, 2019 01:49 |
|
nullfunction posted:GSLB sees this and responds with a different IP address -- but this often happens in the middle of your session! Did you know that background DNS requests fire off while you're connected to a website, every few seconds? They nuked chrome://net-internals in more recent versions, but it was extremely plain to see a few versions ago. Sometimes DNS traffic will just start coming out of a server across the country, and if your application can't accommodate that, you're going to have a bad time I don't think any of that is right and I couldn't replicate it in wireshark... I'm not saying GSLB is a silver bullet but for long running UDP applications like webrtc that presumably front-load all of the DNS-lookups I think that's a pretty good fit?
|
# ? Apr 5, 2019 05:15 |
|
Judge Schnoopy posted:At that point you'd want them on a rolling cycle of being killed while maintaining enough nodes in the cluster to handle the workload, and no container should be over a day or two old. Uptime Epic Fail Compilation
|
# ? Apr 5, 2019 05:47 |
|
I was going to say do DSR with a really intelligent hashing algorithm that you will probably have to implement yourself as an nginx module. Several hundred sessions per backend doesn't sound high enough to make DSR a requirement, though! what, specifically, is breaking? how is this any different from load balancing a bunch of TCP connections?
|
# ? Apr 5, 2019 06:25 |
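The "really intelligent hashing algorithm" mentioned above is often rendezvous (highest-random-weight) hashing. A minimal Python sketch — backend names and the session key are invented — that keeps a session pinned to one backend with no shared state, and only moves the sessions that were on a backend that disappears:

```python
import hashlib

def pick_backend(session_key, backends):
    """Rendezvous hashing: score every (key, backend) pair and take the max.

    Every holder of the same key picks the same backend with no shared
    state, and removing a backend only reassigns the sessions it held.
    """
    def score(backend):
        h = hashlib.sha256(f"{session_key}:{backend}".encode())
        return int.from_bytes(h.digest()[:8], "big")
    return max(backends, key=score)

backends = ["lb-a", "lb-b", "lb-c"]
chosen = pick_backend("198.51.100.7:49152", backends)
# If the chosen backend dies, only its sessions move:
survivors = [b for b in backends if b != chosen]
assert pick_backend("198.51.100.7:49152", survivors) in survivors
```

For Methanar's case the catch isn't picking a backend, it's the sessions already in flight: a 6-hour UDP stream can't be re-hashed without drain logic on top.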
|
Methanar posted:How do I use a load balancer on-prem with non-http UDP traffic and also it matters which backend the internet originating traffic goes to a lot and also any of the several hundred sessions assigned to a backend are extremely long-lived and may take 6+ hours to naturally terminate. Very carefully.
|
# ? Apr 5, 2019 07:07 |
|
CloFan posted:don't jinx yourself! Make them "I survived Crypto Crisis 2019 1.0" and they're evergreen. 22 Eargesplitten posted:Home network question: Is there a better way to set up wake-on-lan than activating whatever setting it is on my computer and then memorizing the MAC address, then running start-computer from my other computer? Make the computer do the memorization and save it as a .ps1 file. Just run that whenever you need to wake up the remote system. Our PortsDB is full of bad data so we can't look up a wall jack by IP address, but by god we can turn a hostname into a MAC address! I have someone else's script to look up MACs from hostname, and I wrote a script to take that output and send WoL packets to a list of hostnames. Handy for late night deployments.
|
# ? Apr 5, 2019 07:09 |
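For the curious, the WoL packet itself is trivial to build by hand — 6 bytes of 0xFF followed by the target MAC repeated 16 times, broadcast over UDP. A minimal Python sketch of the same idea as the .ps1 approach above (the MAC shown is illustrative):

```python
import socket

def magic_packet(mac):
    """Build a WoL magic packet: 6 bytes of 0xFF, then the MAC 16 times."""
    mac_bytes = bytes.fromhex(mac.replace(":", "").replace("-", ""))
    if len(mac_bytes) != 6:
        raise ValueError(f"not a MAC address: {mac!r}")
    return b"\xff" * 6 + mac_bytes * 16

def wake(mac, broadcast="255.255.255.255", port=9):
    """Broadcast the magic packet on the LAN (UDP port 9 by convention)."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as s:
        s.setsockopt(socket.SOL_SOCKET, socket.SO_BROADCAST, 1)
        s.sendto(magic_packet(mac), (broadcast, port))

# wake("00:11:22:33:44:55")  # substitute the target NIC's MAC
```

The NIC only inspects the payload, so port 7 or 9 both work in practice; the BIOS/NIC "wake on magic packet" setting still has to be enabled on the target.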
|
abigserve posted:I don't think any of that is right and I couldn't replicate it in wireshark... I worked on this from about the Chrome 58 to 69 (nice) timeframe, and maybe the behavior has changed, as I can't replicate it now either. It could have been tied to TTLs, ours have to be quite low in order to support fast failover to meet our SLAs. I ended up writing scripts to scrape tens of millions of rows of application audit logs looking for the crosstalk, pulling user-agent strings, tying the IP addresses to an ISP, and then geolocating them in the course of tracking this down. It was the worst on Chromium-based browsers and we also found a strong correlation to satellite internet connections, and for some larger customers that ran their own DNS infrastructure we caught several misconfigurations on their end in the course of investigating all of this. It was A Thing for a long time and I'll admit I'm a little bit apprehensive when GSLB-anything comes up because of it. I know "it's DNS" is kind of the poo poo answer people sometimes give when they have nothing else, but after a deep dive that's... what we found. It was DNS. Your success with GSLB is also likely tied to your vendor's implementation, ours being not great. We ended up switching to an external DNS provider to handle the traffic direction which allowed us to get our failure rate to where it is today (orders of magnitude better than the first time around). If your application can tolerate it, GSLB might work for you. There are a lot of applications that can't, and you will really need to know your stack inside and out (especially how state is handled) to evaluate it properly.
|
# ? Apr 5, 2019 07:32 |
|
nullfunction posted:There's something a little more subtle that we ran into that caused a lot of problems. Keep in mind that GSLB is really just DNS at its core. I work on appliances that do GSLB. There's a reason I (and many other people I know who work on the same appliance) don't use response time based metrics.
|
# ? Apr 5, 2019 12:30 |
|
Our core switch was up for 4 years, 17 weeks before I rebooted it after getting hired.
|
# ? Apr 5, 2019 15:55 |
|
hihifellow posted:I work on appliances that do GSLB. There's a reason I (and many other people I know who work on the same appliance) don't use response time based metrics. Out of curiosity, is there a better solution? Here's our scenario: We have a pair of datacenters and want to ensure that we balance the load roughly equally between the two, but once a customer establishes a connection to one, they need to stay there for the life of their session or until their environment in that datacenter becomes unhealthy, whichever comes first. Our environments are sized so that we can take a full outage of one without overloading the other as we'd like to get to the point where we can fully shut down one DC for maintenance with minimal customer impact, but Active/Active is a selling point. Right now we're using geolocation to direct traffic but it presents problems when a customer's user base is right on the arbitrary lines we've drawn, so our guys are having to create a bunch of custom rules in our geolocation provider to deal with those. The user base is a mixture of desktop and mobile and the mobile users tend to move around a lot. It's not possible to share running application state between DCs due to the way our backend works, and when a failure is detected we need people to fail over as quickly as possible, 5 minutes is about the longest we can wait -- our SLAs don't leave a ton of wiggle room. I didn't design our current solution, my background is in networking but I'm not the one touching the hardware directly. I know we're moving to F5 LBs in the coming months, if that makes a difference. Feel free to PM if you'd rather. Someday I hope to be able to post more of the gory details as some of it is really interesting, but I like my job too much to do that now.
|
# ? Apr 5, 2019 17:47 |
|
Are source/destination hashes for load balancing an option instead of round robin? Or isn't there something called sticky sessions? My load balancing knowledge is a little ancient
|
# ? Apr 5, 2019 17:52 |
|
The first thing we tried was sectioning off the IP space and directing traffic based on source, but we found that it disproportionately affected mobile as people would go on and off WiFi and the change in address would cause them to hit the other datacenter and kick them because their session wasn't valid in that DC. It also led to a fairly lopsided load pattern which was undesirable. Using a cookie to tell which DC they started in was one of my suggestions, but I was told there wasn't a good way to reroute their traffic along our edge to the correct DC if they came in through the wrong one. Probably a vendor-specific limitation, but I don't know the details there.
|
# ? Apr 5, 2019 18:04 |
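The WiFi-to-cellular failure mode above can be shown with a toy affinity function. Everything here is invented — DC names, documentation-range addresses — and a modulo on the integer address stands in for a real hash so the result is checkable by eye:

```python
import ipaddress

DCS = ["dc-east", "dc-west"]

def dc_for(client_ip):
    """Toy source-IP affinity: map the client address to a datacenter.

    Real balancers hash the address; parity of the integer address
    shows the same property with numbers you can verify by hand.
    """
    return DCS[int(ipaddress.ip_address(client_ip)) % len(DCS)]

# The mobile problem: the same phone, hopping from WiFi to cellular,
# presents a new source address and lands in the other DC, where its
# session doesn't exist.
assert dc_for("203.0.113.24") == "dc-east"   # on WiFi
assert dc_for("198.51.100.77") == "dc-west"  # same user, now on cellular
```

Cookie or token-based affinity avoids this because the session marker travels with the client rather than with its network address — which is exactly why the cookie suggestion above needs the edge to honor it.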
|
I haven't worked with F5's before but they should be able to load balance by session count with persistence linked to something like source IP. I know they compete with Netscalers, which is what I do work on and I know they can do exactly that.
|
# ? Apr 6, 2019 02:56 |