Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.
I ran Ceph a few shops ago, and can concur, it requires a ton of operational overhead to manage.

Jesse Iceberg
Jan 7, 2012

What kind of problems did you guys encounter running Ceph?

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

chutwig posted:

Unless you have a team ready and waiting to support Ceph, I would recommend contacting your local NetApp VAR.

trem_two posted:

Yeah. Don't run ceph yourselves. Our ops team was given a mandate to run ceph. It was a bad time. Very very bad time.

Gyshall posted:

I ran Ceph a few shops ago, and can concur, it requires a ton of operational overhead to manage.

Thanks guys, that's a pretty clear consensus and about as bad as I expected.

Hadn't heard of NetApp before. What's their elevator pitch minus the bullshit? Their "hybrid cloud" product page sounds like you'd get a middleman's vendor lock-in in exchange for avoiding vendor lock-ins higher up the food chain.

Methanar
Sep 26, 2013

by the sex ghost
What's wrong with Ceph? I ran a very small cluster for kubernetes and had to screw around with it like once, and it was my own fault.

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

Jesse Iceberg posted:

What kind of problems did you guys encounter running Ceph?

Methanar posted:

What's wrong with Ceph? I ran a very small cluster for kubernetes and had to screw around with it like once, and it was my own fault.

I think Ceph is a very interesting piece of software. I ran it at petabyte-scale for OpenStack and as object storage (though I left the team responsible before the object storage went full production). At large scales, you have to think about stuff that really doesn't matter in toy clusters, especially if the storage cluster is not completely homogeneous. Even small changes in the CRUSH map, like adding a few more OSDs, can trigger massive data migrations. You will also find out real fast who the small percentage of your user base is who completely kick the poo poo out of the system. It looks like there is experimental QoS support in Mimic now, which did not exist in the versions I used, but it's extremely susceptible to noisy neighbor issues still, and it's never going to be low-latency enough for a lot of applications. etcd on one of the Ceph clusters I used to admin takes about 3 orders of magnitude longer to fsync than on bare metal servers, and you definitely notice when your fsyncs take 250ms.
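
A minimal sketch for putting a number on the fsync gap mentioned above, assuming you can write a scratch file on the volume under test. The function names and the 4 KiB payload are made up for illustration; this is not how etcd measures its own WAL sync time, just a rough way to compare a Ceph-backed mount against local disk.

```python
import os
import statistics
import tempfile
import time


def measure_fsync_latency(directory, samples=20, payload=b"x" * 4096):
    """Return per-call fsync latencies (seconds) for a scratch file in `directory`."""
    latencies = []
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        for _ in range(samples):
            os.write(fd, payload)           # dirty some data so fsync has work to do
            start = time.perf_counter()
            os.fsync(fd)                    # block until the kernel reports it durable
            latencies.append(time.perf_counter() - start)
    finally:
        os.close(fd)
        os.unlink(path)
    return latencies


def summarize(latencies):
    """Return median and worst-case latency in milliseconds."""
    return {
        "median_ms": statistics.median(latencies) * 1000,
        "max_ms": max(latencies) * 1000,
    }


if __name__ == "__main__":
    # Point this at a directory on the volume you care about instead of a temp dir.
    with tempfile.TemporaryDirectory() as scratch:
        print(summarize(measure_fsync_latency(scratch)))
```

Run it once against a directory on local disk and once against a directory on the network-backed volume, then compare the medians.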

Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.

chutwig posted:

I think Ceph is a very interesting piece of software. I ran it at petabyte-scale for OpenStack and as object storage (though I left the team responsible before the object storage went full production). At large scales, you have to think about stuff that really doesn't matter in toy clusters, especially if the storage cluster is not completely homogeneous. Even small changes in the CRUSH map, like adding a few more OSDs, can trigger massive data migrations. You will also find out real fast who the small percentage of your user base is who completely kick the poo poo out of the system. It looks like there is experimental QoS support in Mimic now, which did not exist in the versions I used, but it's extremely susceptible to noisy neighbor issues still, and it's never going to be low-latency enough for a lot of applications. etcd on one of the Ceph clusters I used to admin takes about 3 orders of magnitude longer to fsync than on bare metal servers, and you definitely notice when your fsyncs take 250ms.

Great post, and it echoes my experience. My team was 6 SREs plus myself, and we struggled with knowledge of distributed storage systems as well as performance tuning. We inherited the implementation, which was built on Juju and some other crap, but the learning curve was still a lot for the team, and our efforts probably would have been better spent elsewhere.

I love the idea of storage clustering in user land though. Ceph is a really impressive piece of tech.

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

NihilCredo posted:

Thanks guys, that's a pretty clear consensus and about as bad as I expected.

Hadn't heard of NetApp before. What's their elevator pitch minus the bullshit? Their "hybrid cloud" product page sounds like you'd get a middleman's vendor lock-in in exchange for avoiding vendor lock-ins higher up the food chain.

NetApp's a storage vendor who's been around since about the time dinosaurs roamed the earth. Give them a call, or EMC, or Pure, or Nimble, or any one of the other storage vendors out there who will take no small amount of your dollars but will hopefully ease your support burden by giving you a platform that will probably do what you want it to do until you start pushing it hard, and then you'll find out what the phrase "WAFL inode exhaustion" means.

Nomnom Cookie
Aug 30, 2009



NihilCredo posted:

I'm looking into a storage abstraction for a product so it can run with no code changes in anything from a piddly under-the-table VM with a plain HDD to a MS/AWS/GC environment with that vendor's blob storage. Ideally, it would also handle a "I have a poor man's datacenter, N physical machines running orchestrated containers and some network storage (either a NAS or even each one with their HDD), please replicate my data as much as you are able without giving it to those icky cloud vendors" scenario, which I'm really hoping to avoid but cannot dismiss out-of-hand.

Is Ceph what I'm looking for or is there a less overkill solution? I've seen Storidge advertised around which seems simpler, but also very unproven.

I’m trying really hard to read your post in a way that doesn’t make my answer “the kernel’s file system layer” but I’m failing. Why would your app code need to know how the storage is provisioned?

spoon daddy
Aug 11, 2004
Who's your daddy?
College Slice
Funny Ceph story: My local vmware administrator didn’t want to pay for the license cost of VSAN and only had a boatload of localdisks (144 3TB drives across 4 esxi nodes). He recently came to me asking if running ceph instead was a good idea. It was hard to stop myself from asking if he was loving with me or not.

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

chutwig posted:

NetApp's a storage vendor who's been around since about the time dinosaurs roamed the earth. Give them a call, or EMC, or Pure, or Nimble, or any one of the other storage vendors out there who will take no small amount of your dollars but will hopefully ease your support burden by giving you a platform that will probably do what you want it to do until you start pushing it hard, and then you'll find out what the phrase "WAFL inode exhaustion" means.

I suppose it has nothing to do with delicious waffles? :ohdear:

Kevin Mitnick P.E. posted:

I’m trying really hard to read your post in a way that doesn’t make my answer “the kernel’s file system layer” but I’m failing. Why would your app code need to know how the storage is provisioned?

It is my understanding that mounting S3 or Azure Blob Storage as a filesystem is possible but neither simple nor recommended, compared to using their proprietary REST APIs.
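
To make the "no code changes" idea concrete, here is a rough sketch of the kind of seam being discussed: app code talks to a tiny put/get interface, and a plain-disk backend implements it. Everything here (`BlobStore`, `LocalDiskStore`, `save_report`) is a hypothetical name for illustration; a cloud backend would implement the same two methods using that vendor's SDK and is not shown.

```python
import tempfile
from pathlib import Path
from typing import Protocol


class BlobStore(Protocol):
    """Hypothetical minimal interface the app codes against."""

    def put(self, key: str, data: bytes) -> None: ...
    def get(self, key: str) -> bytes: ...


class LocalDiskStore:
    """Backend for the 'single VM with a plain HDD' case."""

    def __init__(self, root):
        self.root = Path(root).resolve()
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, key):
        # Refuse keys such as "../x" that would land outside the root directory.
        path = (self.root / key).resolve()
        if self.root not in path.parents:
            raise ValueError(f"key escapes store root: {key!r}")
        return path

    def put(self, key, data):
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        tmp = path.with_name(path.name + ".tmp")
        tmp.write_bytes(data)
        tmp.replace(path)  # os.replace semantics: atomic on POSIX within one filesystem

    def get(self, key):
        return self._path(key).read_bytes()


def save_report(store: BlobStore, name: str, body: bytes) -> None:
    """Example app code: it only ever sees the interface, never the backend."""
    store.put(f"reports/{name}", body)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        store = LocalDiskStore(root)
        save_report(store, "july.txt", b"hello")
        print(store.get("reports/july.txt"))
```

The interface is deliberately tiny: the fewer filesystem-only features (rename, append, directory listing) the app relies on, the less the blob-store backends have to fake.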

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
On the other hand, for use cases where your needs are pretty fixed, you aren't making changes to the storage environment often, and you don't have any particularly stringent performance requirements for your workloads, Ceph is pretty low-touch. Use cloud as a cornerstone of your scaling strategy, but there's way worse ways to configure nearline disks you already own.

Vulture Culture fucked around with this message at 19:53 on Jul 2, 2019

Nomnom Cookie
Aug 30, 2009



NihilCredo posted:

I suppose it has nothing to do with delicious waffles? :ohdear:


It is my understanding that mounting S3 or Azure Blob Storage as a filesystem is possible but neither simple nor recommended, compared to using their proprietary REST APIs.

It’s not really a worse idea than trying to pretend a single HDD is S3

chutwig
May 28, 2001

BURLAP SATCHEL OF CRACKERJACKS

Vulture Culture posted:

On the other hand, for use cases where your needs are pretty fixed, you aren't making changes to the storage environment often, and you don't have any particularly stringent performance requirements for your workloads, Ceph is pretty low-touch. Use cloud as a cornerstone of your scaling strategy, but there's way worse ways to configure nearline disks you already own.

I always caution people about Ceph not because I don't trust it, but because I've seen boredom-driven development result in too many instances of people thinking the toy Ceph or Kubernetes cluster they have on their MBP is going to scale up to the demands of production workloads and that it's always going to work fine forever. I think it's a wonderful piece of technology, but nobody wants to be caught in a situation where their Ceph cluster has gone down at 2 AM and you have no idea what to do to fix a corrupt monmap and the CTO is mumbling through an Ambien-induced stupor on the conference bridge. Distributed systems are hard to admin and debug, storage systems are very difficult to dislodge once in place, and combining the two can result in the perfect storm of Ultimate Fuckery. I think most people in here know this, and it's what I routinely try to impress upon my bright-eyed juniors fresh out of college.

Turbl
Nov 8, 2007


So I recently got hired into a devops position at a big corporation which is cool but it was a bit of a surprise because I interviewed for a basic entry level software developer position where they got all excited that I knew what "agile development" and "sprints" were and gave no indication that I was going into devops until right before I started and I basically knew nothing other than the general concept going into it. So I'm looking for any good websites/books that are good for learning core devops stuff, ranging from basic overviews to very technical and detailed. Probably stuff that isn't specific to a particular tool since I can just look up documentation for those easily enough. Also if there're any good devops related tech blog sort of things to follow. I just want a better understanding of the whole picture, the tool specific stuff I feel like I can just learn on the job.

I saw this Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation book mentioned very early on in the thread but that was from some years ago so I don't know if there's something more recent. Please give me any recommendations or advice.

Docjowles
Apr 9, 2009

Can you talk about your actual job duties and what technologies you’re expected to use day to day? “DevOps” has come to be such a broad term that it’s meaningless out of context.

New Yorp New Yorp
Jul 18, 2003

Only in Kenya.
Pillbug

Turbl posted:

So I recently got hired into a devops position at a big corporation which is cool but it was a bit of a surprise because I interviewed for a basic entry level software developer position where they got all excited that I knew what "agile development" and "sprints" were and gave no indication that I was going into devops until right before I started and I basically knew nothing other than the general concept going into it. So I'm looking for any good websites/books that are good for learning core devops stuff, ranging from basic overviews to very technical and detailed. Probably stuff that isn't specific to a particular tool since I can just look up documentation for those easily enough. Also if there're any good devops related tech blog sort of things to follow. I just want a better understanding of the whole picture, the tool specific stuff I feel like I can just learn on the job.

I saw this Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation book mentioned very early on in the thread but that was from some years ago so I don't know if there's something more recent. Please give me any recommendations or advice.

Both you and your new employer are looking at devops wrong. It's not a job role, a particular set of tools, or a project management methodology. Devops is a cultural shift within an organization. Having a "devops position" almost always means that the company doesn't actually understand devops and is going to just have an additional devops silo beyond the usual development, operations, and QA silos. If you want to really understand devops, go read about the devops culture shift and the practices that make devops organizations successful.

If the response to "a release is failing" is "that's Turbl's problem, they're the devops person", then your company is not doing devops.

Turbl
Nov 8, 2007


Docjowles posted:

Can you talk about your actual job duties and what technologies you’re expected to use day to day? “DevOps” has come to be such a broad term that it’s meaningless out of context.

There's a team of us and (some) people within the team "own" a particular tool and everyone on the team learns and manages the tools and works with the app teams to help teach them the CI/CD tools and support them when needed. Tools we own include version control tools (currently moving all app teams off subversion to bitbucket or TFS git), jenkins, ca ra, artifactory, TFS, upsource, and a few code scanning tools. There may be some other smaller tools I don't even know about yet. Also using docker and starting to use kubernetes. We're basically trying to get all applications working with automated deployment and improving the process. There are over 500 applications on ca ra right now and thousands of active jobs in jenkins so it's a lot. I know they're trying to get applications in line with the 12 factor methodology. If it sounds like I don't know what I'm talking about it's because I don't really yet.

Day-to-day basically seems like working on various projects (the subversion migration, kubernetes stuff that I don't know a lot about yet, streamlining processes), meeting with app teams, giving talks about newly implemented things, troubleshooting problems with any of our tools, and performing tool updates.

Turbl fucked around with this message at 01:00 on Jul 4, 2019

Nomnom Cookie
Aug 30, 2009



New Yorp New Yorp posted:

Both you and your new employer are looking at devops wrong. It's not a job role, a particular set of tools, or a project management methodology. Devops is a cultural shift within an organization. Having a "devops position" almost always means that the company doesn't actually understand devops and is going to just have an additional devops silo beyond the usual development, operations, and QA silos. If you want to really understand devops, go read about the devops culture shift and the practices that make devops organizations successful.

If the response to "a release is failing" is "that's Turbl's problem, they're the devops person", then your company is not doing devops.

Or using the term devops wrong. Company I'm at now actually does devops, teams deploy and operate stuff they own, devs are in the oncall rotation, etc. But there's all the stuff used by everyone and owned by no one--CI server, deploy tooling, the basic EC2/k8s infra, glue stuff to make our SSO vendor integrate with things that don't support SAML, etc. Devops team owns all of it, though we could be called Infra or Commons. As you might imagine, it's sometimes unclear whether something falls into our jurisdiction or IT's.

JHVH-1
Jun 28, 2002
I’m the overlord of our cloud. The gatekeeper. The automation man. I am the devop.

Hadlock
Nov 9, 2004

Turbl posted:

the subversion migration

Yikes. Good luck sir, I just got done doing a similar migration, you will learn a lot about many things

fluppet
Feb 10, 2009

JHVH-1 posted:

I’m the overlord of our cloud. The gatekeeper. The automation man. I am the devop.

The job title cloud wrangler confuses vendors enough to make them leave you alone

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

JHVH-1 posted:

I’m the overlord of our cloud. The gatekeeper. The automation man. I am the devop.
shut up Brent

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug
I told the interns this year that my job is "computer janitor". I helpfully explained that I "Janitor the computers, you know, tidy up the cloud"

Cancelbot
Nov 22, 2006

Canceling spam since 1928

New Yorp New Yorp posted:

If the response to "a release is failing" is "that's Turbl's problem, they're the devops person", then your company is not doing devops.

Very much this. I'm our "DevOps Lead" with a few engineers, but our role is to get everyone to adopt the mindset through tooling, process, and culture changes. The aim is to shrink & eliminate the team as the capability & autonomy of the wider IT department grows. I suspect (hope) this will happen in 12-18 months if we keep doing things right.

Cancelbot fucked around with this message at 20:21 on Jul 4, 2019

LochNessMonster
Feb 3, 2005

I need about three fitty


Cancelbot posted:

Very much this. I'm our "DevOps Lead" with a few engineers, but our role is to get everyone to adopt the mindset through tooling, process, and culture changes. The aim is to shrink & eliminate the team as the capability & autonomy of the wider IT department grows. I suspect (hope) this will happen in 12-18 months if we keep doing things right.

Same. Our front end devs acted really surprised when I started to ask them what they had tried to do to solve their failing builds. I’m currently building a team that will improve the release automation as well as educate and help all developers to adopt the mindset of ‘you ship it, you run it’. We’ve got the shipping part down, the running it mindset is slowly getting there.

Too bad C level fired the 3 project teams that were actually doing this and helping all new teams to do the same. Reason: they have some major bonuses coming up after finalizing a major acquisition and need to keep the costs in check. This basically means the rest of the year they’ll Thanos snap half of all projects and no budgets can be changed anymore.

Ape Fist
Feb 23, 2007

Nowadays, you can do anything that you want; anal, oral, fisting, but you need to be wearing gloves, condoms, protection.
I'm running a Node App on CENTOS through cpanel with A2 Hosting using the Node App Setup thingy and I've gotten it all installed but when I actually go to run the Node start script I'm getting a weird memory error:

Run JS script

returncode: 244
stdout:
stderr:
npm ERR! path /home/coreaho2/app
npm ERR! code ENOMEM
npm ERR! errno -12
npm ERR! syscall scandir
npm ERR! ENOMEM: not enough memory, scandir '/home/coreaho2/app'
glob error { [Error: ENOMEM: not enough memory, scandir '/home/coreaho2/app'] errno: -12, code: 'ENOMEM', syscall: 'scandir', path: '/home/coreaho2/app' }
npm ERR! A complete log of this run can be found in:
npm ERR! /home/coreaho2/.npm/_logs/2019-07-04T21_36_59_706Z-debug.log

The thing is if I actually telnet into it and run the command manually it works fine?

Sorry for lack of additional info, phone posting.

Doc Hawkins
Jun 15, 2010

Dashing? But I'm not even moving!


i tell people i "do all the cloud poo poo"

Helianthus Annuus
Feb 21, 2006

can i touch your hand
Grimey Drawer

Ape Fist posted:

I'm running a Node App on CENTOS through cpanel with A2 Hosting using the Node App Setup thingy and I've gotten it all installed but when I actually go to run the Node start script I'm getting a weird memory error:

Run JS script

returncode: 244
stdout:
stderr:
npm ERR! path /home/coreaho2/app
npm ERR! code ENOMEM
npm ERR! errno -12
npm ERR! syscall scandir
npm ERR! ENOMEM: not enough memory, scandir '/home/coreaho2/app'
glob error { [Error: ENOMEM: not enough memory, scandir '/home/coreaho2/app'] errno: -12, code: 'ENOMEM', syscall: 'scandir', path: '/home/coreaho2/app' }
npm ERR! A complete log of this run can be found in:
npm ERR! /home/coreaho2/.npm/_logs/2019-07-04T21_36_59_706Z-debug.log

The thing is if I actually telnet into it and run the command manually it works fine?

Sorry for lack of additional info, phone posting.

i won’t ask why you are using cpanel, but i suspect your problem is specific to that

it’s real bad, and it’s about to get a lot more expensive

LochNessMonster
Feb 3, 2005

I need about three fitty


Ape Fist posted:

The thing is if I actually telnet into it and run the command manually it works fine?

Please tell me you don’t actually have TELNET enabled on a server in tyool 2019....

Ape Fist
Feb 23, 2007

Nowadays, you can do anything that you want; anal, oral, fisting, but you need to be wearing gloves, condoms, protection.

LochNessMonster posted:

Please tell me you don’t actually have TELNET enabled on a server in tyool 2019....

It's a secure SSH connection via PuTTY. Any time I remote into something via CLI it's Telnet to me, sorry. I'm just a developer, I'm too stupid to know all the other words.

Helianthus Annuus
Feb 21, 2006

can i touch your hand
Grimey Drawer
i would search for the error on the cpanel forums, or go source diving to see how cpanel is starting your node app

there’s gotta be some cgroup, quota, limit or something

or maybe it’s using a different nodejs install from when you get on the CLI without cpanel

btw i’m curious now, and i wanna know why you have to use cpanel
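
One rough way to test the "some limit" and "different nodejs install" guesses above: dump the process limits and the resolved `node`/`npm` paths from both contexts (the SSH shell and whatever cPanel launches) and diff the output. This is only a sketch; `describe_environment` and `read_cgroup` are made-up names, and it assumes a Linux host where Python's `resource` module and `/proc` are available. It says nothing about how cPanel itself applies limits.

```python
import json
import os
import resource
import shutil


def read_cgroup():
    """Return the contents of /proc/self/cgroup, or None if it cannot be read."""
    try:
        with open("/proc/self/cgroup") as handle:
            return handle.read()
    except OSError:
        return None


def describe_environment():
    """Collect the rlimits, binary paths, and cgroup membership for this process."""
    limits = {}
    for name in ("RLIMIT_AS", "RLIMIT_DATA", "RLIMIT_NPROC", "RLIMIT_NOFILE"):
        if hasattr(resource, name):  # not every platform defines every limit
            # (soft, hard); resource.RLIM_INFINITY means unlimited
            limits[name] = resource.getrlimit(getattr(resource, name))
    return {
        "node_path": shutil.which("node"),   # None if node is not on PATH
        "npm_path": shutil.which("npm"),
        "PATH": os.environ.get("PATH", ""),
        "limits": limits,
        "cgroup": read_cgroup(),
    }


if __name__ == "__main__":
    print(json.dumps(describe_environment(), indent=2))
```

If the address-space limit or the `node` path differs between the two runs, that narrows down where the ENOMEM is coming from.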

Ape Fist
Feb 23, 2007

Nowadays, you can do anything that you want; anal, oral, fisting, but you need to be wearing gloves, condoms, protection.

Helianthus Annuus posted:

i would search for the error on the cpanel forums, or go source diving to see how cpanel is starting your node app

there’s gotta be some cgroup, quota, limit or something

or maybe it’s using a different nodejs install from when you get on the CLI without cpanel

btw i’m curious now, and i wanna know why you have to use cpanel

I wasn't using it at all until I realised I can't just start a node process and detach it or run nohup or forever or something without the provider eventually just terminating the process after a few hours of inactivity. It's a part of their (lovely) policy for some reason so they just strong arm you into using the really bad cpanel node app setup which is unintuitive and clearly doesn't work very well. I'm probably not going to stick around with these guys (A2 Hosting) but they were cheap and had a flat price cap.

Helianthus Annuus
Feb 21, 2006

can i touch your hand
Grimey Drawer
yikes you’re in a bad way. i wanna try to help you out

pm sent

Mr Shiny Pants
Nov 12, 2012

NihilCredo posted:

I'm looking into a storage abstraction for a product so it can run with no code changes in anything from a piddly under-the-table VM with a plain HDD to a MS/AWS/GC environment with that vendor's blob storage. Ideally, it would also handle a "I have a poor man's datacenter, N physical machines running orchestrated containers and some network storage (either a NAS or even each one with their HDD), please replicate my data as much as you are able without giving it to those icky cloud vendors" scenario, which I'm really hoping to avoid but cannot dismiss out-of-hand.

Is Ceph what I'm looking for or is there a less overkill solution? I've seen Storidge advertised around which seems simpler, but also very unproven.

The S3 API is pretty much supported everywhere it seems. You could take a look at Gluster.

Nomnom Cookie
Aug 30, 2009



Mr Shiny Pants posted:

The S3 API is pretty much supported everywhere it seems. You could take a look at Gluster.

Try that and your system will end up not working with real s3 unless the emulator vigorously enforces metadata inconsistency. S3 provides fewer guarantees than almost any storage system and stale reads are common

Mr Shiny Pants
Nov 12, 2012

Nomnom Cookie posted:

Try that and your system will end up not working with real s3 unless the emulator vigorously enforces metadata inconsistency. S3 provides fewer guarantees than almost any storage system and stale reads are common

So if you write it to the S3 spec it can only run better anywhere else? :)

Ape Fist
Feb 23, 2007

Nowadays, you can do anything that you want; anal, oral, fisting, but you need to be wearing gloves, condoms, protection.

Helianthus Annuus posted:

yikes you’re in a bad way. i wanna try to help you out

pm sent

when she say she gonna dm u but she don't

Nomnom Cookie
Aug 30, 2009



Mr Shiny Pants posted:

So if you write it to the S3 spec it can only run better anywhere else? :)

That's the problem, yeah. If your design is supporting a bunch of different backends all accessed through an S3 API, what are dev and test going to use? Most likely an S3 emulator written in nodejs that stores everything in /tmp. Customers hook the product up to S3 and then come to you with weird-looking failures, and dev has a real good time trying to diagnose. Once someone finally digs through the docs enough to find the page on all the ways S3 will screw with you for funsies, it's a major ops effort to build out an integration environment that's even capable of reproducing.
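
A sketch of the "emulator that enforces inconsistency" idea from earlier in the thread: an in-memory test double where a write only becomes readable after a delay, so stale reads show up in dev and test instead of at a customer site. `StaleReadStore` is a hypothetical name, and the delay-everything behavior is a deliberately pessimistic assumption, not a model of S3's exact guarantees.

```python
import time


class StaleReadStore:
    """In-memory test double: a write is visible to reads only after `delay` seconds."""

    def __init__(self, delay=0.5, clock=time.monotonic):
        self.delay = delay
        self.clock = clock      # injectable so tests can control time
        self._versions = {}     # key -> list of (visible_at, data), in write order

    def put(self, key, data):
        visible_at = self.clock() + self.delay
        self._versions.setdefault(key, []).append((visible_at, data))

    def get(self, key):
        now = self.clock()
        visible = [data for visible_at, data in self._versions.get(key, [])
                   if visible_at <= now]
        if not visible:
            raise KeyError(key)  # the object does not "exist" yet for readers
        return visible[-1]       # newest version that has become visible


if __name__ == "__main__":
    fake_now = [0.0]
    store = StaleReadStore(delay=1.0, clock=lambda: fake_now[0])
    store.put("config", b"v1")
    fake_now[0] = 1.0
    store.put("config", b"v2")
    print(store.get("config"))   # still the old version: v2 is not visible yet
    fake_now[0] = 2.0
    print(store.get("config"))   # now the overwrite has become visible
```

Code that passes its tests against a store like this has at least been forced to cope with read-your-own-write failures, which the /tmp-backed emulators never exercise.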

We already have something that abstracts over local disk, network disk, and cloud disk--it's called a filesystem. The differences between those backends are already great enough that the abstraction has worrying leaks. Don't make it worse by trying to add a blob store option.

Cancelbot
Nov 22, 2006

Canceling spam since 1928

If it's AWS + Linux, then surely the answer is EFS? It literally is a filesystem, i'm not great at linux but you just mount EFS on all the machines you want.

https://aws.amazon.com/efs/
https://docs.aws.amazon.com/efs/latest/ug/efs-onpremises.html <-- If your poo poo is at your office.

Even has nice things like lifecycle management, backups (with DataSync), and encryption at rest. You can use DataSync to huck it straight to S3. Windows can use either Storage Gateway or FSx.

If you really want this to span multi-cloud then it's going to be very difficult no matter which way you slice it.

Cancelbot fucked around with this message at 12:53 on Jul 9, 2019

zerofunk
Apr 24, 2004

Turbl posted:

I saw this Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation book mentioned very early on in the thread but that was from some years ago so I don't know if there's something more recent. Please give me any recommendations or advice.

I feel like that's probably still a solid recommendation. I'd also be interested if anyone else had other thoughts. I read it in bursts over a long stretch of time, so I don't remember all the details super well. I want to say it's a bit more general in that it deals with practices/process than specific tools. That's an advantage in that the advice probably stays relevant longer and it should be broadly applicable. On the other hand, it may not help much if you were starting from scratch trying to figure out which tools you should use.
