Cidrick
Jun 10, 2001

Praise the siamese
I believe killall will match any process by its name - does your bash script have java in the name, by chance? Here's an example of what I mean:

code:
[matt@host ~]$ cat foo
#!/bin/bash
killall foo
ps -ef | grep foo
echo 'foo'
[matt@host ~]$ ./foo
Terminated
It's killing itself: since the script is named "foo", killall matches the script's own name, so it never actually gets to the second or third line of the script.

Edit: Try looking at something like pgrep/pkill to do what you're looking for. Killall is a bit heavy-handed for most practical scenarios.
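For example (a hedged sketch - the pattern is a placeholder for whatever your java process actually looks like), pkill -f matches against the full command line instead of just the process name, so you can target the java process without catching your own script:

code:
# preview what would match before killing anything
pgrep -fl 'java .*myapp.jar'
# then kill only that java process, not the wrapper script
pkill -f 'java .*myapp.jar'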

Cidrick fucked around with this message at 20:49 on Feb 25, 2015

Cidrick
Jun 10, 2001

Praise the siamese
What do some of you goons with bigger shops use for IPAM? In a previous life we moved from a hodgepodge of bind + dhcpd + ipplan to infoblox, and I loved it. I'm trying to get budget for infoblox in my new shop since we use powerdns + dhcpd with no IPAM, but I'm getting pushback from management for cost reasons, so I've been tasked with exploring alternatives. Aside from solarwinds, I don't really know what else is out there, and at first glance solarwinds seems to have a pretty heavy Microsoft slant, which we don't have much of.

Does anyone have suggestions on where else I should be looking? Has anyone come up with a FOSS solution that's easy to manage and scales to thousands of nodes without major pain points?

Cidrick
Jun 10, 2001

Praise the siamese

Death Vomit Wizard posted:

Grepping Logs is Terrible
This article reminded me how much I need to learn about managing logs. I'm curious now to see how a very basic centralized log database might be set up, but my google searches are failing me. Care to point a noob in the right direction?

Basic option: (r)syslog to a centralized server via "*.* @remote-host.fqdn" on all your hosts just so you have a single place to grep -R.
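A minimal sketch of that setup, assuming rsyslog (the hostname is a placeholder):

code:
# on every client, e.g. /etc/rsyslog.d/forward.conf -- single @ is UDP, @@ would be TCP
*.* @loghost.example.com:514
# on the central box, accept UDP syslog (legacy rsyslog directives)
$ModLoad imudp
$UDPServerRun 514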

Intermediate option: Logstash + Elasticsearch + Kibana - here's a getting started guide

Expensive option: Splunk. I absolutely love Splunk; I was a Splunk admin for about 4 years and it was a fantastic product to deploy and maintain. The downside is that it's stupid expensive depending on your log volume. Last I looked it was roughly 100k per 100GB of indexed data per day. Granted, that's a one-time payment, but every time your needs grow you'll need to throw more money at the product.

Edit: gently caress yooouuu beaten so hard

Cidrick
Jun 10, 2001

Praise the siamese

rocode posted:

Normally apt will warn you when you need to reboot by having a message along the lines of "Some packages require reboot for changes to take effect." after upgrading and every time you upgrade from then on. Unless you are using some KSplice shenanigans, you will need to reboot for kernel updates.

To my knowledge the kernel is the only thing that strictly requires a reboot to take effect, but for patches to things that are deeply ingrained in the OS, like glibc, you pretty much have to reboot as well. Otherwise you're stuck restarting every service on the box, which is going to essentially take the host down anyway.
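One rough way to see what's still running against old libraries after an update (needs-restarting comes from yum-utils; the lsof grep is a loose manual check):

code:
# yum-utils ships needs-restarting, which lists processes using files updated since they started
sudo yum install -y yum-utils
sudo needs-restarting
# rough manual check: processes still mapping deleted files (e.g. the old glibc) show up as DEL/(deleted)
sudo lsof 2>/dev/null | grep -E ' DEL | \(deleted\)' | awk '{print $1, $2}' | sort -u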

Cidrick
Jun 10, 2001

Praise the siamese

evol262 posted:

You can kexec a new kernel, but hardly anybody does. ksplice works around this, but :oracle:, so kpatch is in mainline and should be ready soon (TM)

kpatch :monocle:

I remember getting excited for ksplice and then that excitement promptly waning when Oracle bought it. I had no idea kpatch was in the works, so that's awesome.

Cidrick
Jun 10, 2001

Praise the siamese

Thalagyrt posted:

Good advice

While I agree with all this, if you just want something simple to monitor RAID health, check out smartd. In addition to monitoring the health of individual JBOD drives, it also supports all of the major RAID controller types (HP, LSI, software raid, etc) to monitor the physical disks that the OS typically won't see, and it doesn't require you to write anything of your own.
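As a rough idea of what that looks like in /etc/smartd.conf (device names and slot numbers are placeholders, and the -d type depends on your controller):

code:
# plain JBOD or software raid member
/dev/sda -a -m root@localhost
# physical disk 0 behind an LSI MegaRAID controller
/dev/sdb -d megaraid,0 -a -m root@localhost
# physical disk 0 behind an HP Smart Array controller
/dev/sda -d cciss,0 -a -m root@localhost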

Cidrick
Jun 10, 2001

Praise the siamese

outlier posted:

Yes. Otherwise, I'd immediately look at the laptop as being the issue.

Based on what you've said, this has to be an IP block from somewhere. It's possible your image came with denyhosts or fail2ban running by default, and you failed to log in too many times when you were first standing it up. It won't be iptables, but it's definitely something that sshd is checking after you've successfully connected and shared RSA fingerprints.

Check ~/.ssh/authorized_keys to see if there's anything in there. Check /etc/hosts.deny. Check /var/log/messages for anything after your failed login attempt. See if denyhosts or fail2ban are installed, and check their config files to see where they keep their failed login data to see if your home IP is in there.
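Roughly the sequence of checks, as a sketch (paths assume an EL-style box, and the fail2ban jail name varies by distro):

code:
cat ~/.ssh/authorized_keys
cat /etc/hosts.deny
sudo grep -iE 'sshd|denyhosts|fail2ban' /var/log/messages | tail -50
rpm -q denyhosts fail2ban
# if fail2ban is installed, see which IPs it has banned (jail name may be 'ssh' or 'sshd')
sudo fail2ban-client status sshd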

Cidrick
Jun 10, 2001

Praise the siamese
While I don't advocate using these for anything remotely resembling a proper linux environment, on the few AWS instances I maintain for side projects, I generally use the okean sino-korea IP blacklists which I fetch via cron.daily. It's not going to make your boxes ironclad by any stretch of the imagination, but it HAS cut down a great deal on comment spam and ssh brute force attempts.
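The cron.daily script is roughly this shape - a hedged sketch with a placeholder blocklist URL, assuming ipset and iptables are available:

code:
#!/bin/bash
# /etc/cron.daily/blocklist -- refresh an ipset from a downloaded CIDR list (URL is a placeholder)
set -e
ipset -exist create blocklist hash:net
ipset flush blocklist
curl -s https://example.com/sinokorea-cidr.txt | grep -E '^[0-9]' | while read net; do
    ipset -exist add blocklist "$net"
done
# drop anything matching the set; add the rule only if it isn't already there
iptables -C INPUT -m set --match-set blocklist src -j DROP 2>/dev/null || \
    iptables -I INPUT -m set --match-set blocklist src -j DROP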

Your mileage may vary.

Cidrick
Jun 10, 2001

Praise the siamese

Martytoof posted:

Any thoughts on best practices on creating a partition vs using the raw disk for an lvm PV?

I typically create an 8e type partition on /dev/sdX1 and then pvcreate that, but I've been seeing some tutorials on just using the raw dev/sdX device rather than a subset partition.

I haven't read a compelling case for partitioning a disk and then pvcreating that. The only argument I've seen is that a careless admin might look at what appears to be an unpartitioned block device and reformat an in-use disk without thinking first, but that's more of a what-if scenario if you work with braindead co-workers.

I find it's much cleaner - especially when working with remote storage or a vmdk - to just pvcreate the raw block device. This is especially true when you go to resize it. No having to futz with kpartx to re-read the new partition boundaries, just echo 1 > rescan, pvscan, and lvextend.
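A hedged sketch of that resize flow after growing the underlying VMDK or LUN - device, VG, and LV names are placeholders, and note that it's pvresize that actually grows the PV:

code:
# tell the kernel the disk changed size, then grow the PV and the LV (and the filesystem with -r)
echo 1 > /sys/block/sdb/device/rescan
pvresize /dev/sdb
lvextend -r -l +100%FREE /dev/vgdata/lvdata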

Cidrick
Jun 10, 2001

Praise the siamese
There's a stackexchange discussion that disagrees with me, though. However, I find most of the reasons to be... not very good, and certainly not worth the headache of having to deal with partitions when you're dicking with drive resizing.

Cidrick
Jun 10, 2001

Praise the siamese
Yes, you can absolutely do a 20.172.in-addr.arpa and just put in the additional octet in the zonefile like your example. Remember the order is reversed, though - in your example, those would reverse-resolve to 172.22.1.10 and 172.22.1.20.

You can also put $ORIGIN in the zonefile to add some logical separation between the separate subnets. The $ORIGIN directive basically means: for the records that follow, append this origin to each unqualified name. Like so:

code:
$ORIGIN 92.228.10.in-addr.arpa.
31		PTR	foohost01.stag.foo.net.
32		PTR	foohost02.stag.foo.net.
33		PTR	foohost03.stag.foo.net.
$ORIGIN 94.228.10.in-addr.arpa.
31		PTR	foohost01.dev.foo.net.
32		PTR	foohost02.dev.foo.net.
33		PTR	foohost03.dev.foo.net.
Would then become:

31.92.228.10.in-addr.arpa
32.92.228.10.in-addr.arpa
33.92.228.10.in-addr.arpa

31.94.228.10.in-addr.arpa
32.94.228.10.in-addr.arpa
33.94.228.10.in-addr.arpa

Cidrick
Jun 10, 2001

Praise the siamese
We've recently been hit by a NetworkManager bug on our EL7 hosts running in production. Fortunately this didn't affect our customer-facing prod hosts running on Cloudstack, since I disabled NM on our EL7 cloud template that we use (for reasons that escape me right now, probably just to remove as much unneeded bloat as possible). However, it's disrupted a fair amount of our internal "prod" hosts that run a bunch of backend, non-customer-facing services that don't live in our internal cloud.

I know NetworkManager is the default in EL7 now, but the kneejerk reaction has been to just blanket-disable NM on all of our EL7 hosts since we don't technically "need" it and the legacy ifcfg scripts work just fine. However, I'm loath to start making sweeping changes that go contrary to what RedHat mandates in their distributions - defaults are usually that way for a very good reason. The trouble is, I'm having a hard time thinking of a good reason to keep it other than that it's the default. Since these are all a mix of physical and virtual servers running in our data centers, the only additional benefit I know of is easier manipulation of interfaces using nmcli versus ifup/ifdown and brctl, which is hard to weigh against a network-disabling NM bug breaking prod hosts.

Does anyone have any thoughts one way or another?

Cidrick
Jun 10, 2001

Praise the siamese

evol262 posted:

Partly, I'd say that that bug appears to have only affected RHEL 7.0 (and 7.0.z), and that your production systems should probably be running at least 7.1, or 7.2, which is now released. dot-zero releases are always kind of buggy and unreliable. I realize that's not necessarily under your control, though.

It is and it isn't. Our cloud VMs have all been moved to 7.1 because we don't need a patching schedule for those - drain the node, destroy the VM, build a new one in its place, add it back to the VIP. No patching needed, just cycle the template. The non-cloud stuff is trickier because we don't have a proper patching schedule. It's something I know we need to work on.

I didn't know about the dbus stuff though - even if we're not actively using any of it, I'd rather keep it around in case there's a shiny new feature down the pipe later. If nothing else, I'll learn more about NetworkManager on servers, because most of how it operates is a black box to me (much like systemd is at this point). Turning something off because you don't understand it makes me cry. SELinux, anyone?

Thanks for the insight.

Cidrick
Jun 10, 2001

Praise the siamese

nitrogen posted:

Anyone know anything about these that might know if there's some hidden setting I am missing?

What about checking the firmware revision of the OA of your chassis, and comparing the versions of iLO4 running on the two mismatched blades?

Cidrick
Jun 10, 2001

Praise the siamese

nitrogen posted:

Well you inspired something. Blades that this works on are all in the same enclosure.

Problem is, blades where they DONT work are on an enclosure with the same OA firmware (1.7) and are on the same iLO version (2.03) where they DO work.

Erm. What kind of enclosures do you have? Granted I've only really dealt with the c7000s and all their variations, but 1.7 seems ancient to me. I think all of our production blade chassis are running 4.30.

Out of curiosity, if you SSH to the OA and do "show server status #" does it report the correct temperature? Or does it report the same erroneous one that ipmitool does?

Cidrick
Jun 10, 2001

Praise the siamese

EvilRic posted:

We'd want to lock it down so that only applications on our LAN can send mail though it but wouldn't require smtp authentication.

Does anyone have any recommendations?

This is stupidly easy to do with postfix. There are really only two lines you'd need to edit in main.cf from a stock install:

myhostname = some.fqdn.ofyourcompany.tld
mynetworks = 10.0.0.0/8 (or whatever your internal network is) <-- this is the key to allow your internal network to relay mail through this box without authenticating. You can also do a comma-separated list of individual IPs or hostnames if you'd prefer

Then I'd recommend setting up TLS support so you can send mail to third parties not in cleartext:

smtpd_tls_security_level = may
smtpd_tls_cert_file = /etc/pki/tls/certs/some-default.crt
smtpd_tls_key_file = /etc/pki/tls/certs/some-default.key
smtpd_use_tls = yes

More reading on securing email delivery with TLS is here.

service postfix start (or systemctl start postfix), and you're essentially done, at least with the postfix configuration. At a bare minimum you'll want to set up SPF records for your email domain if you don't have them already, and if you do, add the public IP of your postfix box to that TXT record so that third parties won't blacklist you. You'll also want to create a PTR record for that public IP of your postfix box to match the "myhostname" field.
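A quick way to sanity-check both records once they're in place (the domain and IP here are placeholders):

code:
$ dig +short TXT yourcompany.tld
"v=spf1 mx ip4:203.0.113.25 ~all"
$ dig +short -x 203.0.113.25
relay.yourcompany.tld.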

DKIM message signing is also nice, relatively easy, and free, but probably overkill for what you're planning on doing.

Cidrick fucked around with this message at 16:32 on Jan 6, 2016

Cidrick
Jun 10, 2001

Praise the siamese

EvilRic posted:

Provided we have the SPF record so that it doesn't get flagged and the MX records point at the proper hosted server for delivery and replies, if I set it up with our domain name will it matter that the postfix server won't be the actual mail server for our domain?

Nope, not if all you're doing is sending email and not receiving it.

One caveat: either don't set a "mydestination" variable in main.cf, or leave it as "localhost" or something. Otherwise, if you try to send email to "your" company's email domain from within your company's network, postfix will think it's hosting mailboxes for that domain, so the email will get stuck in the queues on your server and never do the MX record lookup to find out where that email should actually be delivered.
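You can check and adjust it with postconf instead of editing main.cf by hand - a small sketch:

code:
postconf mydestination                       # show the current value
sudo postconf -e 'mydestination = localhost'
sudo systemctl reload postfix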

Edit: I looked up the docs and the default is to have mydestination as "$myhostname", which should be fine to leave as-is.

Cidrick
Jun 10, 2001

Praise the siamese

EvilRic posted:

For the TLS will a self signed cert do? I am assuming it is just used for the server to server encryption and provided it is valid the other servers won't mind if it is self signed?

To be honest, I've never tested using a self-signed cert. I imagine some MTAs are fine with it as long as the CN on the certificate matches your $myhostname field, but I've never tested this theory. Getting a cert from a Comodo or Geotrust reseller is like $10 a year, so I've always just done that rather than deal with the headache.

I'd be curious to know if anyone else here has tried using a self-signed cert and if it works or not.

Edit: Some googling suggests that you can get away with using free third party CAs like StartSSL or CAcert.org. Your mileage may vary.

Thanks Ants posted:

Get a Mandrill account or if you already use AWS then use SES.

I'd actually like to start playing with SES at some point, but I just haven't had a use case that would fit it yet. How does it handle connections and authentication to their relays?

Cidrick fucked around with this message at 18:18 on Jan 6, 2016

Cidrick
Jun 10, 2001

Praise the siamese
I'm looking to create some sort of lightweight, pre-boot environment (preferably redhat-flavored since I know it the best) for a machine to pxe boot from. I don't want it to actually install an operating system - it's mostly going to be a small OS to do some pre-kickstart stuff, such as registering hardware info with our inventory system, adding itself to cobbler, putting its host record information into infoblox, setting up a RAID config, and so on.

We have something like that now, but it's sort of lovely - we mount an NFS mount exported from a single server in one of our data centers that has a full-blown EL6 installation on it, which isn't the tidiest way of doing things, nor is it lightweight at all. It also suffers the problem of locking when we have multiple hosts trying to boot at once.

Does anyone have any suggestions on a cleaner way to do this? I've been researching making a customized initrd, but that doesn't seem like quite the right way to be doing things. Should I convert a LiveCD installation to a network-bootable one?

Cidrick
Jun 10, 2001

Praise the siamese

evol262 posted:

This is exactly what things like the foreman discovery image are for. You may need to build your own livecd and convert it, but I'd recommend starting there. It'd also be a great use for coreos or something else light

livecd-iso-to-pxeboot is fine (though I strongly recommend ipxe over http, since it's much, much faster)

Thanks, this is exactly the direction I was looking for.

And agreed on ipxe - I finally figured out how to get DHCP filters to work in Infoblox, so I can get chainloading to ipxe going. I had it set up at my last shop for booting ESXi installs, because trying to do that poo poo over tftp took loving forever.

Cidrick
Jun 10, 2001

Praise the siamese

Docjowles posted:

Let's Encrypt is a thing now, too. There's basically no excuse for not using TLS on all the things these days.

Oh sweet. Now I can not give Comodo my $30 in march when my MTA is due for renewal.

Cidrick
Jun 10, 2001

Praise the siamese
Is there anything I should be looking at besides keepalived if I want to ensure that the front end IPs for our haproxy farm are available across multiple hosts, say up to 10 of them?

In short, we want a cluster of physical hosts running haproxy (with a ruleset managed by bamboo, which watches marathon for new apps launching on mesos) sitting behind a round-robin DNS record. If one or two hosts go offline for whatever reason, we want the front end VIPs for haproxy brought online on other hosts in the cluster and, ideally, spread back across the cluster once the hosts are healthy again.
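For reference, the per-VIP keepalived config I have in mind is roughly this - a minimal sketch with placeholder interface, router ID, and address:

code:
vrrp_instance haproxy_vip_1 {
    state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100          # give each host in the cluster a different priority
    advert_int 1
    virtual_ipaddress {
        203.0.113.10/24
    }
}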

I looked at corosync + pacemaker but it seems like we'd have to do shared storage if we wanted to do active/active, and it seems way way more complicated than what we actually need.

(Yes, I know the proper way to do this isn't round-robin DNS and keepalived, but we may or may not have hardware load balancers available to us, so I'm preparing a backup plan)

Cidrick
Jun 10, 2001

Praise the siamese

BlackMK4 posted:

What am I doing wrong here?

Is there anything in your .profile or .bashrc that sets up environment variables that your script needs to run? Or perhaps something in an .s3cmd folder?

9 times out of 10, when a cron job won't fire but the script runs fine from the command line, it's because an environment variable doesn't exist or the path to the executable isn't fully qualified (which yours is, so it's not that).
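If that turns out to be the problem, one common fix is to define the environment in the crontab itself - a rough sketch, where the variable names and paths are placeholders for whatever the script actually expects:

code:
# in the crontab (crontab -e)
PATH=/usr/local/bin:/usr/bin:/bin
AWS_ACCESS_KEY=your-key-here
AWS_SECRET_KEY=your-secret-here
0 3 * * * /home/user/bin/ec2-backup.sh >> /var/log/ec2-backup.log 2>&1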

Edit: Assuming this is your script, it looks like it's using Amazon's EC2 command line tools. And their docs seem to suggest that your AWS key and secret are kept in your .bashrc.

Edit2: Yay!

Cidrick fucked around with this message at 00:13 on Feb 13, 2016

Cidrick
Jun 10, 2001

Praise the siamese
I've recently had to kickstart a few hundred machines in one go (which is not a big deal with cobbler), but I'm finding that I bury my cobbler box if I try to do more than a few at a time - my workaround is putting a sleep statement in the for loop that does an ipmitool chassis power reset command. Without that sleep statement, which makes the job take a couple of hours, a good number of hosts will just plain give up on booting from the network and revert to booting from local disk. I'm about 80% certain that my bottleneck is tftpd, which is... just a lovely protocol to begin with, so I'm trying to figure out how I could scale this out.
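The loop itself is roughly this shape - a hedged sketch where the credentials, the host list, and the sleep interval are placeholders:

code:
while read bmc; do
    ipmitool -I lanplus -H "$bmc" -U admin -P 'secret' chassis bootdev pxe
    ipmitool -I lanplus -H "$bmc" -U admin -P 'secret' chassis power reset
    sleep 60   # without this stagger, tftpd gets buried and hosts fall back to local disk
done < bmc-hosts.txt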

One option I've thought of is to chainload ipxe and pull down Linux boot images via HTTP, which would be awesome. However, in my (admittedly limited) time in playing with chainloading using Infoblox, I wasn't able to get it working, although plain dhcpd works just fine.

The other option I've thought of is to somehow load balance or scale out with multiple cobbler hosts running tftpd for me (or just multiple tftpd endpoints JUST for serving out boot images). Of course, load balancing UDP is probably trickier than I'd like to believe. Alternatively (if it's even possible), I could use round-robin DNS to point to multiple tftpd endpoints.

Has anyone else solved this kind of problem and have any suggestions on where to look?

Cidrick
Jun 10, 2001

Praise the siamese

Aquila posted:

I run a similar setup, but haven't reached such a scale. I've also played with iPXE and heard from friends that it's the future. How much have you poked at your tftpd and the server it runs on? It seems as though you could bind a bunch of IP's and run a tftpd to each, then have the dns server hand them out in some distributed fashion. I'm assuming here you have something like isc-dhcpd and the ability to set per machine dhcp options.

I played around with iPXE for a few hours this morning and came to the conclusion that I can't use it for now. There are a couple of documented cases stating that Emulex 10GB CNAs just aren't compatible with iPXE, and support doesn't appear to be planned yet. In my tests with an HP G7 blade, about 1 in 3 netboots from the undionly.kpxe loader were successful, and the rest just hung.

Emulex 10GB NICs aren't a majority of my environment by any means, but it's a significant enough percentage of what I'm supporting that it's not worth pursuing right now. Which sucks, because now I'm back to tftpd.

I *did* successfully test that you can use hostnames in the next-server field (dhcp option 66). I'm not sure why I thought it could only be an IP address. This opens up lots of options like what you suggested about running multiple instances of tftpd on a single host and putting an RR-DNS A record in front of it, or even a UDP-based load balancer.

Food for thought. I'm not sure what I'm going to do yet.

Aquila posted:

I do ubuntu however so I'm not too familiar with cobbler, are you loading everything over tftpd or just the netboot image and then getting most everything else over (local?) http?

I'm using CentOS for the most part. For 99% of stuff it's just sending the boot images (vmlinuz and initrd.img from the distro release) and then running anaconda to pull the OS packages down from my repo server using normal HTTP. However, I am playing around with booting into a live OS environment for pre-install hardware discovery and configuration, which is a ~215mb initrd. It takes several minutes to send that over tftp :(

Aquila posted:

My linux-y question: can anyone with a haswell or haswell series v3 xeon run turbostat (msr) and let me know if it works (and what distro/release/kernel). I've been chasing down clock / performance / cstate / temperature issues and the ubuntu 12 turbostat is too old to understand haswell processors.

You mean msr-cpuid?

Cidrick fucked around with this message at 19:31 on Mar 8, 2016

Cidrick
Jun 10, 2001

Praise the siamese
!!!!!!!!!!!!!!!!!

evol262 posted:

You can (and should) also add "rd.shell", which will give you a shell in the initrd if it fails, so you can look at /dev. Add "rd.break=pre-pivot" if you don't want to wait for the timeout.



This is awesome, thanks evol!

edit: Documentation also states that you can add rd.debug to print debug info to stdout, and optionally save it for later

quote:

1. Remove 'rhgb' and 'quiet' from the kernel command line
2. Add 'rd.shell' to the kernel command line. This will present a shell should dracut be unable to locate your root device
3. Add 'rd.shell rd.debug log_buf_len=1M' to the kernel command line so that dracut shell commands are printed as they are executed
4. The file /run/initramfs/rdsosreport.txt is generated, which contains all the logs and the output of all significant tools, which are mentioned later.

Cidrick fucked around with this message at 23:21 on Mar 14, 2016

Cidrick
Jun 10, 2001

Praise the siamese

Boris Galerkin posted:

I'm trying to install TeamViewer in CentOS 7 and I'm doing this by downloading the rpm file from their website and running

code:
sudo yum install teamviewer_rpm_file_that_I_downloaded.rpm

yum searches for dependencies etc. and tells me I need to install a list of dependencies that I already have installed (e.g. it's telling me it needs glibc, but if I do 'yum list installed | grep glibc' I see it's installed). Why is it doing this?

When it pulls in the dependencies, what arch is it listing for the package? I'm betting that teamviewer needs glibc.i686 and you probably have glibc.x86_64. This is fairly common.
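A quick way to check, and the usual fix if it turns out to be the 32-bit runtime that's missing (a small sketch):

code:
# show which arches of glibc are actually installed
rpm -q glibc --qf '%{NAME}.%{ARCH}\n'
# if only x86_64 shows up, pull in the 32-bit runtime and retry the teamviewer install
sudo yum install glibc.i686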

Cidrick
Jun 10, 2001

Praise the siamese

Boris Galerkin posted:

Oh you're right. TeamViewer is looking for i686 versions of everything and I have x86_64 versions. Is there no 64bit version of TeamViewer for Linux/CentOS?

It looks like they only package 32-bit for enterprise linux variants like CentOS.

Cidrick
Jun 10, 2001

Praise the siamese
Docker is great when you have a big flat platform that you can just run whatever someone packages up and hands to you and not have to worry about supporting or handling break/fix on.

We have a pretty large mesos cluster running a bunch of docker containers that our development and continuous delivery teams build via jenkins, and at the end of the jenkins job it publishes the container on our mesos platform. We support the OS and physical machines that power the mesos environment, and if someone's docker container breaks and fails a health check, it's destroyed and re-created, and our systems/platform team is off the hook for supporting anything within the container.

As long as you take some management steps - like shipping logs away to a centralized logging server, an ELK stack, Splunk, or what have you - it removes the administrivia of worrying about teams needing to log in and make changes to an app or deploy a new one; they just push a new container and destroy the old one. They don't need to log in to read logs when something breaks. It also means you don't have to worry about patching a CentOS 6.6 box to CentOS 6.7, or setting up and managing LDAP access so the application support teams can SSH into a server.

We basically build the high-rise and they can do whatever they want in their condo, we don't give a poo poo. They only pick up the phone when the water main is broken. It's a win-win for everyone. We, as operations, don't have to maintain rigorous standards of what they can or can't run their application on and with. They, as developers, don't have to fight us on needing sudo access so that they can edit some file owned by root.

Yes, I'm a jaded IT operations guy. Why do you ask?

Cidrick
Jun 10, 2001

Praise the siamese
Absolutely agreed, but it's a model that's worked well for us. I'm not saying it's for all shops, but I think it's important to illustrate one of Docker's biggest strengths. Most people (rightly) shrug when you say "I can run firefox and spotify in a docker container" because that's not really practical, and it misses the point of what makes containerization great.

If we're going to differentiate between :yaycloud: and containerization, I will say that the toolset for shooting a container in the head and re-deploying it - if that's the model you're after - is much, much simpler with containerization than a standard cloud model, and I would argue that it scales better. You *don't* need an IP address for every container (yes, you can get away with this using private IPs in a cloud, but I've always hated cloud-based NAT). You *don't* need to register a new app with a software or hardware load-balancer or a service registry if you use something like mesos-dns, mesos-lb, or Bamboo to do software load-balancing, because all that stuff is built-in and easy to get going with little effort. There are also built-in mechanisms, if you use something like marathon, to do a rolling deployment to an existing application, so you can make live changes just by standing up replacements and tearing down the old ones. Nuke that CVE in a few minutes, even if it requires a reboot like a kernel or glibc vulnerability would.

You can also get away with a much, much lighter footprint per container, which helps with scaling. Lately we've been using Alpine Linux as the base docker image for building things, which is 7MB. Compare that to the official CentOS 7 docker image, which is just shy of 200MB; the official Ubuntu image is a little less than that.

evol262 posted:

I'm not saying anything about your shop in particular, but "We basically build the high-rise and they can do whatever they want in their condo, we don't give a poo poo" is kind of antithetical to devops. I guess the question is "should the operations/admin team have visibility into what's going on in the condos"? And the answer is probably yes. This obviously doesn't mean you need to give developers access to the whole building, but do you trust them not to set up a roach motel in your building? Because I don't. You obviously don't need to know the specifics of what the apps are doing, how to troubleshoot them (other than shooting them in the head and waiting for a new one), etc, but developers are notoriously bad about keeping up on what's happening in the wider world -- something the operations team is good at.

I'm exaggerating, somewhat, to say that you shouldn't care about what devs are running in a container - because you should at least have some general guidelines of best practices for building an application, and maintain a good relationship with developers. You should absolutely care if someone tries to give you a 500GB docker container that reserves 20 cores and 48GB of memory, or whatever the docker equivalent of a roach motel is in this metaphor. You should encourage them to leave SELinux enabled. You should encourage them to not write their app on JDK 1.6. That kind of thing. They'll be more receptive to these sorts of changes if you all like and respect and communicate with one another, which is really all DevOps is.

But, to an extent, I feel like making developers put the code for their app in /usr/share instead of /opt, or putting @reboot into the tomcat user's crontab to start the app instead of a systemd script, or other trivialities, is just putting up unnecessary barriers and dampening your relationship with developers. And it can even extend beyond what-folder-goes-where on a box. You want to use Ubuntu Server instead of CentOS because that's what you're comfortable with? Cool. Just tell us what your on-call rotation is so our NOC can page you when it breaks, and not my team. It makes bugs get fixed a lot quicker when their app support team is paged out because of lovely code :)

Edit: Sorry about all the :words: about docker

Cidrick fucked around with this message at 21:38 on Apr 15, 2016

Cidrick
Jun 10, 2001

Praise the siamese

Suspicious Dish posted:

do you want a job doing exactly that same thing but for a fancy startup running ubuntu 15.10

also we dont have logging yet because logging sucks afaict

Not unless you went back to RedHat which I'm assuming you didn't since otherwise you wouldn't be running Ubuntu 15 :spergin:

(I Have Opinions about Ubuntu in the enterprise Linux world but I'm not going to start that here)

jre posted:

Graphite has huge amount of support & tooling but has an awful file based storage format which limits how far you can scale it

Influx has better storage format but is new & flaky and doesn't have as much tooling support

Graphite is great if you can use a better storage backend driver for it that isn't whisper files, because that's a nightmare to try and scale. Influx is supposed to be good, but the Graphite Guy at my shop claimed it was too immature to use and there was developer drama which he thought would really hurt its long-term viability. Cyanite looks interesting since that uses Cassandra, but it's still super new and I haven't heard of any shops actually using it yet. If anyone ends up trying it out I would love to hear your experiences with it.

Cidrick
Jun 10, 2001

Praise the siamese
So this is one of the funniest things I've run into in all my stupid years as a Linux admin

code:
root@cpegh0133:~# /bin/true
root@cpegh0133:~#
Broadcast message from root@cpegh0133
  (/dev/pts/0) at 11:03 ...

The system is going down for reboot NOW!
Connection to cpegh0133 closed by remote host.
Connection to cpegh0133 closed.
We were wondering why the mlocate cron job was rebooting a batch of boxes every day :v:

Cidrick
Jun 10, 2001

Praise the siamese

Janitor Prime posted:

idgi :confused: can you explain the joke to me

Someone's malformed ansible script symlinked /bin/true to /sbin/reboot

Ergo any script or cronjob on the machine that invoked /bin/true would reboot the box

Cidrick
Jun 10, 2001

Praise the siamese

Tigren posted:

How does that happen?

Can you post the stanza that caused that?

It was a bad attempt at using 'ln' to stop a service from invoking /sbin/reboot (which is another story entirely) by making /sbin/reboot a symlink to /bin/true

Instead of 'ln -sf /bin/true /sbin/reboot' he ran 'ln -sf /sbin/reboot /bin/true'
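For anyone following along, ln's argument order is TARGET then LINK_NAME, which is exactly how the two got swapped:

code:
ln -sf /bin/true /sbin/reboot    # intended: /sbin/reboot becomes a no-op
ln -sf /sbin/reboot /bin/true    # what actually ran: /bin/true now reboots the box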

Cidrick
Jun 10, 2001

Praise the siamese

evol262 posted:

Use selinux. That covers 95% of basic "restrict the hell out of this" cases. Obviously you can still get hacked, but it's low-hanging fruit.

This, with the caveat that a lot of the nice wordpress automated features will require rolling a couple of selinux modules - unless you want to do that by hand, but wordpress's security patches come frequently enough that it's a big enough hassle to be worth automating.

Also you'll need a module to allow wordpress users to upload their own content. The last time I checked, wordpress itself doesn't publish selinux modules, but you can find some community-made ones that will help.

This article has a good amount of it covered:

https://francispereira.com/deploying-wordpress-with-selinux-enabled.html

You'll need to change a few things based on where you actually host the wordpress sites. I host a couple on the same ec2 instance, and they don't go in /var/www/html, but rather in user homedirs, so I have a semanage module file that I've hacked together via trial and error, which looks like this:

code:
fcontext -a -t httpd_sys_rw_content_t '/home/user/www/wp-content/blogs.dir(/.*)?'
fcontext -a -t httpd_sys_rw_content_t '/home/user/www/wp-content/cache(/.*)?'
fcontext -a -t httpd_sys_rw_content_t '/home/user/www/wp-content/plugins(/.*)?'
fcontext -a -t httpd_sys_rw_content_t '/home/user/www/wp-content/themes(/.*)?'
fcontext -a -t httpd_sys_rw_content_t '/home/user/www/wp-content/upgrade(/.*)?'
fcontext -a -t httpd_sys_rw_content_t '/home/user/www/wp-content/updraft(/.*)?'
fcontext -a -t httpd_sys_rw_content_t '/home/user/www/wp-content/uploads(/.*)?'
That has covered everything I've come across so far. Once you have this in a file you can import it with 'sudo semanage import -f filename' which will then persist across reboots.
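One follow-up step worth noting: importing the fcontext rules only updates the policy, so you still need to apply the new labels to files that already exist (the path is the one from the sketch above):

code:
sudo restorecon -Rv /home/user/www/wp-content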

Cidrick
Jun 10, 2001

Praise the siamese
Also related: wpscan is something you can automate fairly easily to run against each of your sites and notify you if there are any plugins or themes with known vulnerabilities. For purely academic reasons, it's probably worth running against your box that got owned just so you can see what sort of stuff you potentially missed.

Cidrick
Jun 10, 2001

Praise the siamese

madmatt112 posted:

I am also aware that using linked clones will save me some trouble in some ways but I'm curious if anyone knows of a... nifty sysprep equivalent for Linux.

Depends on the format of your image, but virt-sysprep should be exactly what you're looking for. The only major operational difference between this and windows sysprep is that you run it against the disk image of a powered-off VM, not in a live VM right before shutdown.
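A minimal example of what that looks like (the image path is a placeholder; --list-operations shows everything it can scrub or reset):

code:
# reset machine-specific state in a powered-off guest image
sudo virt-sysprep -a /var/lib/libvirt/images/el7-template.qcow2
# see the full list of operations it can perform
virt-sysprep --list-operations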

Cidrick
Jun 10, 2001

Praise the siamese
Is InfluxDB configured to accept collectd metrics? It won't natively, and I don't see a collectd output plugin to write to InfluxDB, just OpenTSDB and graphite, among other things.

Also check whether InfluxDB is listening on all interfaces, or on your eth0 or equivalent, instead of just the loopback interface. What's the output of 'ss -tln'?

Cidrick
Jun 10, 2001

Praise the siamese

Thermopyle posted:

InfluxDB has a native way to accept collectd metrics.

Well egg on my face. Ignore me I guess!

Cidrick
Jun 10, 2001

Praise the siamese
As an aside, I haven't touched Logstash since I started using fluentd in its place. It's much easier to work with in my opinion, and doesn't require having a JDK/JRE installed on your boxes just to parse logs.

fluentd can't natively ship logs to elasticsearch, but you can install a quick ruby gem to enable that for you. We've been using fluentd for a few months now to collect docker logs and ship to elasticsearch, and it's been flawless.
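The gem in question is fluent-plugin-elasticsearch - a small sketch (use td-agent-gem instead if you installed fluentd via the td-agent packages):

code:
gem install fluent-plugin-elasticsearch
# or, with the td-agent packaging of fluentd:
td-agent-gem install fluent-plugin-elasticsearch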

  • Reply