FamDav
Mar 29, 2008

necrobobsledder posted:

Last I could tell from the outside, half the CloudFormation blog posts and engineers appeared to be based out of India, which makes me wonder if it's strategically important enough for AWS to put more high-profile engineers on it. Doubtful you'd see that happen to IAM, in contrast.

The CloudFormation team is based in Seattle and Vancouver. Are you basing this off names?

New Yorp New Yorp posted:

Containers / Docker Compose have nothing to do with Azure.

Don't prioritize integration tests, prioritize unit tests. Unit tests verify correct behavior of units of code (classes, methods, etc). Integration tests verify that the correctly-working units of code can communicate with other correctly-working units of code (service A can talk to service B). Both serve an important purpose, but the bulk of your test effort should go into unit tests.

Counterpoint: your customers don't use units, they use your application. Write unit tests so that you better define constraints between parts of your codebase, but integration testing is what actually shows you what customers are experiencing. Make sure you make integration testing easy and consistent for your team so they write many of them.


Nomnom Cookie
Aug 30, 2009



Spring Heeled Jack posted:

To anyone running k8s, how are you handling database migrations with deployments? We’re a .net shop and our pipelines (azure devops) are triggered to build a container for the migration and one for the service and run them in that order (the migration container runs as a job).

We’ve been running this way for a while without issue, but it seems like a better option may be to run it as an initContainer with the service deployment. We have permissions separated out so the job runs as a user with schema update rights and the service can only r/w data.

On another note, how is everyone handling deployments? We’re reviewing our setup to make sure we’re doing things efficiently and while we don’t have any real issues, we know there’s probably something better we could be doing. Currently we write out manifests in our repo and have the pipeline download/do a variable replacement for the container tag and some other things. For secrets I have an appsettings file that gets transformed and turned into a cluster secret/mounted as a volume on the pod.

We’re currently reviewing azure managed identities to lessen the amount of secrets we actually need to pipe into these files.

Helm still seems like a hassle as it looks like we would need 15 separate charts to cover all of our services.

our deployment tooling centers on a bunch of Python that consumes YAML written in an internal schema and produces all our deployment artifacts: k8s YAMLs and then Datadog monitors and dashboards, mainly. Sounds like overkill, but we have like 8 teams with dozens of services between them, and getting them all up to speed on k8s would be a horrible pain. The schema dev uses is deliberately limited in some ways, so e.g. they get one Deployment per microservice and no funny poo poo with StatefulSets or God forbid creating pods directly. But then we've put a lot of effort into defining a rich schema for SLOs and providing tool support for more of the esoteric k8s features like HPA and PDB, so that it is in fact easier to go along with conventions and deploy your service with the provided tooling.

I am a huge fan of the restricted schema for defining deployments. It helps a ton with reducing boilerplate and also with reducing variance between how different teams build and operate their services.
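
To make that concrete, here's a stripped-down sketch of what one of those internal specs might look like (the field names here are made up for illustration, not our actual schema):
code:
# service.yaml (hypothetical restricted spec a product team writes)
name: checkout
team: payments
image: registry.example.com/checkout:1.4.2
replicas: 3
port: 8080
slo:
  availability: "99.9"
  latency_p99_ms: 250
autoscaling:            # expands to an HPA
  min_replicas: 3
  max_replicas: 12
  target_cpu_percent: 70
disruption_budget:      # expands to a PDB
  max_unavailable: 1
The tooling expands something like that into a Deployment, Service, HPA, PDB, and the matching Datadog monitors, so teams never hand-write k8s YAML.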

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost

FamDav posted:

The CloudFormation team is based in Seattle and Vancouver. Are you basing this off names?
Not at all. I saw several blog posts that prominently featured CloudFormation team members explicitly stated to be based in India, and only India, while many other teams' blog posts come from places all over the US, Europe, South America, etc. From an optics standpoint, if the few bits of engineering insight the public sees come from traditionally off-shored regions rather than a diverse mix, it creates a negative perception, even after only one or two posts, when there's no regular, impactful contact with customers otherwise. I say this having worked with an offshore team that I empowered and encouraged to have as important a role as the local team, praising them when I could, only to find out later that our clients got the impression we had offshored substantially and possibly that I myself had been reassigned.

Qtotonibudinibudet
Nov 7, 2011



Omsk guy, you half-wreck, tell me, are you a junkie? I just live somewhere around there too, we could go do drugs together

Spring Heeled Jack posted:

To anyone running k8s, how are you handling database migrations with deployments? We’re a .net shop and our pipelines (azure devops) are triggered to build a container for the migration and one for the service and run them in that order (the migration container runs as a job).

We’ve been running this way for a while without issue, but it seems like a better option may be to run it as an initContainer with the service deployment. We have permissions separated out so the job runs as a user with schema update rights and the service can only r/w data.

...

Helm still seems like a hassle as it looks like we would need 15 separate charts to cover all of our services.

We run jobs before rolling out the new version pods. The pods have initContainers also but they're more of a "is the database available and properly migrated" check. Keep in mind that those run on each pod, which is fine if your migrations NOOP if they see they've already run, but if they don't you'll probably want the job.

Helm can be as good or as bad as you want it to be, but with the removal of Tiller I can't see much reason to hate on it. Sure, you can go full crazytown on the templating, and we do because we're building a chart as a vendor to support many different configurations, but it's perfectly happy to just render a static manifest as a "template" while still allowing you to use the various lifecycle hooks. You can use those to automatically run the migration job and only spawn the new version pods after it completes.
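
For reference, the hook version is just an annotation on a Job template in the chart; a rough sketch (names and image are placeholders, not a drop-in manifest):
code:
apiVersion: batch/v1
kind: Job
metadata:
  name: {{ .Release.Name }}-db-migrate
  annotations:
    "helm.sh/hook": pre-install,pre-upgrade
    "helm.sh/hook-delete-policy": before-hook-creation
spec:
  backoffLimit: 0
  template:
    spec:
      restartPolicy: Never
      containers:
        - name: migrate
          image: "registry.example.com/myapp-migrations:{{ .Values.image.tag }}"
Helm waits for that Job to finish before it applies the rest of the install/upgrade, so the new version pods only come up against an already-migrated schema.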

12 rats tied together
Sep 7, 2006

Nomnom Cookie posted:

[...] But then we've put a lot of effort into defining a rich schema for SLOs and providing tool support for more of the esoteric k8s features like HPA and PDB, so that it is in fact easier to go along with conventions and deploy your service with the provided tooling.

I am a huge fan of the restricted schema for defining deployments. It helps a ton with reducing boilerplate and also with reducing variance between how different teams build and operate their services.

It's kind of sad how autoscaling and disruption budgets suddenly become esoteric features when you move from, say, EC2 to kubernetes. Not a dig on your developers, mine also all read the same medium post that tells you to create StatefulSets for some loving reason, but in a lot of ways the shift to k8s can be a significant step backwards in functionality (assuming you were actually good at operating your previous provider(s)). From a product engineering perspective I haven't built anything on k8s in the past 3 years that isn't just a shittier version of AWS::AutoScaling::AutoScalingGroup.

I think it's absolutely the right call to either build some kind of abstraction that your dev teams just feed structured inputs to, or to run embedded infrastructure engineers in every dev team. However, if you're speccing out a k8s abstraction, the AWS::ECS CloudFormation namespace is right there already, so IMHO it ends up being a huge wheel-reinventing exercise. I also don't know if this is just a thing at my current employer or not, but every time someone brings up building an abstraction for something, immediately like 10 people appear and start bikeshedding UI tech and web frameworks and poo poo; it's extremely tedious.

CMYK BLYAT! posted:

The pods have initContainers also but they're more of a "is the database available and properly migrated" check. Keep in mind that those run on each pod, which is fine if your migrations NOOP if they see they've already run, but if they don't you'll probably want the job.
Since we're using ansible to run everything in k8s (for some applications), we have preflight assertion suites that poke around the environment. Occasionally we will run them in the middle of tasks too; for example, while mutating kafka brokers we will have ansible make some changes and then wait for various cluster metrics (mainly under-replicated partitions, in-sync replicas) to stabilize before moving on to the next node.
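
One of those stabilization waits is roughly this shape (illustrative only; the exact command, path, and port depend on your kafka version and layout):
code:
- name: wait until no partitions are under-replicated before touching the next broker
  command: >
    /opt/kafka/bin/kafka-topics.sh --describe
    --under-replicated-partitions
    --bootstrap-server {{ inventory_hostname }}:9092
  register: urp_check
  until: urp_check.stdout | trim | length == 0   # empty output == nothing under-replicated
  retries: 30
  delay: 10
  changed_when: false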

In particular here we have a task that maintains the state of a postgres db for airflow, which includes bootstrapping, ensuring migrations are properly run, ensuring our additions to it (CI user, CI user permissions, etc.) are in the correct state, and stuff like that. The same task actually creates the database in AWS too, which lets us create extra airflow environments automatically on merges to master.

To me it seems like running database migrations should be part of the application deploy, and running an isolated stateful database in k8s and spinning up containers just to run some SQL feels unnecessarily complicated for a fairly solved problem, but OP did say .net shop, so :v:

Matt Zerella
Oct 7, 2002

Norris'es are back baby. It's good again. Awoouu (fox Howl)
Am I reading this right that you can't stand up Amazon Workspaces with terraform? Just the VPC and the Directory server?

vanity slug
Jul 20, 2010

Matt Zerella posted:

Am I reading this right that you can't stand up Amazon Workspaces with terraform? Just the VPC and the Directory server?

Yeah, because of a lack of API support from AWS: https://github.com/terraform-providers/terraform-provider-aws/issues/434

e: Oh, there's progress: https://github.com/terraform-providers/terraform-provider-aws/pull/11608

Matt Zerella
Oct 7, 2002

Norris'es are back baby. It's good again. Awoouu (fox Howl)
But I can stand up the VPC for it? Or is that not a good idea if I'm plunking manually created stuff into it?

vanity slug
Jul 20, 2010

Matt Zerella posted:

But I can stand up the VPC for it? Or is that not a good idea if I'm plunking manually created stuff into it?

The VPC resources you create in Terraform don't care what you put in them, and they'd make a good base to import the other Workspaces resources into once that's supported. I'd go for it.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

Bhodi posted:

I'm stuck in this hell right now, trying to find the best way to build a multi-platform file using yum_repository from a yaml structure, but apparently you can't have an inner and outer loop in ansible that traverses a tree with different-sized leaves without include_task? Because the jinja2 product filter only lets you reduce, not reference? Like, WTF?

I'm starting in on ansible hard for the first time and yeah, maybe I should just make a custom filter, I guess? It shouldn't be this hard to parse this and make the index names 7.1_repo1, 7.1_repo2, and 7.2_repo3:
pre:
repos:
  rhel7:
    "7.1":
    - name: repo1
      baseurl: http://whatever1
    - name: repo2
      baseurl: http://whatever2
    "7.2":
    - name: repo3
      baseurl: http://whatever3

Is it difficult to generate this via Jinja templates rather than doing all the weirdo machinations in Ansible YAML?

Vulture Culture fucked around with this message at 13:50 on May 9, 2020

Bhodi
Dec 9, 2007

Oh, it's just a cat.
Pillbug

Vulture Culture posted:

Is it difficult to generate this via Jinja templates rather than doing all the weirdo machinations in Ansible YAML?
Mostly the directive to use yum_repository and to expand preexisting code. It was an interesting constraint that grew my knowledge, so a win I guess? Even if there were better ways of doing it, it's done now.

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
The next time you have to do something super wacky like this to generate multiple resources from bonzo logic, look into creating an action plugin. I've got a few good examples from modernizing a neglected repo in the past couple weeks.

12 rats tied together
Sep 7, 2006

Deferring to the group_vars plugin as often as possible is a great way to avoid scenarios like this (in particular, it's unlikely a system will ever be both rhel 7.1 and rhel 7.2 at the same time), but also in general, ansible-playbook loops are much easier if you keep your nested types the same type all the way down:

code:
rhel:
  "7.1":
    repo1: whatever1
    repo2: whatever2
  "7.3":
    repo1: whatever3
Your loop ends up as basically: with_dict: "{{ rhel[myversion] }}" -- item.value gets you the repo uri, item.key gets you the local name you have for it. If you have nested dicts:

code:
rhel:
  "7.1":
    repo1:
      baseurl: whatever1
      otherproperty: value2
You can still with_dict on {{ rhel[myversion] }}, but item.key gets you "repo1" and item.value gets you the dictionary, so item.value.baseurl will pull your uri. A lot of the time I'm dealing with some horseshit like this though, I'm getting the data structure from some external system, and usually in those cases it makes sense to reach for a plugin.
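
For the original yum_repository case the task ends up being something like this (a sketch; the version fact and dict name are whatever you actually use):
code:
- name: configure repos for this RHEL minor version
  yum_repository:
    name: "{{ ansible_distribution_version }}_{{ item.key }}"
    description: "{{ item.key }} for RHEL {{ ansible_distribution_version }}"
    baseurl: "{{ item.value.baseurl }}"
  with_dict: "{{ rhel[ansible_distribution_version] }}"
which gets you exactly the 7.1_repo1 / 7.2_repo3 style names from a few posts back without any include_tasks gymnastics.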

e: Basically, however counter-intuitive it may seem, there's almost never a reason to intentionally track variables in a list inside ansible. The only times you should do this are when you really need strict ordering plus some kind of index math tomfoolery, and hopefully those times never occur.

12 rats tied together fucked around with this message at 17:40 on May 9, 2020

Asleep Style
Oct 20, 2010

I'm trying to set up a basic CI/CD pipeline using GitLab CI/CD and I'm running into some weird errors.

The project is just a static website served from an nginx docker container. I'm using gitlab.com, so this is using a shared docker runner. An additional wrinkle is that the VPS I'm deploying to is running centos8, so I'm building with docker and deploying with podman.

Here's the complete .gitlab-ci.yml:
code:
# runner image is based on alpine linux
image: docker:19.03.0

variables:
    DOCKER_DRIVER: overlay2
    # Create the certificates inside this directory for both the server
    # and client. The certificates used by the client will be created in
    # /certs/client so we only need to share this directory with the
    # volume mount in `config.toml`.
    DOCKER_TLS_CERTDIR: "/certs"

services:
    - docker:19.03.0-dind

stages:
    - build
    - deploy

before_script:
    - docker login -u $CI_DEPLOY_TOKEN_USERNAME -p $CI_DEPLOY_TOKEN_PASSWORD $CI_REGISTRY
    - apk --update add openssh-client
    - eval $(ssh-agent -s)
    - echo "$SSH_SECRET_KEY" | tr -d '\r' | ssh-add -
    - mkdir -p ~/.ssh
    - echo "$SSH_KNOWN_HOSTS" >> ~/.ssh/known_hosts
    - chmod 644 ~/.ssh/known_hosts
    - chmod 700 ~/.ssh

build_image:
    stage: build
    script:
        - docker build -t $CI_REGISTRY/$CI_PROJECT_PATH/image:latest .
        - docker push "$CI_REGISTRY/$CI_PROJECT_PATH/image:latest"

deploy_image:
    stage: deploy
    script:
        # Wrap this command in an if statement because if podman ps -aq is empty then podman stop will throw an error.
        - ssh -T $PROD_SERVER_USER_SSH 'if podman stop $(podman ps -aq); then podman rm $(podman ps -aq); fi'
        - ssh -T $PROD_SERVER_USER_SSH bash -c "podman login -u $CI_DEPLOY_TOKEN_USERNAME -p $CI_DEPLOY_TOKEN_PASSWORD $CI_REGISTRY && podman run -p 8080:80 $CI_REGISTRY/$CI_PROJECT_PATH/image:latest"

    environment:
        name: production
        url: <url>
The build_image step works fine, but I run into trouble during deployment. The first issue is that the gitlab environment variables don't seem to expand correctly from within the ssh commands. Via
code:
ssh -T $PROD_SERVER_USER_SSH bash -c "echo ..."
I was able to figure out that $CI_DEPLOY_TOKEN_USERNAME and $CI_PROJECT_PATH aren't getting replaced with the real values within `ssh -T ... bash -c "..."`. After replacing those two variables with hardcoded values, the error I get now is:

code:
$ ssh -T $PROD_SERVER_USER_SSH bash -c "podman login -u <username> -p $CI_DEPLOY_TOKEN_PASSWORD $CI_REGISTRY && podman run -p 8080:80 $CI_REGISTRY/<project_path>/image:latest"

Error: missing command 'podman COMMAND'
Try 'podman --help' for more information.
It sure doesn't look like I'm calling podman without a command here. I suspect it's a problem with my ssh command. I've tried enclosing the command in single quotes as well as using bash -c "".

Any thoughts on what might be causing my issues?

Boz0r
Sep 7, 2006
The Rocketship in action.
What is it called in Azure DevOps when committed code gets rejected if it doesn't build and pass all tests? People from our team break our pipelines all the time and I'm sick of it.

Mr Shiny Pants
Nov 12, 2012

Boz0r posted:

What is it called in Azure DevOps when committed code gets rejected if it doesn't build and pass all tests? People from our team break our pipelines all the time and I'm sick of it.

A good thing(tm).

Boz0r
Sep 7, 2006
The Rocketship in action.

Mr Shiny Pants posted:

A good thing(tm).

Thank you, but I can't find out how to set it up if I don't know what they call it.

vanity slug
Jul 20, 2010

Boz0r posted:

What is it called in Azure DevOps when committed code gets rejected if it doesn't build and pass all tests? People from our team break our pipelines all the time and I'm sick of it.

branch policies, i guess? https://docs.microsoft.com/en-us/azure/devops/repos/git/branch-policies?view=azure-devops

Asleep Style
Oct 20, 2010

Asleep Style posted:

Gitlab CI/CD deploy troubles

In the end it turns out my problem was the bash -c in this line:
code:
        - ssh -T $PROD_SERVER_USER_SSH bash -c "podman login -u $CI_DEPLOY_TOKEN_USERNAME -p $CI_DEPLOY_TOKEN_PASSWORD $CI_REGISTRY && podman run -p 8080:80 $CI_REGISTRY/$CI_PROJECT_PATH/image:latest"
I think that was somehow causing only part of the command to be executed. Changing that to

code:
        - ssh -T $PROD_SERVER_USER_SSH "podman login -u $CI_DEPLOY_TOKEN_USERNAME -p $CI_DEPLOY_TOKEN_PASSWORD $CI_REGISTRY && podman run -p 8080:80 $CI_REGISTRY/$CI_PROJECT_PATH/image:latest"
solved all my problems.
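
Best I can tell, the underlying issue is that ssh just concatenates its arguments with spaces and hands the result to the remote shell, so the remote side effectively ran bash -c podman login ..., and bash -c only takes the first word as its command string; the rest become positional parameters. Running bare podman is exactly what produces "missing command 'podman COMMAND'". A local repro (placeholder values):
code:
# what the remote shell effectively saw: bash runs just "podman", the rest become $0, $1, ...
bash -c podman login -u user -p pass registry.example.com

# dropping bash -c (or quoting the whole thing as a single argument) runs the full command line
bash -c 'podman login -u user -p pass registry.example.com && podman run -p 8080:80 registry.example.com/site:latest'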

New Yorp New Yorp
Jul 18, 2003

Only in Kenya.
Pillbug

Boz0r posted:

What is it called in Azure DevOps when committed code gets rejected if it doesn't build and pass all tests? People from our team break our pipelines all the time and I'm sick of it.

You're describing something like a pre-push hook, which isn't really supported in Azure DevOps.

What you're after are branch policies. A branch policy puts required conditions on completing a PR. With a branch policy in place, people can't push directly to a protected branch; they can only get things into that branch via a PR. So the workflow would become create branch -> work in branch -> push branch -> open PR -> builds/etc run -> PR can be completed once all policies are fulfilled.

FISHMANPET
Mar 3, 2007

Sweet 'N Sour
Can't
Melt
Steel Beams
Are you using Azure DevOps for code storage as well, or just pipelining and putting your code somewhere else like GitHub? And to be more specific, are builds and tests failing on code that's supposed to be "production" ready? Because obviously you won't know for sure if it builds and passes tests until you've run it through a pipeline that will build it and run tests, but you don't have to "commit to master", so to speak, to get that.

Bulgogi Hoagie
Jun 1, 2012

We

Docjowles posted:

I agree trying to use terraform in a vendor neutral way is nuts at best and impossible at worst. But what they may have meant is that with a vendor specific tool, if you end up switching clouds (or jobs) your knowledge of the old tool is pretty much useless. Whereas once you know how to use terraform on Azure, you know how to use it on Amazon or GCP or a bunch of other things. You don’t have to relearn from zero even if you can’t directly reuse your code. And that has value.

For where/how to run terraform, is anyone loving with Atlantis? My teammates are all hot to try it out. Interested if anyone has hands on experience.

we’ve got Atlantis running on our development environment, it’s alright for keeping track of who did what where

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.
Anyone have strong opinions one way or another on Jenkins Job Builder vs. Job DSL?

Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.
Jenkins Job Builder is really good if you need to manage hundreds or thousands of jobs with similar structures.

DSL is better if you want to let teams manage their job structure.

We use both. JJB to manage permissions, job structure, etc. Certain jobs point to Jenkinsfiles or whatever for teams that want stuff other than our standard build/test stuff.
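
If you haven't seen JJB, the "hundreds of similar jobs" case is basically one job-template plus a project that stamps it out. Rough sketch (names/URLs made up):
code:
- job-template:
    name: '{team}-{repo}-build'
    description: 'Standard build/test job, managed by JJB; do not edit in the UI'
    scm:
      - git:
          url: 'https://git.example.com/{team}/{repo}.git'
          branches:
            - master
    builders:
      - shell: 'make build && make test'

- project:
    name: payments-jobs
    team: payments
    repo:
      - api
      - worker
    jobs:
      - '{team}-{repo}-build'
Each entry in the repo list expands into its own Jenkins job from the same template, which is how you keep a few thousand of them consistent.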

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
We use JJB (and have also written a JJB builder tool) and teams manage Jenkinsfiles. Our JJB configuration manages several thousand matrix jobs across dozens of teams and several independent clusters. The documentation for Job DSL was not as thorough or as easy to search as JJB's a while ago, so user-friendliness won.

Blinkz0rz
May 27, 2001

MY CONTEMPT FOR MY OWN EMPLOYEES IS ONLY MATCHED BY MY LOVE FOR TOM BRADY'S SWEATY MAGA BALLS
We use Job DSL for the precise reason Gyshall mentioned: it lets our teams manage things.

Of course it really means that product teams complain to the platform team every time something goes wrong because why bother reading logs or trying to understand how systems work?

Boz0r
Sep 7, 2006
The Rocketship in action.
We use dotnet pack to build nuget packages. How do I add a branch name to the package name?

NihilCredo
Jun 6, 2011

iram omni possibili modo preme:
plus una illa te diffamabit, quam multæ virtutes commendabunt

Boz0r posted:

We use dotnet pack to build nuget packages. How do I add a branch name to the package name?

I would strongly recommend putting the branch name in the version tag instead (eg "1.2.3.4-bozorsexperiment"), or in some other annotation. It's a fairly conventional assumption that project name == package name == root namespace, and while I don't think you would actually break stuff by doing things differently, it's highly unintuitive.

The only reason I can think of to put the branch name in the package name is so that you can add two different branches as dependencies at the same time, which I dearly hope is not necessary, as it could easily lead to a mess of old unmaintained branches sticking around because other projects depend on them.
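
In an Azure DevOps pipeline that can be as simple as feeding the predefined branch variable in as the suffix; a sketch (it assumes your csproj sets VersionPrefix so the suffix gets appended, and that your branch names are valid prerelease identifiers):
code:
steps:
  - script: >
      dotnet pack -c Release
      --version-suffix "$(Build.SourceBranchName).$(Build.BuildId)"
      -o "$(Build.ArtifactStagingDirectory)"
    displayName: Pack NuGet package with branch prerelease suffix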

Boz0r
Sep 7, 2006
The Rocketship in action.

NihilCredo posted:

I would strongly recommend putting the branch name in the version tag instead (eg "1.2.3.4-bozorsexperiment"), or in some other annotation. It's a fairly conventional assumption that project name == package name == root namespace, and while I don't think you would actually break stuff by doing things differently, it's highly unintuitive.

The only reason I can think of to put the branch name in the package name is so that you can add two different branches as dependencies at the same time, which I dearly hope is not necessary, as it could easily lead to a mess of old unmaintained branches sticking around because other projects depend on them.

Yeah, that's what I want to do. I forgot the difference between name and version tag :)

EDIT: I've tried setting an environment variable, but I get an error:

code:
'$(Year:yyyy).$(Month).$(DayOfMonth)$(Rev:.r)-$(SourceBranchName)' is not a valid version string. (Parameter 'value') 

Boz0r fucked around with this message at 10:17 on Jun 8, 2020

New Yorp New Yorp
Jul 18, 2003

Only in Kenya.
Pillbug

Boz0r posted:

EDIT: I've tried setting an environment variable, but I get an error:

code:
'$(Year:yyyy).$(Month).$(DayOfMonth)$(Rev:.r)-$(SourceBranchName)' is not a valid version string. (Parameter 'value') 

How/where are you trying to do this? What process are you using? Where are those variables supposed to be coming from?

Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.
My new gig has a requirement that we have to run portions of the app on premise in a data center. I'd like to handle everything with APIs and/or Ansible if possible.

I've been out of the hardware game for a bit, but my thought was to do some amount of hyper converged infra across two racks.

What's the best stuff these days for handling PXE? Last I did anything was MaaS.

How about firewalls/switches? We want to avoid Cisco or any stuff like that. Kind of out of the loop on that as well.

necrobobsledder
Mar 21, 2005
Lay down your soul to the gods rock 'n roll
Nap Ghost
For PXE, the last thing I did that worked reliably was running legacy stacks in IaaS lift-and-shift configurations after MaaS didn't work out for us, so I'm kind of at a loss there.

If you're not going to be doing hardware switching, you should be prepared to lose a bit of latency (hardware is still superior to software outside of what hyperscalers can justify). Juniper has vMX, which is one of the better software routers in the SDN landscape, at least for enterprisey folks. And honestly, if you're into SDN management, you should be looking to work with NAPALM-supporting vendors and use either Ansible or Saltstack for management.

freeasinbeer
Mar 26, 2015

by Fluffdaddy
State of the art is something like tinkerbell.org or Digital Rebar, which use iPXE to drive advanced workflows.

Gyshall
Feb 24, 2009

Had a couple of drinks.
Saw a couple of things.
Yeah I started playing with Digital Rebar last night. Seems like that will handle the metal scaling pretty well. I've used MaaS before but I think we'll need something a bit more robust.

I'll take a look at the other stuff. Any good reference architectures out there for these kinds of setups?

12 rats tied together
Sep 7, 2006

My employer uses Cumulus Linux devices as ToR switches and I'm definitely not qualified to offer you an objective evaluation of them, but I understand that they are cheap, and I like working with them a lot because they are just linux boxes. For PXE we're using Foreman, which seems to kind of suck, but it seems like everything sucks down there in PXE land.

xzzy
Mar 5, 2009

Are projects like maas and tinkerbell useful in more traditional provisioning environments?

Where I'm at we're typically 5+ years behind the cutting edge, so we're not very cloud-aware, though the momentum is building. Now that everyone is working from home, provisioning new hardware has gotten really difficult because our workflow is fairly antique: get some new hardware, someone spends a day configuring IPMI and customizing a ks.cfg, then lets PXE boot do its thing. This hasn't evolved much for almost 20 years even though we've iterated through stuff like Cobbler and Foreman.

So if there are better provisioning tools out there that support traditional servers (that is, a server is provisioned and performs a single task until the hardware fails), that's pretty interesting to me. There has been talk of moving user analysis jobs into OpenShift, which I'm not sure I'm keen on, but it does indicate a more dynamic solution is on the horizon.

fankey
Aug 31, 2001

Is this the right thread for what's likely a super basic docker question? I'm trying to run graphite and grafana on the same VM. If I start the graphite container before the grafana container everything works fine. If I start the grafana container first the graphite container doesn't operate correctly.

If I shell into the container everything looks fine, at least as far as I can tell (I'm not a graphite expert). The log files all look clean and I'm able to access the web-based dashboard via wget and localhost. Externally, I'm not able to talk to the graphite container at all over 80, and the grafana container is unable to connect to the database over 8080. Given that these run through different processes (port 80 -> nginx and port 8080 -> gunicorn), it seems like something basic is busted.

The 2 containers don't share any ports - here's what it looks like when things are running properly
code:
CONTAINER ID        IMAGE                         COMMAND             CREATED             STATUS              PORTS                                                NAMES
662e33fd84ae        grafana/grafana               "/run.sh"           3 months ago        Up 14 minutes       0.0.0.0:3000->3000/tcp                               my-grafana
bdd6839741ff        graphiteapp/graphite-statsd   "/entrypoint"       4 months ago        Up 15 minutes       0.0.0.0:80->80/tcp, 0.0.0.0:2003-2004->2003-2004/tcp, 0.0.0.0:2023-2024->2023-2024/tcp, 2013-2014/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8126->8126/tcp, 8125/tcp, 0.0.0.0:8125->8125/udp   graphite
Any idea what's going on here? Where would I start to look to debug this?

Hadlock
Nov 9, 2004

^ use Prometheus, unless you absolutely have to use graphite. It works out of the box and it's the new standard for that sort of thing

So my question,

Our analytics team is working from a backup/restore of the prod database that happens nightly. Now they want near-real-time data. This analytics db is walled off from regular users and approved by IT security, etc. It's a postgres 9.6 db that's going to 11.x soonish.

I was thinking about still doing the dump/restore on, say, Sunday nights, and then restoring from WAL files, using Wal-E to manage the write-ahead logs and setting up the analytics db as a delayed replica.

The advantage of this is I'm not consuming a replication slot and I'm not consuming database connections, etc., as I'm keeping the DB up to date via WAL files stored on S3.

We used Wal-E at a previous company to do something similar, I think they're still using it to this day

I'm curious if there's a better way to do this. A quick Google shows that Heroku and Gitlab are doing exactly this so I don't think I'm breaking any new ground
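
For what it's worth, the delayed-replica part on 9.6 is only a few lines of recovery.conf on the analytics box, roughly like this (the envdir path is the usual Wal-E convention; bucket/credential config omitted, and the delay is whatever you want):
code:
# recovery.conf on the analytics replica (PostgreSQL 9.6)
standby_mode = 'on'
restore_command = 'envdir /etc/wal-e.d/env wal-e wal-fetch "%f" "%p"'
recovery_min_apply_delay = '1h'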

Vulture Culture
Jul 14, 2003

I was never enjoying it. I only eat it for the nutrients.

fankey posted:

Is this the right thread for what's likely a super basic docker question? I'm trying to run graphite and grafana on the same VM. If I start the graphite container before the grafana container everything works fine. If I start the grafana container first the graphite container doesn't operate correctly.

If I shell into the container everything looks fine, at least as far as I can tell (I'm not a graphite expert). The log files all look clean and I'm able to access the web-based dashboard via wget and localhost. Externally, I'm not able to talk to the graphite container at all over 80, and the grafana container is unable to connect to the database over 8080. Given that these run through different processes (port 80 -> nginx and port 8080 -> gunicorn), it seems like something basic is busted.

The 2 containers don't share any ports - here's what it looks like when things are running properly
code:
CONTAINER ID        IMAGE                         COMMAND             CREATED             STATUS              PORTS                                                NAMES
662e33fd84ae        grafana/grafana               "/run.sh"           3 months ago        Up 14 minutes       0.0.0.0:3000->3000/tcp                               my-grafana
bdd6839741ff        graphiteapp/graphite-statsd   "/entrypoint"       4 months ago        Up 15 minutes       0.0.0.0:80->80/tcp, 0.0.0.0:2003-2004->2003-2004/tcp, 0.0.0.0:2023-2024->2023-2024/tcp, 2013-2014/tcp, 0.0.0.0:8080->8080/tcp, 0.0.0.0:8126->8126/tcp, 8125/tcp, 0.0.0.0:8125->8125/udp   graphite
Any idea what's going on here? Where would I start to look to debug this?
Start the containers in the "wrong" order, docker inspect both and post the gist here


Pile Of Garbage
May 28, 2007



If you need a quick fix, use docker-compose and include a depends_on statement to control start-up order. Still would be better to work out what the issue is, ofc.
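
Something like this, pulling the images/ports from the docker ps output above (a sketch; drop the ports you don't actually need exposed):
code:
version: "3.7"
services:
  graphite:
    image: graphiteapp/graphite-statsd
    ports:
      - "80:80"
      - "2003-2004:2003-2004"
      - "2023-2024:2023-2024"
      - "8080:8080"
      - "8125:8125/udp"
      - "8126:8126"
  grafana:
    image: grafana/grafana
    ports:
      - "3000:3000"
    depends_on:
      - graphite
depends_on only orders start-up, it doesn't wait for graphite to actually be healthy, so it's a band-aid rather than a real fix.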
