|
Multiple AWS accounts is terrible. We only have 3 for the main engineering/SaaS part of the org. Maybe this is an awkward number where it's so few it's not worth getting into Organizations, but enough that it's still irritating. But this is how we separate dev from prod anyway, and it's terrible when we need to reference dev stuff from prod for single-pane-of-glass monitoring or CI/CD reasons and intentionally break that isolation. Maybe that's another consequence of inter-account stuff being a rare pattern for us.
|
# ? Jan 15, 2021 18:23 |
|
freeasinbeer posted:So I’m fine with breaking stuff up into tiers. Or segmentation

Agreed, the "microaccounts" paradigm is dumb as poo poo. The only reason to break up accounts IMHO is for doling out admin access to IAM, or heavily interactive services (Quicksight etc). It's a common enough use case that you should be able to do it on a whim, but you shouldn't strive to do it once for every service. The thing I like least about working in an org with a large AWS presence is having to deal with a bunch of extremely bored software devs/SREs who apparently don't have anything better to do than sit around and think about how to make cloud deployments more complicated. I don't know how many times I've had to make arguments against re-inventing the EC2 interface as a .NET webapp, for example.
|
# ? Jan 15, 2021 18:31 |
|
And then the best bit is that all that segmentation lives on the same shared TGW (with route tables per tier), so it’s like a giant VLAN for everything. The other thing I see them do is build a bunch of inner-source tooling to do something simple, because lambda.
|
# ? Jan 15, 2021 20:20 |
|
freeasinbeer posted:And then the best bit is that all that segmentation lives on the same shared TGW(with route tables per tier), so it’s like a giant VLAN for everything.

That's a standard design pattern: deploying a central TGW that has your Direct Connect and/or VPN attachments, then you attach all your VPCs to it and have a single egress VPC. It reduces costs as you only have to maintain the one NAT gateway for the accounts and you don't need additional VIFs. That said, it's nothing like "a giant VLAN for everything." For starters, TGWs are Layer 3, with the expectation that you configure one or more routing domains.
|
# ? Jan 15, 2021 21:20 |
|
Pile Of Garbage posted:That's a standard design pattern: deploying a central TGW that has your Direct Connect and/or VPN attachments, then you attach all your VPCs to it and have a single egress VPC. It reduces costs as you only have to maintain the one NAT gateway for the accounts and you don't need additional VIFs.

So I’m fine with the design. But it’s seemingly at odds with the idea of microaccounts for IAM access. All of the pain of setting up IAM trusts and limited roles across multiple accounts, just to say yolo to the network tier.
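For anyone who hasn't set up the cross-account IAM trusts being complained about here: the core of it is a role trust policy in the target account naming the source account as a principal. A minimal sketch (the account ID and the MFA condition are illustrative, not from the thread):

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::111111111111:root" },
      "Action": "sts:AssumeRole",
      "Condition": { "Bool": { "aws:MultiFactorAuthPresent": "true" } }
    }
  ]
}
```

The source account then also needs an identity policy granting `sts:AssumeRole` on that role's ARN, which is where the "pain of limited roles across multiple accounts" multiplies out.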
|
# ? Jan 15, 2021 21:57 |
|
12 rats tied together posted:Agreed, the "microaccounts" paradigm is dumb as poo poo.

Going in the opposite direction, you end up with a single ansible/cloud build role that deploys all your prod and nonprod stuff, and a developer accidentally checks in a dev AMI to prod and causes a cascading failure in prod at 10am on a Monday. Over time that ansible God role is the obvious choice to Get poo poo Done because lazy developers know it always works, and it becomes the de facto developer cred, causing all sorts of security holes. Let me think of other things that I have definitely never seen before in a work environment...
|
# ? Jan 15, 2021 23:55 |
|
please don't git blame me
|
# ? Jan 16, 2021 00:08 |
|
Methanar posted:Multiple AWS accounts is terrible.

It’s actually good. We have 16 accounts: the usual dev/test/prod, plus separate accounts for our IT team, logging, CI/CD, and an account that’s just for networking between accounts/sites.
|
# ? Jan 16, 2021 15:25 |
|
Wait, are Workspaces in Terraform CLI and Terraform Cloud almost completely different features with the same name? What the hell is this.
|
# ? Jan 16, 2021 15:58 |
|
If Service Control Policies are a useful way for you to set an outer boundary on user permission sets, use multiple accounts. If enforcing sensible tagging on all your resources is a shitshow and you need another mechanism to bill by, use multiple accounts. If not, then don't. If you have an infrastructure that's so huge that mere mortals on your app teams can't reason about the outputs of the AWS console, consider a segmented account. Accounts and Organizations aren't complicated, especially if you're already assuming roles to interact with AWS resources.

For us, we grew pretty organically in a regulated environment without a strong sense of cloud governance, so our infrastructure ended up being a grab bag of components that our security and risk management functions had limited ability to systemically reason about. As a result, we got into a situation where every production change to a component required a security review. Having accounts at system boundaries gives people the ability to freely refactor the infrastructure of their production stuff, because your security function only needs to apply that level of scrutiny to new connections people are trying to create between systems.

There's plenty of ways to do that without multiple accounts, especially if you have a strong culture of using internal tooling and modules to manage your cloud presence, but having multiple accounts makes it easy and helps AWS be your platform instead of the thing you build your own platform on top of.
|
# ? Jan 16, 2021 16:47 |
|
AWS not having namespaces, with ACLs around them fitting the IAM conventions around ARNs, is the big design misstep that means we have to suffer with multiple-account madness everywhere. (Also, who the heck actually uses paths in their IAM policies? I've managed 900+ accounts and we just gave up on the feature due to how much confusion it caused.) Trying to deal with the really weird error messages that can show up and fail (or not fail) silently can be absolutely grating. We have our KMS CMKs in a separate account, properly separated via segregation of responsibilities and all that, but what they don't tell you is that read permissions aren't transitive across accounts, so not having access to the KMS keys in another account gives completely different error messages compared to permission errors within an account. You'll get S3 errors like "object not found", for example, without KMS mentioned anywhere in the service call trace, while within an account you'll get an access denied error first. This error eliding isn't that bad from a defense-in-depth standpoint, where you don't want to reveal too much information in errors, but wow is it awful during development. I wish there was a way to set error logging levels on the backend to aid development and lock them down in prod, instead of sitting in a service client's logs hoping and praying it's visible there.
|
# ? Jan 16, 2021 17:34 |
|
I'm a little lost and could use some help. I own huhu.com and its DNS records are managed by Digital Ocean where I also do my hosting. I also have hosting with Google Cloud Platform and wanted to spin up a little temporary project on a subdomain at tempproject.huhu.com. I have the tempproject running with Google Cloud Run. How do I point tempproject.huhu.com to tempproject?
|
# ? Jan 16, 2021 20:20 |
|
Most likely, add an A record with the public IP of your server
|
# ? Jan 16, 2021 20:36 |
|
https://cloud.google.com/run/docs/mapping-custom-domains
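Per that doc, Cloud Run doesn't hand you a stable public IP for an A record; you verify the domain, create a domain mapping, and Google tells you the record to add at your DNS host. For a subdomain that's typically a CNAME, roughly (target hostname per the docs' pattern; treat this as a sketch, the mapping output is authoritative):

```
; zone huhu.com, added at Digital Ocean's DNS panel
tempproject  IN  CNAME  ghs.googlehosted.com.
```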
|
# ? Jan 17, 2021 00:51 |
|
I would like some guidance about the following: Say I want to have my Ansible stuff in a Git repo and have the plays be run by a CI pipeline when I update a file. What would be a good way to structure the repo? And how do I get it to not run everything in the repo? This seems like a pretty basic thing but I haven't found a satisfactory answer for this. An example would be running a play to update some users on a specific server, how do I build it to just run the play I updated without running everything else again that may also be in the repo? Do I just change the inventory? I have a hard time getting my head around this.
|
# ? Jan 17, 2021 10:08 |
|
The inventory supports variable precedence for hosts/groups, so in theory if the operations being run are idempotent it shouldn't matter if everything gets re-run. You'd just trigger the rerun based on the change to the inventory file(s). E: The dragons here being that not all modules are designed to perform idempotent operations. Probably also true if the playbook is calling a custom script/direct call to shell etc.
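To make the idempotence point concrete with the user-update example from the question: a task built on a well-behaved module reports "ok" instead of failing or duplicating the account on re-runs. A minimal sketch (names are illustrative):

```yaml
# Re-running this against a host where "deploy" already exists
# changes nothing and reports "ok", so blanket re-runs are safe.
- name: ensure app user exists
  ansible.builtin.user:
    name: deploy
    state: present
    groups: sudo
    append: true
```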
|
# ? Jan 17, 2021 16:23 |
|
Check your CI documentation around pipeline triggers. In Azure Pipelines you can use path triggers to run a pipeline when a specific file or directory is updated. Really though idempotent is the way to go.
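As a sketch of the path-trigger approach in Azure Pipelines (file paths here are hypothetical, matching whatever repo layout you settle on):

```yaml
# Only run this pipeline when the users playbook or its role changes.
trigger:
  branches:
    include:
      - main
  paths:
    include:
      - playbooks/users.yml
      - roles/users/*
```

Other CI systems (Drone, GitHub Actions, GitLab) have equivalent path filters under different names, so the same idea ports over.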
|
# ? Jan 17, 2021 18:08 |
|
A good way to structure the repo:code:
In general I don't think auto-running playbooks on merge-to-main is a good idea except for extremely simple use cases where you only need to make a basic, non-interrupting config change on servers without any type of orchestration, like your user update example. For that use case I'd probably set up ansible-pull, which will have your nodes periodically check in with your ansible repo and pull and run it.
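The repo layout example above didn't survive the archive; for reference, the structure the Ansible docs suggest looks roughly like this (file names are illustrative):

```
site.yml                  # top-level playbook importing the others
playbooks/
  users.yml
  webservers.yml
inventories/
  production/
    hosts
    group_vars/
  staging/
    hosts
roles/
  users/
    tasks/main.yml
```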
|
# ? Jan 17, 2021 20:41 |
|
Thanks, that is exactly what I was getting at: I don't need it to try and create the already existing users or some other configuration item. I don't think, for example, the LDAP modules/scripts/etc. are smart enough to not create AD users again, even though it will probably error out because the user already exists. Or is that the way it should work? It doesn't "smell" right, to be honest. I'll have a look at the Drone documentation to see if I can filter on the changed file...

Edit: My thinking is that I want one repo per "tenant" as it were, which hosts all the configuration stuff relating to that tenant (infrastructure, network etc.). If I need to update or add something, I want the whole repo on my machine, change or add the necessary things, check it in, review it, merge it, and have the CI do its thing without re-running the playbooks that haven't changed in the meantime. Is this a good approach? Or should I divvy up the repos by kind, e.g. DCs, routers, webservers? My thinking on how I should approach this is not completely clear. Mr Shiny Pants fucked around with this message at 20:52 on Jan 17, 2021 |
# ? Jan 17, 2021 20:42 |
|
freeasinbeer posted:So I’m fine with the design. But it’s seemingly at odds with the idea of microaccounts for IAM access. All of the pain of setting up IAM trusts and limited roles across multiple accounts, to say yolo to the network tier.

There are alternatives, like leveraging PrivateLink within an organization to connect services across many tiny VPCs.
|
# ? Jan 18, 2021 19:41 |
|
Mr Shiny Pants posted:Thanks, that is exactly what I was getting at, I don't need it to try and create the already existing users or some other configuration item.

Good ansible modules will expose this to you as an input parameter; for example netbox_device allows you to specify a query_param field, so in a situation where you need to rename every netbox_device you can use the device's serial number as a uniqueness control instead of the default device name. In my experience the above gotcha is the only issue that you'll run into with an idempotence-based execution model, and so long as you understand the constraints of whatever thing ansible is talking to and make good use of check_mode, you're totally fine to re-run the config on all the users constantly.

Mr Shiny Pants posted:Edit: My thinking is that I want one Repo per "tenant" as it were, which hosts all the configuration stuff relating to that tenant (infrastructure, network etc.)

What you're describing is totally possible, but it's a solution where you're blunting the effectiveness of the more robust tool so it can be more easily executed by a primitive ci/cd system, which feels kinda bad to me. As tortilla_chip described, ansible has tons of features for letting you manage multiple "tenants" in the same repository; it is designed explicitly to support this, and I've worked with ansible repositories that manage many thousands of hosts across many dozens of distinct tenants. What you describe will totally work, but the cost of the tradeoff is that it will be harder to "share" common logic and inventory data between tenants, since ansible is not so good at pushing/pulling across the repository boundary (galaxy exists for this, but it's pretty bad).

Whether this is worth it or not really depends on a bunch of factors that only you are really qualified to evaluate. If you go multi-repository, definitely come up with a documented repository schema ASAP and enforce it on all repositories as best you can; it is trivial to split or merge later, but only if all repositories have a common format.
|
# ? Jan 19, 2021 17:50 |
|
Hadlock posted:[...] and a developer accidentally checks in a dev AMI to prod [...]

Of course there are a lot of things going on with this example, but it highlights a big part of the reason why I don't like AMI-based workflows: since the pull request/merge process is the interface to production, it should contain every piece of information relevant to that production change. An "actionable diff" that looks like this is actively harmful: code:
code:
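The two code blocks above didn't survive the archive; the contrast being drawn is presumably something like the following (AMI IDs and variable names are made up for illustration):

```diff
 # Opaque AMI-hash diff: the reviewer can't see what changed in the image
-app_ami_id = "ami-0a1b2c3d4e5f60789"
+app_ami_id = "ami-0f9e8d7c6b5a43210"

 # vs. a diff that carries the actual change being shipped
-nginx_worker_connections: 1024
+nginx_worker_connections: 4096
```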
|
# ? Jan 19, 2021 18:11 |
|
^ yeah this is roughly equivalent to using an ansible secrets.yaml.vaulted vs a sops secrets.yaml. One fully obfuscates the changes within, whereas the other gives you a line-by-line, git-diffable file. To be fair though, an AMI hash, so long as the AMI build script is checked in, is roughly equivalent to a git or container tag and should be a pinnable object. That said, I hate working with custom AMIs.
|
# ? Jan 19, 2021 21:43 |
|
Yeah, the last time I was working a lot with ansible "vaulted files" it was policy that your reviewer had to acknowledge that they ran an 'ansible-vault view' on your changes and they OKed the results. It also meant that you had to find a reviewer who had an entitlement that granted them access to the appropriate vault key for that file, which could be really obnoxious depending on the vault being touched. Fortunately these days ansible supports inline vaulting which I started using and have never looked back.
|
# ? Jan 19, 2021 21:50 |
|
I'm not a fan of Ansible Vault for storing secrets. Unless something has changed in recent years, it's just another layer that will be abused by developers. There are Ansible plugins/modules that allow you to define variables based on Hashicorp Vault/GKE/etc., which allow for secrets to be defined at run time without the deployment user spitting secured data into bash history and the like. Like, don't do "--build-arg DEPLOY_KEY=..." — do "--build-arg DEPLOY_KEY=$(cat a_file_with_key)".
|
# ? Jan 20, 2021 03:19 |
|
Yeah, they're called lookup plugins and ansible supports variables that are themselves other variables, so you can totally be like:code:
I'm not sure what you mean about bash history and vault, it's totally possible to design an ansible workflow that requires you to specify --extra-vars="password=hunter2", but that's not the normal use case. The only thing that gets persisted into bash history by default is, on your control machine, ansible-playbook with a playbook, inventory, and a path to a file that contains the vault key. On the remote machines ansible_user will have a bash history line that shows it executing a payload, but the payload command itself doesn't have any secrets in it, and the unpacked payload cleans itself up as part of normal execution.
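The example block above is missing from the archive; a sketch of what a lookup-plugin-backed variable looks like, using the community.hashi_vault collection (the Vault URL and secret path are placeholders):

```yaml
# group_vars/all.yml -- resolved at playbook run time, never stored in the repo
vault_url: "https://vault.example.com:8200"
db_password: "{{ lookup('community.hashi_vault.hashi_vault',
                        'secret/data/app:db_password',
                        url=vault_url) }}"
```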
|
# ? Jan 20, 2021 17:49 |
|
IMO I heavily, heavily recommend and prefer that secrets (and environment-specific properties when possible) get pulled in as environment variables rather than a command line -e other_vars flag or an ansible plugin. It's more secure, extensible, and portable. It's straightforward and works with pretty much everything. It allows you to swap your hashicorp secrets solution for an LDAP one, it allows you to stick stuff in a docker image or kubernetes and "just work(tm)", it allows you to easily iterate and test locally without relying on outside servers, and it allows you to override those secrets locally on a given run when needed. It also lets you hook into not-ansible for things like automated rspec/junit tests using the same method. It's just a better, more flexible approach than locking yourself into only ansible with a community-supported plugin. There are definitely use cases for ansible plugins in general, it's just not my preferred method here. HOWEVER, all that said, putting secrets in -e flags is wildly insecure for a variety of reasons, so Do Not Do That regardless of what solution you finally decide on. Bhodi fucked around with this message at 19:10 on Jan 20, 2021 |
# ? Jan 20, 2021 18:54 |
|
I think I agree, except secrets shouldn't be plaintext in the environment, and I just use ansible to manage all of the environments, since secrets/data need to exist in a centralized, reviewable location somewhere, and that somewhere might as well contain hooks for reading from and writing to every possible supported place I might need to put them. Any tool that has orchestration hooks and has support for storing encrypted secrets in a VCS is usable here; ansible is just the best-equipped tool for this in my experience. You could probably do the same thing with chef encrypted databags + chef automate + chef habitat or something? But you could also just pip install ansible and you're done. The main thing I advocate for that isn't "just put it in ansible" is something like AWS Secrets Manager, or a similar cloud service, and only for situations where you need secrets in pull mode (autoscaling, etc.). If you bake a secret-that-can-grab-a-secret into your thing that does the pulling, you might as well have just baked the secret in directly: from a security perspective it's the same level of exposure and risk, and managing exposure and risk is really the only halfway difficult thing about this field. I don't really get avoiding ansible for fear of ansible "lock-in", but I'm at a level of familiarity with the tool where ansible is just bash 2: it's a glue tool, and I can stop using it any time by just using or scripting the things it orchestrates directly, but I have no reason to ever do that because it is so much of a productivity boost and there isn't really any better tool available. Can I really justify reinventing the serial strategy at work in powershell to tie together a bunch of adhoc scripts that don't subclass ansible.module_utils.basic? Or the group_vars plugin to manage environment data inheritance and supersedence? No, that's a waste of time that I could spend focusing on business logic implementation.
|
# ? Jan 20, 2021 19:14 |
|
Using ansible as your secrets entrypoint definitely works as long as you buy into ansible as the wrapper for anything you'd conceivably need those secrets for. That hasn't been a good fit for me in the past, for example if you need some secrets to access or modify things which ansible isn't a good fit for: network hardware, AWS services, basically anything that isn't at the OS or application level. Just as an example, if your CI/CD testing wants to stand up an entire stack including VPC and EBS teardown, you're probably going to be running terraform or cloudformation. If you go that route you've also got to manage accessing the same secrets in multiple different ways or wrap the whole thing in ansible — you may find that to be one Matryoshka too deep. It's better to have some sort of smaller wrapper to manage secrets outside ansible, something that's straightforward, relatively secure and broadly supported by literally every CI/CD tool: environment variables. Yes, it's definitely an extra step and probably not as clean. For something much more contained, such as a single-repository application without any external dependencies and with a straightforward compile/test/deploy, ansible works great. It starts to work less great when ansible is only a small tool in your overall CI/CD box rather than your entrypoint, and your task is to try and keep them all in sync within the same pipeline/process. I completely agree that if you're baking in the secrets you probably don't need environment variables; you're replacing that with cloud-init metadata or some file on the system that's external to ansible in a similar way. From the ansible side you plug both in very similarly. Maybe a fact would be the right approach instead? I'd need to think on it and dig into details. Bhodi fucked around with this message at 19:51 on Jan 20, 2021 |
# ? Jan 20, 2021 19:44 |
|
Bhodi posted:It starts to work less great when ansible is only a small tool in your overall CI/CD box rather than your entrypoint and your task is to try and keep them all in sync within the same pipeline / process.

Bhodi posted:I completely agree that If you're baking in the secrets you probably don't need environment variables; you're replacing that with cloud-init metadata or some file on the system that's external to ansible in a similar way. From the ansible side you plug both in very similarly. Maybe a fact would be the right approach instead? I'd need to think on it and dig into details.

In general I do think biasing towards metadata stored in an external system, but verified with some sort of trust, is a best practice.
|
# ? Jan 20, 2021 20:06 |
|
Thanks for the replies 12 rats, I have read them and you gave something to think about.
|
# ? Jan 21, 2021 06:49 |
Getting further along in my migration away from Chef, with the goal of managing the apps that are running on a dedicated server I lease for hosting personal projects (bandwidth & storage costs are too prohibitive to run in AWS). I was hoping that my new solution would allow me to develop on my Windows machine without having to use VirtualBox VMs. I have existing VirtualBox VMs I'll need to work with in the meantime though, and unfortunately it seems WSL2 & VirtualBox don't play nicely together: https://github.com/MicrosoftDocs/WSL/issues/798 Spent a bunch of time trying workarounds, only to eventually disable WSL2 and switch back to a VirtualBox VM.

So now I've got a Vagrantfile with a simple bash provisioning script to create a Debian VM with a full GUI desktop environment, which installs development tools (intellij, sublime text, docker, terraform, ansible, git, etc.) and checks out all the git repos I'm working with. Using this dev VM I can run my docker compose files locally to test out changes in my individual projects. Also from that dev VM I can run terraform to provision an EC2 instance which mimics my dedicated server, so that I have somewhere to test changes before deploying them to said dedicated server. Once that EC2 instance is up, I have an ansible playbook to run against it which installs security updates, configures SSH and UFW, and installs docker. Once the EC2 instance is provisioned I can use docker context to docker-compose up on the remote EC2 instance, verify everything is working, then tear down the EC2 instance when I'm not actively testing things to save $. I haven't actually migrated over my dedicated server yet, still just testing things on the EC2 instance.

Thinking more about what I want the deployment process to look like now. I am thinking a commit to my ansible playbook repo would trigger the ansible-playbook run, but also do a timer-based build so that the playbook runs at least once per day even when no commit occurs, so that new security packages get installed regularly. This would also do a --force-recreate with docker compose to update the containers with the latest image. The docker images would also be built on commit and on a fixed daily schedule (to get the latest security updates on the containers as well), but by a separate build job since they live in a different repo. Does this all sound reasonable? Any suggestions for changes?
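The push-plus-daily-timer setup described above is directly expressible in most CI systems; here's a sketch using GitHub Actions as the assumed CI (workflow, inventory path, and playbook names are placeholders — adapt to whatever you actually run):

```yaml
# Run the playbook on every push to main, and once a day regardless,
# so security updates land even when no commits happen.
name: deploy
on:
  push:
    branches: [main]
  schedule:
    - cron: "17 4 * * *"   # daily at 04:17 UTC
jobs:
  ansible:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: pip install ansible
      - run: ansible-playbook -i inventories/production site.yml
```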
|
|
# ? Jan 22, 2021 00:22 |
|
Does AWS sell anything that could be used as a PagerDuty replacement for a GovCloud IL5 environment? I had a meme idea of using Chime or Connect, but those aren't available in IL5. Basically, how do you do any sort of alerting, via phone or otherwise, if you are forbidden from exchanging data with non-IL5 third parties? Twilio and friends are all out. I'm not even sure if you'd be able to run your own Asterisk PBX.
|
# ? Jan 22, 2021 00:35 |
|
fletcher posted:Spent a bunch of time trying workarounds, only to eventually disable WSL2 and switch back to a VirtualBox VM.

WSL1 is still perfectly supported and a valid option if that works for you. WSL2 uses Hyper-V, which is where the conflict comes from, whereas WSL1 is basically syscall emulation in the Windows kernel and not secretly a VM.
|
# ? Jan 22, 2021 00:41 |
|
Methanar posted:Does AWS sell anything that could be used as a pagerduty replacement for a govcloud IL5 environment. I had a meme idea of using Chime or Connect, but those aren't available in IL5.

I did a proof of concept for this at my last job before they just decided to spend money on PagerDuty, and it basically amounted to having an SNS topic in a central ops account with the members of the team on the subscription list (+ some Slack webhooks). CloudWatch alarms trigger a Lambda that turns the alarm into something human-readable at 3AM, and depending on severity and source it would use subscription filters to decide which channel and who to notify.
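A minimal sketch of the alarm-formatting Lambda described above (field names follow the standard CloudWatch-alarm-to-SNS message shape; the routing to per-severity topics is elided, and this is not the poster's actual code):

```python
import json

def format_alarm(event):
    """Turn a CloudWatch alarm delivered via SNS into one readable line."""
    # SNS wraps the alarm JSON as a string inside Records[0].Sns.Message.
    msg = json.loads(event["Records"][0]["Sns"]["Message"])
    sev = "PAGE" if msg.get("NewStateValue") == "ALARM" else "INFO"
    return "[{sev}] {name}: {reason}".format(
        sev=sev,
        name=msg.get("AlarmName", "unknown"),
        reason=msg.get("NewStateReason", ""),
    )

def handler(event, context):
    # In the real setup this line would be published to a per-severity
    # SNS topic with subscription filters; here we just return it.
    return format_alarm(event)
```

From there, SNS subscription filter policies on the downstream topic decide whether the message goes to SMS, email, or a webhook.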
|
# ? Jan 22, 2021 01:08 |
|
whats for dinner posted:I did a proof of concept for this at my last job before they just decided to spend money on PagerDuty and it basically amounted to having an SNS topic in a central ops account with the members of the team on the subscription list (+ some slack webhooks). CloudWatch alarms trigger a lambda that turns the alarm into something human readable at 3AM and depending on severity and source would use subscription filters to decide what channel and who to notify.

What do you mean by just throwing money at PagerDuty? It's my understanding that having webhooks out to PagerDuty at all is forbidden. Similar with the Slack hooks you referenced on an SNS topic.
|
# ? Jan 22, 2021 02:09 |
|
Not sure what the right thread is. I have an idea for a simple web application I'd like to make: basically an interactive command line, like a helper script you might write in Python or Perl, but accessible from the web. This would be running for a maximum of a couple hours a week. If this were any simpler I could do it in Google Forms. Google Cloud looks good, but I don't know where to go after I sign up. Is there a name for what I want to do? Every search is full of scripting languages used for backend tasks or administration. I'm comfortable with the script-like code side of things, but all my coding experience is in languages like C, Java, and the P's above. What do I need to do to keep it secure and make sure the cost stays low? Stretch goals include sending email (like one per week) and Google user authentication (to detect name and whitelist about three users).

edit: Got it started with a site hosted on Firebase running jQuery Terminal. Captain Cool fucked around with this message at 07:00 on Jan 26, 2021 |
# ? Jan 22, 2021 02:31 |
|
Methanar posted:What do you mean by just throwing money at pagerduty. It's my understanding that having webhooks out to pagerduty at all is forbidden. Similar with the slack hooks you referenced on an SNS topic.

Sorry, the original POC was done with instructions to minimise cost, and we were all-in on AWS. I'm not familiar with IL5, but I was sharing how we put together most of it using just AWS services to get alerts to people's phones, pagers or emails. It's nowhere near as fully featured as something like PagerDuty, but it's able to do the job.
|
# ? Jan 22, 2021 02:40 |
|
https://docs.aws.amazon.com/sns/latest/dg/sns-supported-regions-countries.html#w892aac29b7c25b5:~:text=sns.us%2Dgov%2Dwest%2D1.amazonaws.com ?
|
# ? Jan 22, 2021 02:44 |
|
|
Methanar posted:WSL1 is still perfectly supported and a valid option if that works for you. wsl2 uses hyperV which is where the conflict comes from whereas wsl1 is basically syscall emulation in the windows kernel and not secretly a VM.

Hmm interesting, thanks for the suggestion! Are there any downsides to using WSL1?
|
|
# ? Jan 22, 2021 03:31 |