# general
c
Are any teams using trunk-based development (i.e., short-lived feature branches) with Pulumi and CD? I am moving this org's Pulumi to CI/CD (i.e., GitHub Actions/GitLab Pipelines), but I only see guidance for setting up GitFlow-based projects with one stack per branch. The org's existing Pulumi projects use trunk-based development, as "Gitflow" projects are harder to manage at this scale and developers don't want to adopt that model... and I don't blame them. My main puzzle is how to provision multiple environments (stacks) from a single main branch using a CI pipeline. Any advice or tips are welcome!
References to trunk-based development flows on Pulumi.com:
• https://www.pulumi.com/what-is/what-is-ci-cd/
• https://www.pulumi.com/blog/platform-engineering-pillars-2/
a
We use one project, multiple stacks approach.
In the repo/pulumi folder:
```
{pulumi-source-files}
Pulumi.yaml
Pulumi.production.yaml
Pulumi.staging.yaml
...
{source-code-project (npm/poetry/mvn/etc)}
```
Each stack (environment) has its own AWS account, or VPC, to isolate environment resources.
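As a sketch, one of those per-stack config files (e.g. Pulumi.staging.yaml) might carry the environment-specific values; the key names and values below are illustrative assumptions, not taken from the thread:

```yaml
# Hypothetical Pulumi.staging.yaml - stack-scoped config for a "staging" environment.
# Keys are namespaced by the project name declared in Pulumi.yaml (assumed "myapp" here).
config:
  aws:region: us-east-1
  myapp:vpcId: vpc-0abc123        # the staging VPC, isolating this environment
  myapp:instanceCount: "2"
  myapp:domain: staging.example.com
```

Values like these can be read in program code with the stack's config API, so the same source files provision every environment.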
c
Thanks @adamant-lawyer-19698 - do you also provision Pulumi using CI/CD?
a
yes, we use GH Actions to run `pulumi up`
c
How do you determine which environment/stack to provision when performing a merge to the main branch?
a
we have one workflow for each env/stack, and one stack is set to auto-deploy. For prod, we always deploy manually.
šŸ‘ 1
m
I agree with L W, but I'm also curious: what are the multiple environments running off of main, @curved-jordan-5346? Staging environments or something else? My mental model of trunk-based is that once something is in main, it's going to prod. Maybe that 'going' step has some wiggle room, but you don't want things in main that aren't ready for production usage.
s
@modern-spring-15520 it may be going to prod, but quite likely not straight to prod. Things may - probably should - go through at least one environment where there is automated testing that acts as a gate to promotion (of that code, to a higher environment). The ideal is that if the gates pass you end up in production, but there may be a manual gate or something before code goes live. Where I've done this I've automated the promotion step (whether the trigger is automatic or manual) - so in the pipeline the stack is specified as part of the job. There is a reasonable challenge here in that there may be a need for multiple distinct test environments - which would be a whole conversation with appropriate beverages šŸ¤” - but at the most basic level you are just running a stack per "level".
m
That makes sense @some-flower-64874, and I like the idea of an auto-promotion workflow. How many levels? My purely theoretical thinking is that I should try to do as much checking as possible before main, so I don't have commits sitting around there that actually fail some gate, but also, the real world being what it is, that can be hard.
Asking out of curiosity btw. A million ways to get things done.
s
This may get a bit long... As I see it (and I make no claim to being an expert), having things break is not a problem; leaving them broken is the problem - the whole point of the tests is to catch things šŸ™‚ What that means in practical terms is that you can test up to a point in isolation, but at some point you need to run more extended tests in a "real" environment (testing the system as a whole rather than some component, and with things that are definitely not emulated).
The other thing is that even committing to trunk you need to isolate changes - so feature flags - and you want to be able to test at varying levels of confidence. That tends to suggest dev / test / staging / production - so I think a minimum of 2 environments/stages, probably not more than 4 as a rule (there may be a parallel UAT-type environment next to staging, to meet diverse needs...).
The fun bit (with flags) is that all you actually care about is that the code doesn't break with the production flags - so your automated promotion should be on that basis, even if the tests fail for the WIP flags at a given level (this is important to avoid getting into the nightmare that is hotfixes...). But I haven't seen this in the real world, just aspired to making it true. And for all the above, I currently sit where far too much stuff is far too manual, and to describe our code as insufficiently tested is making circumstances sound vastly better than they really are...
āœ… 1
m
to describe our code as insufficiently tested is making circumstances sound vastly better than they really are
šŸ™‚
the whole point of the tests is to catch things
Agree with that! thanks for sharing
šŸ™‚ 1
w
I've seen this done a bit with git tags - i.e. the commit gets tagged with dev-<something> to trigger a CI dev deploy, then qa-<something> to deploy there, etc.
šŸ¤” 1
šŸ‘šŸ» 1
Haven't played with it a tonne myself yet but been meaning to poke at the idea more
s
My current solution (where I have actual automated deployment) is to use GitHub releases as a trigger - partly because that gives me a sane workflow, and partly to limit the number of containers we build without having to worry about when it's reasonable to push. It's not my favourite solution, but it is a pragmatic one and it definitely works. Plus redeploying (including reverting) can be done by manually running the action with the appropriate tag - without the need to do the build part. Because of where I am, and the nature of the projects, this is a single environment; the next step is to go to staging for a release and then have a gate (human approval) for the same code to go to the production environment. Inherent in this (for me) is that deployed artefacts (containers in my case) are built exactly once - and then deployed to each environment in turn. Easy for .NET, rather more fun, it seems, for javascripty things (something I haven't worked out for the inherited code I'm working with).
c
@modern-spring-15520 We have 8 stacks/environments for the project:
• "Demo" environment for infra/platform tests/POCs
• DEV environment
• 3 UAT environments based on DEV for customers
• STG environment
• 1 "stress testing" environment based on STG
• PRD environment
Promoting code through these stacks would become unwieldy if we switched to a Gitflow model with "develop" branches.
@witty-battery-42692 Thanks for sharing. Using empty commits for GitOps commands is something I briefly considered... but I will now POC this method.
I also like this because we can pin the intended state of an environment to the last GitOps commit.
w
My org is doing this and our main solution has been tightly coupling to Pulumi ESC. We're using GitLab, which strongly encourages feature branches/centralized repos over forks, and so that's what we do. Fork-based workflows may have to work out a few extra details. Our workflow at its purest is:
1. Make a feature branch named feature/my-name/jira-123; a Jira ticket being present is minimally enforced with branch policies and a basic regex.
2. The branch's existence is enough to trigger a pipeline. The pipeline extracts the Jira ticket and creates a stack name (jira-123).
3. A Pulumi.lab.yaml config is already part of the repo; the pipeline copies it to Pulumi.jira-123.yaml (using pulumi config cp instead of any OS-based copy command, to persist any secrets that might be encrypted in it). Then a custom script looks for {{stack_name}} and replaces it with jira-123. That's the extent of app-specific configs solved/supported.
4. For this to be most useful, Pulumi.lab.yaml imports team-name/lab from Pulumi ESC, which has all the standard upstream info, such as longer-running stack stuff like a VPC, shared cluster, etc. Engineers can add whatever other envs they want to this, with the confidence they'll move to their ephemeral stack that gets spawned on feature branches.
5. Every commit to the feature branch runs a new build/test/pulumi up, and if appropriate exports things like the URL things can be tested on (e.g. https://myapp-jira-123.lab.myorg.notreal).
6. When they open a merge/pull request, that is the CI's cue to run even more tests - things like actually calling Jira to validate the ticket exists, is in a valid status, and has some QA tests checked in against it. The PR will be blocked until those tests pass, but the engineer is free to check 'auto-merge when checks pass', and that, combined with some webhooks from various integrations to retest on any changes, lets it be mostly a set-it-and-forget-it workflow.
7. The PR checks include a pulumi preview against as many higher envs as they have specified in their CI template. We have GitLab components where they can just say how many higher envs they actually want, because our org has various pockets that grew up independently, and now some teams have a simple dev/test where others have as many as 9... We did at least get the whole org to agree what envs come first in order of importance (e.g. lab, dev, stg, prd), so the pipeline only checks if the env is enabled and doesn't have to do dynamic ordering.
8. Once the various checks pass, that means Jira workflows and regression tests have checked all the boxes: all existing tests, and any new ones, have been written and passed. The CI now runs pulumi up on each environment in order of importance until finally arriving at prod. The teams can inject things like load tests into this process, but are strongly encouraged to find a way to fit that into the MR/PR checks if they can. Things failing on their way to prod is an exception, not a rule.
I want to echo what @some-flower-64874 said earlier, which is that this workflow wouldn't work as well if we didn't have feature flags. This workflow generally applies to microservices, and they're deployed with a feature turned off. The feature orchestration is done later, environment-wide, as a different pipeline with specific regression and load tests run in lower envs and then promoted up.
šŸŽ‰ 1
šŸ‘ 1
c
Thanks for the suggestions - I eventually settled on creating CI pipelines coded to select a Pulumi stack rather than using GitOps commands. Long story short: GitOps commands work, but Bitbucket Pipelines doesn't provide real support for securely implementing these empty-commit commands on self-hosted runners without paying for other Atlassian products such as Forge or Jira Service Management. In any case, it turned out to be a simple problem to solve, and the existing development workflow doesn't need to be modified. šŸ™
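For anyone landing here later, a pipeline "coded to select a stack" could look roughly like this in Bitbucket Pipelines; the step names, stack names, and the manual prod gate are assumptions rather than the poster's actual config:

```yaml
# Hypothetical bitbucket-pipelines.yml: each step selects its stack explicitly,
# so no GitOps commands or empty commits are needed.
pipelines:
  branches:
    main:
      - step:
          name: Deploy staging
          deployment: staging
          script:
            - pulumi stack select staging
            - pulumi up --yes
      - step:
          name: Deploy production
          deployment: production
          trigger: manual          # human gate before prod
          script:
            - pulumi stack select production
            - pulumi up --yes
```

The `deployment` keyword ties each step to a Bitbucket deployment environment (and its scoped variables), and `trigger: manual` keeps production behind an explicit button press.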
a
Simple is king.