# automation-api
k
Hi! My team makes a self-service application that uses the automation API to provision cloud resources for other development teams. In short, we can run several `pulumi up` commands at the same time. The problem is that if one Pulumi program fails for whatever reason, they all fail. It seems that whatever socket or process underneath it all sends a kill signal to all running Pulumi programs instead of just the one that failed. Is the automation API meant to be used in this way? Is it built to handle concurrent programs?
m
What language are you working in? Yes, concurrent `pulumi up`s should be possible.
Here is a ChatGPT-generated example (so caveats apply) of three stacks in three threads in Python:
# pip install pulumi
# Runs: python thread_demo.py

import os
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pulumi import automation as auto

PROJECT_NAME = "py-thread-demo"

def make_program(run_id: str):
    # Capture run_id in a closure (avoid env/global mutation per thread)
    def pulumi_program():
        import pulumi
        pulumi.export("greeting", f"hello from {run_id}")
    return pulumi_program

def run_stack(run_id: str, fail: bool = False):
    # Demo-only: local backend with no passphrase prompt
    os.environ.setdefault("PULUMI_CONFIG_PASSPHRASE", "")

    workdir = tempfile.mkdtemp(prefix=f"{run_id}-")  # unique dir per run
    # Note: the correct API is create_or_select_stack (no "_inline" variant)
    stack = auto.create_or_select_stack(
        stack_name=run_id,
        project_name=PROJECT_NAME,
        program=make_program(run_id),
        opts=auto.LocalWorkspaceOptions(work_dir=workdir),
    )

    if fail:
        # Prove isolation: this thread fails, siblings continue
        raise RuntimeError(f"simulated failure in {run_id}")

    result = stack.up(on_output=lambda _: None)  # quiet output
    return run_id, {k: v.value for k, v in result.outputs.items()}

if __name__ == "__main__":
    jobs = [
        ("stack-A", False),
        ("stack-B", True),   # this one will fail
        ("stack-C", False),
    ]

    with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
        fut_to_name = {
            pool.submit(run_stack, name, should_fail): name
            for name, should_fail in jobs
        }
        for fut in as_completed(fut_to_name):
            name = fut_to_name[fut]
            try:
                run_id, outputs = fut.result()
                print(f"[OK] {run_id}: {outputs}")
            except Exception as e:
                print(f"[FAIL] {name}: {e}")

    print("Done.")
k
We're using TypeScript. The first stack that fails outputs an error log saying what went wrong, while every other stack outputs an error log saying "Pulumi program failed" multiple times, depending on where it sits in the sequence. The attached screenshots show a stack that failed because of two earlier stacks.
m
I am not a TypeScript expert, but maybe somebody on here can chime in. However, I think the same principles from the Python example should apply, just with Promises. ChatGPT recommends:
• use `Promise.allSettled`, not `Promise.all`
• avoid `process.exit(1)`
• give each run its own `LocalWorkspace` + `workDir`
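To show why the `Promise.allSettled` point matters, here is a minimal sketch of the isolation semantics. No Pulumi APIs are involved; `fakeUp` is a made-up stand-in for a `stack.up()` call:

```typescript
// A fake "stack update" standing in for stack.up(); name and behavior are
// illustrative only, not part of the Pulumi automation API.
async function fakeUp(name: string, fail: boolean): Promise<string> {
    if (fail) throw new Error(`update failed for ${name}`);
    return `updated ${name}`;
}

async function main() {
    const jobs: Array<[string, boolean]> = [
        ["stack-A", false],
        ["stack-B", true], // this one fails
        ["stack-C", false],
    ];

    // allSettled: every job reports its own outcome; one rejection does not
    // discard the others. Promise.all would reject as soon as stack-B failed.
    const results = await Promise.allSettled(
        jobs.map(([name, fail]) => fakeUp(name, fail)),
    );
    for (const [i, r] of results.entries()) {
        const name = jobs[i][0];
        if (r.status === "fulfilled") {
            console.log(`[OK] ${name}: ${r.value}`);
        } else {
            console.log(`[FAIL] ${name}: ${(r.reason as Error).message}`);
        }
    }
}

main();
```

With `Promise.all`, the first rejection would surface alone and the sibling results would be dropped, which is why `allSettled` is the better fit when each update should succeed or fail independently.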
k
Each of our programs is triggered by an HTTP request that results in its own isolated promise. We get a `Stack` object using the following snippet:
// Assumes: import { LocalWorkspace, Stack, PulumiFn, StackNotFoundError } from "@pulumi/pulumi/automation";
export async function selectStack(
    projectName: string,
    stackName: string,
    program: PulumiFn,
): Promise<Stack> {
    try {
        return await LocalWorkspace.selectStack(
            {
                stackName: getStackName(stackName),
                projectName,
                program,
            }
        );
    } catch (error) {
        if (error instanceof StackNotFoundError) {
            throw new PulumiResourceNotFoundErrorResponse(
                `No record for project ${projectName} and stack ${stackName} found`,
            );
        }
        throw error;
    }
}
We do not, however, pass `workDir` at any point, but the docs say it defaults to a new temporary directory provided by the OS. At no point do we use `process.exit`, as that would kill the entire API.
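If it helps to rule out workspace collisions, here is a hypothetical per-run `workDir` helper (the `makeWorkDir` name is an assumption, and the commented `selectStack` call only mirrors the snippet above):

```typescript
import * as fs from "node:fs";
import * as os from "node:os";
import * as path from "node:path";

// Create a unique working directory per run; mkdtempSync appends a random
// suffix, so concurrent runs never collide on the same directory.
export function makeWorkDir(runId: string): string {
    return fs.mkdtempSync(path.join(os.tmpdir(), `${runId}-`));
}

// Assumed usage, mirroring the selectStack snippet above — LocalWorkspace
// methods accept a second options argument where workDir can be set:
//
// const stack = await LocalWorkspace.selectStack(
//     { stackName, projectName, program },
//     { workDir: makeWorkDir(stackName) },
// );
```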
m
That's odd. Are any provider plugins being installed during those updates? Could you share a minimal reproduction?