Hi, we're looking into pulumi to manage our cloud...
# python
s
Hi, we're looking into pulumi to manage our cloud platform, but have some obstacles to move out of the way before we can commit to it. I hope you can help put our minds at ease about one of them: We need to deploy a set of components with cross-component dependencies in various combinations. We're looking for a way of defining all associated resources in python and having pulumi take care of dependency management. However, since python can't access attributes before they are defined, we'd still need to structure our code around the dependency order. As (one) solution (/workaround), we came up with the following code:
import pulumi
from pulumi_command import local
from pulumi_random import random_string


def to_output(ref: pulumi.Output, output: pulumi.Output) -> None:
    """Overwrite the reference's attributes with those of the actual resource output."""
    ref._is_known = getattr(output, "_is_known")
    ref._is_secret = getattr(output, "_is_secret")
    ref._future = getattr(output, "_future")
    ref._resources = getattr(output, "_resources")


class PlatformComponentA(pulumi.ComponentResource):
    a_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)
    b_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)

    def __init__(self):
        super().__init__("component:PlatformComponentA", "test")
        parent = pulumi.ResourceOptions(parent=self)

        a = local.Command("a", create="cat", stdin=PlatformComponentB.d_out, opts=parent)
        to_output(self.a_out, a.stdout)

        b = random_string.RandomString("b", length=8, opts=parent)
        to_output(self.b_out, b.result)

        self.register_outputs({"a": "b"})


class PlatformComponentB(pulumi.ComponentResource):
    c_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)
    d_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)

    def __init__(self):
        super().__init__("component:PlatformComponentB", "test")
        parent = pulumi.ResourceOptions(parent=self)

        c = local.Command("c", create="cat", stdin=PlatformComponentA.b_out, opts=parent)
        to_output(self.c_out, c.stdout)

        d = local.Command("d", create="date", opts=parent)
        to_output(self.d_out, d.stdout)

        self.register_outputs({})


a = PlatformComponentA()
b = PlatformComponentB()

a.a_out.apply(lambda a: print(f"a: {a}"))
a.b_out.apply(lambda b: print(f"b: {b}"))
b.c_out.apply(lambda c: print(f"c: {c}"))
b.d_out.apply(lambda d: print(f"d: {d}"))
which seems to work as expected, but has to touch `pulumi.Output` internals. Do you know of a better solution to offload the burden of dependency management from our python code to the pulumi engine? If not, do you think our solution is sound enough for production use, or is it rather fragile? Edit: Updated the code to better reflect our issue.
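For readers wondering why the placeholders are filled in place rather than replaced: the pattern resembles creating futures up front and resolving them later. The sketch below is a plain `asyncio` analogy, not Pulumi code. `component_a`, `component_b` and the string values are made up for illustration, and it makes no claim about how stable the private attributes of `pulumi.Output` are.

```python
import asyncio


async def main():
    loop = asyncio.get_running_loop()

    # Placeholders exist before either "component" has produced a value.
    b_out = loop.create_future()
    d_out = loop.create_future()

    async def component_a():
        # A.b needs nothing, so its placeholder is resolved right away.
        b_out.set_result("b-value")
        # A.a needs B.d and waits until that placeholder is resolved.
        return f"a({await d_out})"

    async def component_b():
        # B.d needs nothing.
        d_out.set_result("d-value")
        # B.c needs A.b.
        return f"c({await b_out})"

    # Both components are started together; the waits settle the order.
    return tuple(await asyncio.gather(component_a(), component_b()))


result = asyncio.run(main())
```

Because every consumer holds a reference to the same placeholder object, resolving it later is visible everywhere, which is the property `to_output` relies on.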
f
if those A, B, C classes should be part of a single Stack (e.g. shared environment) you will want to look at Component Resources and using them. You can then make them dependent on others e.g. with `dependsOn`. If they should be separate environments, check out Stack references
s
Thanks for your response. I should have gone into some more detail, so here's my attempt: The components (`A`, `B`, `C`) will be part of single stacks (in different combinations, depending on the user's need/request). They will be implemented as `ComponentResource`s. I hadn't thought about explicitly having a `dependsOn` on `ComponentResource`-level. I'm afraid we'd still need to properly sort our code, since we need to pass references to other `ComponentResource`s and python would throw an error if we tried to access something that wasn't declared. That's what I was hoping to circumvent by instantiating the placeholder `pulumi.Output`s and later filling in the "blanks". The docs you linked throw up some more questions:
1. According to this issue the `ComponentResource.register_outputs()` call doesn't have an effect besides signalling the engine that the component has finished. In the linked docs, the component hasn't finished registering since a policy is created later on, which is an indirect child of the `ComponentResource`.
2. According to the docs on the parent attribute, `CustomResource`s should only be nested below `ComponentResource`s, not below other `CustomResource`s, which the linked example does with the `component -> bucket -> policy` relation.
👍 1
m
If C does not depend on anything, you can declare it first, and then pass it to B, and then to A:
c = C()
b = B(stdin=c.out)
a = A(stdin=b.out)
Python prevents you from creating circular dependencies here, and Pulumi will sort out the most efficient creation/update order and parallelization. Maybe I'm missing something here, but I'd say that by definition, you cannot have circular dependencies in your infrastructure. If you can only create A when you have B, and B can only be created when you already have A, this can't possibly work. In the case that A and B have to know about each other (e.g., need to know the URL of the other party) then you'll have to create A, create B, and update A. Or you know where B will end up and can pass its URL to A even though B has not been created yet:
url_of_b = "http://example.com"
a = A(other_url=url_of_b)
b = B(my_url=url_of_b, other_url=a.url)
s
Yes, we can restructure our components and order them appropriately to have some "linear" dependency graph. However, we were hoping to avoid that, since these restructured pulumi-components would be further away from the components our users order and configure. As-is, we have some components with dependencies in both directions, e.g.
ComponentResource A:
|- CustomResource A.a (depends on B.a)
|- CustomResource A.b

ComponentResource B:
|- CustomResource B.a
|- CustomResource B.b (depends on A.b)
According to the documentation linked above, we could avoid breaking up these components, if we delayed declaring the resources with unmet dependencies (here `A.a` if we first initialized `A`, then `B` in code) by moving their definition from the `ComponentResource.__init__` into some other method and calling that later on. This would still mean a lot of work to restructure our components - which I'd like to avoid - and I'm not even sure whether that'd be a good idea, since the docs state that
> The call to `registerOutputs` also tells Pulumi that the resource is done registering children and should be considered fully constructed
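The shape of the diagram can be checked mechanically. Below is a minimal sketch using the standard library's `graphlib` (Python 3.9+) and only the four resources from the diagram. It illustrates the graph shapes and says nothing about how the Pulumi engine schedules work internally.

```python
from graphlib import CycleError, TopologicalSorter

# Resource-level graph from the diagram: each key maps to its prerequisites.
resource_deps = {
    "A.a": {"B.a"},
    "A.b": set(),
    "B.a": set(),
    "B.b": {"A.b"},
}

# At resource level there is a valid creation order (no loop).
resource_order = list(TopologicalSorter(resource_deps).static_order())

# Collapse every resource onto its component ("A.a" -> "A").
component_deps = {}
for node, prereqs in resource_deps.items():
    component = node.split(".")[0]
    component_deps.setdefault(component, set())
    for prereq in prereqs:
        other = prereq.split(".")[0]
        if other != component:
            component_deps[component].add(other)

# At component level A needs B and B needs A, so sorting fails.
try:
    list(TopologicalSorter(component_deps).static_order())
    component_cycle = False
except CycleError:
    component_cycle = True
```

The loop only appears once resources are grouped into components, which is why ordering whole component constructors in Python code is harder than ordering the individual resources.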
m
I think what you're trying to do is not compatible with the Pulumi resource and component model. It looks like you don't have clear boundaries between your component resources but divide them based on some other, higher-level considerations? I believe you'll either have to merge A and B into a single component resource AB, or find a way to share the information that you need through a third component, e.g., a configuration that doesn't necessarily have to reflect any "real" infrastructure.
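One way to read the "third component" suggestion, sketched in plain Python rather than Pulumi code: both components receive a small shared object that is created first, so neither needs a reference to the other. `SharedSettings`, `describe_a`, `describe_b` and the endpoint values are hypothetical names chosen for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SharedSettings:
    """Hypothetical values both components need, decided up front."""

    a_endpoint: str
    b_endpoint: str


def describe_a(shared: SharedSettings) -> dict:
    # A is told where B will live; it never touches a B object.
    return {"name": "A", "listens_on": shared.a_endpoint, "peer": shared.b_endpoint}


def describe_b(shared: SharedSettings) -> dict:
    # B is told where A will live; it never touches an A object.
    return {"name": "B", "listens_on": shared.b_endpoint, "peer": shared.a_endpoint}


shared = SharedSettings(a_endpoint="a.internal.example", b_endpoint="b.internal.example")
a = describe_a(shared)
b = describe_b(shared)
```

This only helps when the shared values can be known before either component exists. It does not cover values that are produced by creating a resource.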
f
my current understanding is `registerOutputs` is used solely to update the CLI's progress bar unless you're working with multi-language components. it's not required but it's good for future-proofing, i suppose? yeah i'm curious what these circular dependencies are. it feels like something that can be architected around. have you looked at the blog post on circular dependencies in Pulumi? maybe something there will inspire💡
s
I stumbled over that blog article earlier, when looking for another solution. That covers two custom resources which each can't be deployed completely without the other one having been deployed (if I understood correctly), which isn't a problem we're facing. In our case, pulumi could always construct a clear dependency graph (without loops). I updated the original code snippet to better resemble our situation, copying it here for simplicity:
import pulumi
from pulumi_command import local
from pulumi_random import random_string


def to_output(ref: pulumi.Output, output: pulumi.Output) -> None:
    """Overwrite the reference's attributes with those of the actual resource output."""
    ref._is_known = getattr(output, "_is_known")
    ref._is_secret = getattr(output, "_is_secret")
    ref._future = getattr(output, "_future")
    ref._resources = getattr(output, "_resources")


class PlatformComponentA(pulumi.ComponentResource):
    a_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)
    b_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)

    def __init__(self):
        super().__init__("component:PlatformComponentA", "test")
        parent = pulumi.ResourceOptions(parent=self)

        a = local.Command("a", create="cat", stdin=PlatformComponentB.d_out, opts=parent)
        to_output(self.a_out, a.stdout)

        b = random_string.RandomString("b", length=8, opts=parent)
        to_output(self.b_out, b.result)

        self.register_outputs({"a": "b"})


class PlatformComponentB(pulumi.ComponentResource):
    c_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)
    d_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)

    def __init__(self):
        super().__init__("component:PlatformComponentB", "test")
        parent = pulumi.ResourceOptions(parent=self)

        c = local.Command("c", create="cat", stdin=PlatformComponentA.b_out, opts=parent)
        to_output(self.c_out, c.stdout)

        d = local.Command("d", create="date", opts=parent)
        to_output(self.d_out, d.stdout)

        self.register_outputs({})


a = PlatformComponentA()
b = PlatformComponentB()

a.a_out.apply(lambda a: print(f"a: {a}"))
a.b_out.apply(lambda b: print(f"b: {b}"))
b.c_out.apply(lambda c: print(f"c: {c}"))
b.d_out.apply(lambda d: print(f"d: {d}"))
This works as "expected": We can scope the pulumi-components around our "actual" platform-components, Python is satisfied with all attributes being present upfront, and pulumi is able to track all dependencies and deploy the resources with correct inputs/outputs. It just feels somewhat hacky and might be brittle, but I'm too much of a novice with pulumi's internals to know for sure.
> I think what you're trying to do is not compatible with the Pulumi resource and component model.
Yeah, this could well be true. We had issues with manually tracking inter-component dependencies and restructuring components in the past, so one of our goals for an improved iac toolchain was to let the "engine" keep track of these inter-component dependencies.
> I believe you'll either have to merge A and B into a single component resource AB, or find a way to share the information that you need through a third component, e.g., a configuration that doesn't necessarily have to reflect any "real" infrastructure.
For some of the inter-component dependencies that would be a solution, but not all of them. One instance that I stumbled over when test-porting some parts to pulumi was an azure log-analytics workspace and associated diagnostic settings, on various azure resources, that are created conditionally based on a config. These diagnostic settings are sprinkled throughout other components, but require the log-analytics workspace-id as input.
m
> For some of the inter-component dependencies that would be a solution, but not all of them.
But how do you create them manually? Usually (and I've yet to see an example where this does not apply) you can create any infrastructure through a sequence of CLI calls (or mouse clicks). This might be very painful and cumbersome, but there has to be a way to do it step by step. One common example of a circular dependency is linking Kubernetes RBAC service accounts with IAM roles on AWS, where the service account has to be annotated with the role ARN and the role's trust policy contains the name and namespace of the service account. You can either create this by "manually" assembling either the role ARN or the service account's name ahead of resource creation, or you make the service account first, then create the IAM role, and then patch the service account (which is an example of option 3 in the blog post).
s
On second look I couldn't find a platform-component where extracting dependents into separate ComponentResources wouldn't be possible, so you're right in that regard. It would lead to many AB-, ABC- or even ABCD-type ComponentResources, though. That'd be a bad solution for us, since our users can order components individually, so user1 might have platform-component A, user2 B and user3 A and B. If we now extracted parts of A and B into AB, we'd need to add checks within AB to know which subset of resources needs to be deployed there. Additionally, the operational burden of reviewing diffs would be increased by splitting up the components (and merging them partially), since an order of A would result in a diff for ComponentResource A and AB. Nonetheless, thanks for looking into my issue and coming up with valid solutions. If "stable" offloading of dependency tracking to the pulumi engine isn't possible, we'll need to go back to the drawing board, I'm afraid.
m
> It would lead to many AB-, ABC- or even ABCD-type ComponentResources, though. That'd be a bad solution for us, since our users can order components individually, so user1 might have platform-component A, user2 B and user3 A and B. If we now extracted parts of A and B into AB, we'd need to add checks within AB to know which subset of resources needs to be deployed there.
I obviously have no insight into what you are doing and how your platform is architected, but just based on this description I think you might be missing a layer in-between: Why does what your users order have to be reflected on the level of resources? Just as some further food for thought. Let's say I'm selling an accounting backend in various different variants. It consists of a database, disks, VMs, an authentication service etc. From the perspective of the user, they'll book "High-throughput database" but that does not mean that I need to have a HighThroughputDatabaseComponent I deploy for them. I'd probably have a DatabaseComponent and then link this to an IOOptimizedDisk, whereas usually I'd link it to a RegularDisk. To be able to conveniently deploy different variants, I'd organize them into different Pulumi programs (or parametrize the programs) so that I deploy one stack per user: Alice gets her `alice` stack with the high-throughput database and Bob gets his `bob` stack with the regular version. Neither of them has to care how I model my resources.
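The parametrization idea can be sketched in plain Python. Everything below is hypothetical: `STACK_SETTINGS` stands in for whatever per-stack configuration is used, and the returned strings only name the components from the example above rather than creating anything.

```python
# Hypothetical per-stack settings: what each user booked.
STACK_SETTINGS = {
    "alice": {"database_variant": "high-throughput"},
    "bob": {"database_variant": "regular"},
}


def resources_for(stack_name: str) -> list:
    """Map a booked variant onto the internal building blocks."""
    variant = STACK_SETTINGS[stack_name]["database_variant"]
    # The user-facing variant only decides which disk the database is linked to.
    disk = "IOOptimizedDisk" if variant == "high-throughput" else "RegularDisk"
    return ["DatabaseComponent", disk]


alice_resources = resources_for("alice")
bob_resources = resources_for("bob")
```

The same program serves both stacks. Only the settings differ, so the product catalogue and the resource model can change independently.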
s
The components our users can order are e.g. a kubernetes namespace, airflow, a databricks workspace, and so on. These components are currently not parametrized. We planned to manage all of a user's resources in one stack per stage/env (multiple stacks still require explicit resolution of dependencies and result in circular dependencies between stacks in the example above) within one pulumi project per user.
> Why does what your users order have to be reflected on the level of resources?
I know it doesn't necessarily. But it will be tough selling our team a different solution since it's very convenient both for our developers (there's a clear grouping of all resources and - unless replacements are required - changes to platform-component A will only affect resources within "platformcomponentsA") and our ops team (user1 orders platform-component A -> ops team enables platform-component A for user1, checks the diff -> sees changes only to "platformcomponentA" -> approves the preview/plan).
f
Ah, thank you for the use case - I feel like we can get you there, albeit with a few PoC iterations 😄
👀 2
So that works, given your use case, but the glaring hack there is `PlatformComponentBNeedsAShell` - to trick out the circular dep. Python doesn't really care the types don't match, so that's a 😅 for python if you're into that. 😄
Also, I parameterized the ComponentResources so the deps are handled
That's one option though. If you're doin' a la carte customer features with dependencies like this with complex matrixes, I'm not sure I'd do exactly this. I'd have to have more info and think about it before committing 🙂 In particular, this would get messy and not scale well if the product matrix changes often (as well as handling updates to components so as not to affect existing customers)
m
@future-hairdresser-70637 Why/how does this work? Shouldn't `a` still depend on the `PlatformComponentBNeedsAShell` instance?
b = PlatformComponentBNeedsAShell("databricks_shell", "databricks_shell")
a = PlatformComponentANeedsB("k8s", PlatformComponentANeedsBArgs(B_d_out=b.d_out))
b = PlatformComponentBNeedsA("databricks", PlatformComponentBNeedsAArgs(A_b_out=a.b_out))
If you assign `b` to a new object, the initial `b` that you passed to `a` still points to the original object, no?
f
Coming from mostly C#, I was slightly surprised/horrified it worked as well, but it did 😄 it's all about that "shell" type and python not caring so much about types
nope, overwritten, and pulumi only cares about the end state to generate the DAG
m
Sorry, I submitted too early, please see my full question
f
np
so the "end state" sees the objects we want, `PlatformComponentANeedsB` and `PlatformComponentBNeedsA`
`PlatformComponentBNeedsAShell` is gone, overwritten
now, yes, this is a disgusting hack to me 😆
m
Yes, I think it's more of a question of why Python and/or Pulumi behaves this way. Because the `b` that is used in the `a=` line is pointing to a different object than the object `b` is assigned to eventually. But I think I need to do some experimenting.
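The Python half of that question can be checked without Pulumi. A minimal sketch with stand-in classes follows. `Shell` and `Consumer` are hypothetical and are not the classes from the snippet above, and the result says nothing about what Pulumi records for the first object.

```python
class Shell:
    """Stand-in for a component that exposes one output value."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.d_out = f"{name}.d_out"


class Consumer:
    """Stand-in for a component that stores the value it was given."""

    def __init__(self, upstream: str) -> None:
        self.upstream = upstream


b = Shell("databricks_shell")
original_b = b            # keep a second reference for inspection
a = Consumer(b.d_out)     # a captures a value from the first object
b = Shell("databricks")   # rebinds the name b; the first object is untouched
```

After the last line, `a.upstream` still holds the value taken from the first object. Rebinding `b` changes what the name refers to, not what `a` already captured.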
f
yeah for sure! if you have time, try that out on your machine and let me know - maybe my setup is weird, but I think it's legit
it's exploiting poor python
I don't know if we mentioned this issue in the thread here but Manage circular dependencies by modelling initial resource state is the latest proposal to address this scenario. No idea about timelines for this though.
👀 2
s
Thanks for your solution. While the requirements/dependencies are clearer in your approach, I find it puzzling - even more so than my initial approach - that it actually works. That aside, both these approaches hinge on us overwriting the internals of a `pulumi.Output` without losing the reference to the associated object. Do you think it makes sense for me to request a feature of this `to_output` being - somehow - implemented as a method of `pulumi.Output`? That doesn't sound like too much of a chore for the pulumi devs and would allow users - at least in python - to model these somewhat complex dependencies.
f
👍 I encourage anyone to file feature requests (and bugs)! At the very worst - which isn't bad - the request is seen and not acted upon by anyone. But someone will see it, think about it, and it'll influence them in some tiny way.