Hi we re looking into pulumi to manage our cloud platform bu Pulumi Community #python

sparse-alarm-54651

07/23/2024, 3:04 PM

Hi, we're looking into pulumi to manage our cloud platform, but have some obstacles to move out of the way before we can commit to it. I hope you can help put our mind at ease with one of these obstacles: We need to deploy a set of components with cross-component dependencies in various combinations. We're looking for a way of defining all associated resources in python and having pulumi take care of dependency management, however, since python can't access attributes out of order, we'd still need to structure our code around those. As (one) solution (/workaround), we came up with the following code

import pulumi
from pulumi_command import local
from pulumi_random import random_string


def to_output(ref: pulumi.Output, output: pulumi.Output) -> None:
    """Overwrite the reference's attributes with those of the actual resource output."""
    ref._is_known = getattr(output, "_is_known")
    ref._is_secret = getattr(output, "_is_secret")
    ref._future = getattr(output, "_future")
    ref._resources = getattr(output, "_resources")


class PlatformComponentA(pulumi.ComponentResource):
    a_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)
    b_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)

    def __init__(self):
        super().__init__("component:PlatformComponentA", "test")
        parent = pulumi.ResourceOptions(parent=self)

        a = local.Command("a", create="cat", stdin=PlatformComponentB.d_out, opts=parent)
        to_output(self.a_out, a.stdout)

        b = random_string.RandomString("b", length=8, opts=parent)
        to_output(self.b_out, b.result)

        self.register_outputs({"a": "b"})


class PlatformComponentB(pulumi.ComponentResource):
    c_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)
    d_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)

    def __init__(self):
        super().__init__("component:PlatformComponentB", "test")
        parent = pulumi.ResourceOptions(parent=self)

        c = local.Command("c", create="cat", stdin=PlatformComponentA.b_out, opts=parent)
        to_output(self.c_out, c.stdout)

        d = local.Command("d", create="date", opts=parent)
        to_output(self.d_out, d.stdout)

        self.register_outputs({})


a = PlatformComponentA()
b = PlatformComponentB()

a.a_out.apply(lambda a: print(f"a: {a}"))
a.b_out.apply(lambda b: print(f"b: {b}"))
b.c_out.apply(lambda c: print(f"c: {c}"))
b.d_out.apply(lambda d: print(f"d: {d}"))

which seems to work as expected, but has to touch `pulumi.Output` internals. Do you know of a better solution to offload the burden of dependency management from our python code to the pulumi engine? If not, do you think our solution is sound enough for production use or rather fragile? Edit: Updated the code to better reflect our issue.
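The "placeholder first, value later" idea in the snippet above can be illustrated without Pulumi. The following is a minimal sketch using only `asyncio`, assuming (as the `_future` attribute in `to_output` suggests) that an output is backed by an awaitable. The names `slots` and `produce` are hypothetical, and this is an analogy for the pattern, not Pulumi's implementation.

```python
import asyncio


async def main() -> dict:
    loop = asyncio.get_running_loop()
    # Placeholders exist up front, so declaration order no longer matters.
    slots = {name: loop.create_future() for name in ("a", "b", "c", "d")}

    async def produce(name: str, value: str, needs: str = "") -> None:
        # Wait for the dependency (if any), then fill this placeholder.
        prefix = await slots[needs] if needs else ""
        slots[name].set_result(prefix + value)

    # "a" needs "d" and "c" needs "b", mirroring the cross-component inputs.
    await asyncio.gather(
        produce("a", "<-a", needs="d"),
        produce("b", "B"),
        produce("c", "<-c", needs="b"),
        produce("d", "D"),
    )
    return {name: fut.result() for name, fut in slots.items()}


results = asyncio.run(main())
print(results)
```

Each producer is declared before its dependency has a value, yet every placeholder ends up filled, because consumers hold a reference to the placeholder rather than to the value.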

future-hairdresser-70637

07/23/2024, 3:31 PM

if those A, B, C classes should be part of a single Stack (e.g. shared environment) you will want to look at Component Resources and using them. You can then make them dependent on others e.g. with `dependsOn`. If they should be separate environments, check out Stack references

sparse-alarm-54651

07/23/2024, 3:57 PM

Thanks for your response. I should have gone into some more detail, so here's my attempt: The components (

) will be part of single stacks (in different combinations, depending on the user's need/request). They will be implemented as `ComponentResource`s. I hadn't thought about explicitly having a `dependsOn` at the `ComponentResource` level. I'm afraid we'd still need to properly sort our code, since we need to pass references to other `ComponentResource`s and python would throw an error if we tried to access something that wasn't declared. That's what I was hoping to circumvent by instantiating the placeholder `pulumi.Output`s and later filling in the "blanks".

The docs you linked throw up some more questions:

1. According to this issue, `ComponentResource.register_outputs()` doesn't have an effect besides signalling the engine that the component has finished. In the linked docs, the component hasn't finished registering since a policy is created later on, which is an indirect child of the `ComponentResource`.
2. According to the docs on the parent attribute, `CustomResource`s should only be nested below `ComponentResource`s, not below other `CustomResource`s, which the linked example does with the `component -> bucket -> policy` relation.

👍 1

modern-zebra-45309

07/23/2024, 4:13 PM

If C does not depend on anything, you can declare it first, and then pass it to B, and then to A:

c = C()
b = B(stdin=c.out)
a = A(stdin=b.out)

Python prevents you from creating circular dependencies here, and Pulumi will sort out the most efficient creation/update order and parallelization. Maybe I'm missing something here, but I'd say that by definition, you cannot have circular dependencies in your infrastructure. If you can only create A when you have B, and B can only be created when you already have A, this can't possibly work. In the case that A and B have to know about each other (e.g., need to know the URL of the other party) then you'll have to create A, create B, and update A. Or you know where B will end up and can pass its URL to A even though B has not been created yet:

url_of_b = "http://example.com"
a = A(other_url=url_of_b)
b = B(my_url=url_of_b, other_url=a.url)

sparse-alarm-54651

07/23/2024, 5:06 PM

Yes, we can restructure our components and order them appropriately to have some "linear" dependency graph. However, we were hoping to avoid that, since these restructured pulumi-components would be further away from the components our users order and configure. As-is, we have some components with dependencies in both directions, e.g.

ComponentResource A:
|- CustomResource A.a (depends on B.a)
|- CustomResource A.b

ComponentResource B:
|- CustomResource B.a
|- CustomResource B.b (depends on A.b)

According to the documentation linked above, we could avoid breaking up these components, if we delayed declaring the resources with unmet dependencies (here `A.a` if we first initialized

, then

in code) by moving their definition from the `ComponentResource.__init__` into some other method and calling that later on. This would still mean a lot of work to restructure our components - which I'd like to avoid - and I'm not even sure whether that'd be a good idea, since the docs state that

> The call to `registerOutputs` also tells Pulumi that the resource is done registering children and should be considered fully constructed
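The distinction drawn in this message (no cycle between individual resources, but a cycle once they are grouped into components) can be checked with a small sketch. It uses the standard library's `graphlib` and the dependency tree given above. The dictionary names are hypothetical, and nothing here touches Pulumi itself.

```python
from graphlib import CycleError, TopologicalSorter

# Each key depends on the resources in its value set (taken from the tree above).
resource_deps = {
    "A.a": {"B.a"},
    "A.b": set(),
    "B.a": set(),
    "B.b": {"A.b"},
}

# Resource level: a valid creation order exists, so there is no real cycle.
resource_order = list(TopologicalSorter(resource_deps).static_order())
print(resource_order)

# Component level: collapse every resource into its parent component.
component_deps = {}
for resource, deps in resource_deps.items():
    owner = resource.split(".")[0]
    others = {dep.split(".")[0] for dep in deps} - {owner}
    component_deps.setdefault(owner, set()).update(others)

# A depends on B and B depends on A, so no component order exists.
try:
    list(TopologicalSorter(component_deps).static_order())
    component_cycle = False
except CycleError:
    component_cycle = True
print(component_cycle)
```

This is why ordering whole components in code fails even though the underlying resources can be ordered without trouble.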

modern-zebra-45309

07/23/2024, 5:12 PM

I think what you're trying to do is not compatible with the Pulumi resource and component model. It looks like you don't have clear boundaries between your component resources but divide them based on some other, higher-level considerations? I believe you'll either have to merge A and B into a single component resource AB, or find a way to share the information that you need through a third component, e.g., a configuration that doesn't necessarily have to reflect any "real" infrastructure.

future-hairdresser-70637

07/23/2024, 6:33 PM

my current understanding is `registerOutputs` is used solely to update the CLI's progress bar unless you're working with multi-language components. it's not required but it's good for future-proofing, i suppose? yeah i'm curious what these circular dependencies are. it feels like something that can be architected around. have you looked at the blog post on circular dependencies in Pulumi? maybe something there will inspire💡

sparse-alarm-54651

07/23/2024, 7:28 PM

I stumbled over that blog article earlier, when looking for another solution. That covers two custom resources which each can't be deployed completely without the other one having been deployed (if I understood correctly), which isn't a problem we're facing. In our case, pulumi could always construct a clear dependency graph (without loops). I updated the original code snippet to better resemble our situation, copying it here for simplicity:

import pulumi
from pulumi_command import local
from pulumi_random import random_string


def to_output(ref: pulumi.Output, output: pulumi.Output) -> None:
    """Overwrite the reference's attributes with those of the actual resource output."""
    ref._is_known = getattr(output, "_is_known")
    ref._is_secret = getattr(output, "_is_secret")
    ref._future = getattr(output, "_future")
    ref._resources = getattr(output, "_resources")


class PlatformComponentA(pulumi.ComponentResource):
    a_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)
    b_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)

    def __init__(self):
        super().__init__("component:PlatformComponentA", "test")
        parent = pulumi.ResourceOptions(parent=self)

        a = local.Command("a", create="cat", stdin=PlatformComponentB.d_out, opts=parent)
        to_output(self.a_out, a.stdout)

        b = random_string.RandomString("b", length=8, opts=parent)
        to_output(self.b_out, b.result)

        self.register_outputs({"a": "b"})


class PlatformComponentB(pulumi.ComponentResource):
    c_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)
    d_out: pulumi.Output[str | None] = pulumi.Output.from_input(None)

    def __init__(self):
        super().__init__("component:PlatformComponentB", "test")
        parent = pulumi.ResourceOptions(parent=self)

        c = local.Command("c", create="cat", stdin=PlatformComponentA.b_out, opts=parent)
        to_output(self.c_out, c.stdout)

        d = local.Command("d", create="date", opts=parent)
        to_output(self.d_out, d.stdout)

        self.register_outputs({})


a = PlatformComponentA()
b = PlatformComponentB()

a.a_out.apply(lambda a: print(f"a: {a}"))
a.b_out.apply(lambda b: print(f"b: {b}"))
b.c_out.apply(lambda c: print(f"c: {c}"))
b.d_out.apply(lambda d: print(f"d: {d}"))

This works as "expected": We can scope the pulumi-components around our "actual" platform-components, Python is satisfied with all attributes being present upfront, and pulumi is able to track all dependencies and deploy the resources with correct inputs/outputs. It just feels somewhat hacky and might be brittle, but I'm too much of a novice with pulumi's internals to know for sure.

> I think what you're trying to do is not compatible with the Pulumi resource and component model.

Yeah, this could well be true. We had issues with manually tracking inter-component dependencies and restructuring components in the past, so one of our goals for an improved IaC toolchain was to let the "engine" keep track of these inter-component dependencies.

> I believe you'll either have to merge A and B into a single component resource AB, or find a way to share the information that you need through a third component, e.g., a configuration that doesn't necessarily have to reflect any "real" infrastructure.

For some of the inter-component dependencies that would be a solution, but not all of them. One instance that I stumbled over when test-porting some parts to pulumi was an Azure log-analytics workspace and associated diagnostic settings on various Azure resources, which are created conditionally based on a config. These diagnostic settings are sprinkled throughout other components, but require the log-analytics workspace-id as input.

modern-zebra-45309

07/23/2024, 8:04 PM

> For some of the inter-component dependencies that would be a solution, but not all of them.

But how do you create them manually? Usually (and I've yet to see an example where this does not apply) you can create any infrastructure through a sequence of CLI calls (or mouse clicks). This might be very painful and cumbersome, but there has to be a way to do it step by step. One common example of a circular dependency is linking Kubernetes RBAC service accounts with IAM roles on AWS, where the service account has to be annotated with the role ARN and the role's trust policy contains the name and namespace of the service account. You can either create this by "manually" assembling either the role ARN or the service account's name ahead of resource creation, or you make the service account first, then create the IAM role, and then patch the service account (which is an example of option 3 in the blog post).
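The first option mentioned here, assembling identifiers ahead of resource creation, can be sketched in plain Python. The account ID, role name, namespace and service account name below are hypothetical values, and the annotation key assumes the EKS "IAM roles for service accounts" setup, which the message does not name explicitly.

```python
def role_arn(account_id: str, role_name: str) -> str:
    """Build the ARN an IAM role will have once it exists."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


def service_account_subject(namespace: str, name: str) -> str:
    """Build the subject string that identifies a Kubernetes service account."""
    return f"system:serviceaccount:{namespace}:{name}"


# Hypothetical values, all known before either resource is created.
arn = role_arn("123456789012", "app-role")
subject = service_account_subject("default", "app-sa")

# The service account can be annotated with the ARN (assumed EKS annotation key),
# and the role's trust policy can reference the subject, in either order.
annotation = {"eks.amazonaws.com/role-arn": arn}
print(annotation, subject)
```

Because both strings are computed from names chosen up front, neither resource has to wait for an output of the other.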

sparse-alarm-54651

07/24/2024, 5:45 AM

On second look I couldn't find a platform-component where extracting dependents into separate ComponentResources wouldn't be possible, so you're right in that regard. It would lead to many AB-, ABC- or even ABCD-type ComponentResources, though. That'd be a bad solution for us, since our users can order components individually, so user1 might have platform-component A, user2 B and user3 A and B. If we now extracted parts of A and B into AB, we'd need to add checks within AB to know which subset of resources needs to be deployed there. Additionally, the operative burden of reviewing diffs would be increased by splitting up the components (and merging them partially), since an order of A would result in a diff for ComponentResource A and AB. Nonetheless, thanks for looking into my issue and coming up with valid solutions. If "stable" offloading of dependency tracking to the pulumi engine isn't possible, we'll need to go back to the drawing board, I'm afraid.

modern-zebra-45309

07/24/2024, 8:06 AM

> It would lead to many AB-, ABC- or even ABCD-type ComponentResources, though. That'd be a bad solution for us, since our users can order components individually, so user1 might have platform-component A, user2 B and user3 A and B. If we now extracted parts of A and B into AB, we'd need to add checks within AB to know which subset of resources needs to be deployed there.

I obviously have no insight into what you are doing and how your platform is architected, but just based on this description I think you might be missing a layer in-between: Why does what your users order have to be reflected on the level of resources? Just as some further food for thought. Let's say I'm selling an accounting backend in various different variants. It consists of a database, disks, VMs, an authentication service etc. From the perspective of the user, they'll book "High-throughput database" but that does not mean that I need to have a HighThroughputDatabaseComponent I deploy for them. I'd probably have a DatabaseComponent and then link this to an IOOptimizedDisk, whereas usually I'd link it to a RegularDisk. To be able to conveniently deploy different variants, I'd organize them into different Pulumi programs (or parametrize the programs) so that I deploy one stack per user: Alice gets her `alice` stack with the high-throughput database and Bob gets his `bob` stack with the regular version. Neither of them has to care how I model my resources.
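The layering described in this message, where what a user books is translated into a choice of components, might look roughly like the following. This is a sketch in plain Python; `DatabaseSpec`, `CATALOGUE`, `ORDERS` and `plan_for` are hypothetical names, and only the product and disk names come from the message.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DatabaseSpec:
    disk_kind: str  # which disk component the DatabaseComponent is linked to


# Hypothetical catalogue: what users book, mapped to how it is built.
CATALOGUE = {
    "regular-database": DatabaseSpec(disk_kind="RegularDisk"),
    "high-throughput-database": DatabaseSpec(disk_kind="IOOptimizedDisk"),
}

# One stack per user; the stack name selects the booked product.
ORDERS = {"alice": "high-throughput-database", "bob": "regular-database"}


def plan_for(stack: str) -> DatabaseSpec:
    """Translate a user's order into the parameters a program would deploy."""
    return CATALOGUE[ORDERS[stack]]


print(plan_for("alice"), plan_for("bob"))
```

The components stay generic, and the per-user differences live in data that a parametrized program reads for each stack.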

sparse-alarm-54651

07/24/2024, 10:01 AM

The components our users can order are e.g. a kubernetes namespace, airflow, a databricks workspace, and so on. These components are currently not parametrized. We planned to manage all of a user's resources in one stack per stage/env (multiple stacks still require explicit resolution of dependencies and result in circular dependencies between stacks in the example above) within one pulumi project per user.

> Why does what your users order have to be reflected on the level of resources?

I know it doesn't necessarily. But it will be tough selling our team a different solution since it's very convenient both for our developers (there's a clear grouping of all resources and - unless replacements are required - changes to platform-component A will only affect resources within "platformcomponentA") and our ops team (user1 orders platform-component A -> ops team enables platform-component A for user1, checks the diff -> sees changes only to "platformcomponentA" -> approves the preview/plan).

future-hairdresser-70637

07/24/2024, 12:36 PM

Ah, thank you for the use case - I feel like we can get you there, albeit with a few PoC iterations 😄

Untitled.py

👀 2

future-hairdresser-70637

07/24/2024, 12:38 PM

So that works, given your use case, but the glaring hack there is `PlatformComponentBNeedsAShell` - to trick out the circular dep. Python doesn't really care the types don't match, so that's a 😅 ➕ for python if you're into that. 😄

future-hairdresser-70637

07/24/2024, 12:39 PM

Also, I parameterized the ComponentResources so the deps are handled

future-hairdresser-70637

07/24/2024, 12:43 PM

That's one option though. If you're doin' a la carte customer features with dependencies like this with complex matrixes, I'm not sure I'd do exactly this. I'd have to have more info and think about it before committing 🙂 In particular, this would get messy and not scale well if the product matrix changes often (as well as handling updates to components so as not to affect existing customers)

modern-zebra-45309

07/24/2024, 12:48 PM

@future-hairdresser-70637 Why/how does this work? Shouldn't `a` still depend on the `PlatformComponentBNeedsAShell` instance?

b = PlatformComponentBNeedsAShell("databricks_shell", "databricks_shell")
a = PlatformComponentANeedsB("k8s", PlatformComponentANeedsBArgs(B_d_out=b.d_out))
b = PlatformComponentBNeedsA("databricks", PlatformComponentBNeedsAArgs(A_b_out=a.b_out))

If you assign

to a new object, the initial

that you passed to

still points to the original object, no?
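The Python side of this question can be checked in isolation. The sketch below uses plain stand-in classes (`Shell`, `NeedsB` and `NeedsA` are hypothetical and contain no Pulumi code) to show what rebinding a name does to references that were handed out earlier. It says nothing about how the attached program behaves inside Pulumi, since that file is not shown in the thread.

```python
class Shell:
    def __init__(self) -> None:
        self.d_out = object()  # stand-in for an output value


class NeedsB:
    def __init__(self, b_d_out: object) -> None:
        self.b_d_out = b_d_out  # keeps a reference to whatever was passed in
        self.b_out = object()


class NeedsA:
    def __init__(self, a_b_out: object) -> None:
        self.a_b_out = a_b_out
        self.d_out = object()


b = Shell()
shell_output = b.d_out
a = NeedsB(b.d_out)
b = NeedsA(a.b_out)  # rebinds the name `b`; the object held by `a` is untouched

print(a.b_d_out is shell_output)  # True
print(a.b_d_out is b.d_out)  # False
```

In plain Python, then, `a` keeps pointing at the value from the first object, and reassigning `b` does not redirect it to the new one.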

future-hairdresser-70637

07/24/2024, 12:49 PM

Coming from mostly C#, I was slightly surprised/horrified it worked as well, but it did 😄 it's all about that "shell" type and python not caring so much about types

future-hairdresser-70637

07/24/2024, 12:50 PM

nope, overwritten, and pulumi only cares about the end state to generate the DAG

modern-zebra-45309

07/24/2024, 12:50 PM

Sorry, I submitted too early, please see my full question

future-hairdresser-70637

07/24/2024, 12:50 PM

so the "end state" sees the objects we want, `PlatformComponentANeedsB` and `PlatformComponentBNeedsA`

future-hairdresser-70637

07/24/2024, 12:51 PM

`PlatformComponentBNeedsAShell` is gone, overwritten

future-hairdresser-70637

07/24/2024, 12:54 PM

now, yes, this is a disgusting hack to me 😆

modern-zebra-45309

07/24/2024, 12:54 PM

Yes, I think it's more of a question why Python and/or Pulumi behaves this way. Because the

that is used in the

a=

line is pointing to a different object than the object

is assigned to eventually. But I think I need to do some experimenting.

future-hairdresser-70637

07/24/2024, 12:55 PM

yeah for sure! if you have time, try that out on your machine and let me know - maybe my setup is weird, but I think it's legit

future-hairdresser-70637

07/24/2024, 12:55 PM

it's exploiting poor python

future-hairdresser-70637

07/24/2024, 3:25 PM

I don't know if we mentioned this issue in the thread here but Manage circular dependencies by modelling initial resource state is the latest proposal to address this scenario. No idea about timelines for this though.

👀 2

sparse-alarm-54651

08/01/2024, 11:04 AM

Thanks for your solution. While the requirements/dependencies are clearer in your approach, I find it puzzling - even more so than my initial approach - that it actually works. That aside, both these approaches hinge on us overwriting the internals of a `pulumi.Output` without losing the reference to the associated object. Do you think it makes sense for me to request a feature of this `to_output` being - somehow - implemented as a method of `pulumi.Output`? That doesn't sound like too much of a chore for the pulumi devs and would allow users - at least in python - to model these somewhat complex dependencies.

future-hairdresser-70637

08/01/2024, 12:29 PM

👍 I encourage anyone to file feature requests (and bugs)! At the very worst - which isn't bad - the request is seen and not acted upon by anyone. But someone will see it, think about it, and it'll influence them in some tiny way.
