# general
i
I’ve got a production problem with rolling deployments. Our version of `api` failed to deploy (never ready), but `web`, which `dependsOn: [api]`, continued, and in fact it seems the pulumi command did not error in any way.
Sample code:
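import { Config } from '@pulumi/pulumi'

// Api, Web, opts, commonArgs and baseArgs are defined elsewhere in the project.
const apiConfig = new Config('api')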
export const api = new Api(
  {
    ...commonArgs,
    wildcardCertificate,
    replicas: apiConfig.getNumber('replicas'),
  },
  opts({ dependsOn: [apiCredentials, apiSecret, namespace, wildcardCertificate] }),
)

const webConfig = new Config('web')
export const web = new Web(
  {
    ...baseArgs,
    wildcardCertificate,
    replicas: webConfig.getNumber('replicas'),
  },
  opts({ dependsOn: [api] }),
)
I was expecting the failure to stop the update and pulumi to exit non-zero
We essentially rolled out front-end code expecting the back end to be updated
What we got was an old version of `api` that continued to run because it was never successfully updated, which caused a bunch of errors due to the front end being out of sync
Is `dependsOn` just waiting on successfully updating the spec? If so, how can I wait on the actual success of the `api` rollout?
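The expectation described above can be sketched as a small model. This is only an illustration of the behavior being asked for, not Pulumi's engine: `Step`, `UpdateResult`, `runUpdate`, and the `apply` callback are all hypothetical names, and `apply` stands in for "apply the change and wait until it is healthy".

```typescript
// Hypothetical model of an update; none of these names come from Pulumi.
interface Step {
  name: string;
  dependsOn: string[];
  // Stand-in for "apply the change and wait until it is healthy".
  apply: () => boolean;
}

interface UpdateResult {
  succeeded: string[];
  failed: string[];
  skipped: string[];
  exitCode: number;
}

// Steps are assumed to be listed in dependency order already.
function runUpdate(steps: Step[]): UpdateResult {
  const result: UpdateResult = { succeeded: [], failed: [], skipped: [], exitCode: 0 };
  steps.forEach((step) => {
    // A step is blocked if any of its dependencies did not succeed.
    const blocked = step.dependsOn.some((dep) => result.succeeded.indexOf(dep) === -1);
    if (blocked) {
      result.skipped.push(step.name);
    } else if (step.apply()) {
      result.succeeded.push(step.name);
    } else {
      result.failed.push(step.name);
    }
  });
  result.exitCode = result.failed.length > 0 ? 1 : 0;
  return result;
}

const outcome = runUpdate([
  { name: 'api', dependsOn: [], apply: () => false }, // new api pods never become ready
  { name: 'web', dependsOn: ['api'], apply: () => true },
]);
```

In this model a failed `api` leaves `web` untouched and the exit code is non-zero, which is the outcome the original message was expecting.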
w
What are `Api` and `Web` in this example? Are they custom components? What do you use in the implementation of those? (Awsx.ecs.service?)
It used to be that depending on a component silently had no effect. I believe we changed this recently to have depending on a component implicitly depend on all its children. Cc @lemon-spoon-91807.
i
My understanding is that I did not need to propagate the `dependsOn` inside the component as long as the deployment/service/ingress were all parented
I just added Web to that gist
pulumi version
"@pulumi/pulumi": "^0.17.8"
w
"My understanding is that I did not need to propagate the `dependsOn` inside the component as long as the deployment/service/ingress were all parented"
This certainly was not true up until recently. We did some work to try to make it true, but I recall it ran into issues and had to be rolled back. I think we landed a modified version of this recently. @lemon-spoon-91807 can hopefully confirm.
l
Hi! So, to shed some light on how pulumi currently works (i.e. if you're using very recent versions of all the pulumi packages):
It used to be that you had to explicitly depend on any custom resources that were relevant to you (i.e. that you needed to be created, but you didn't take an explicit data dependency on).
so you'd have to list all those custom resources in your `dependsOn: ...` value
with 0.17.0 and upwards we've improved this somewhat.
specifically, if you have a component resource built out of several other components and several other custom resources, then you can now just depend on that top-level component resource, and it will transitively pick up (through parent/child relationships) all the leaf custom resources
this is useful for a couple of cases:
1. you have something like awsx.somemod.SomeComponent. These often wrap the underlying aws.somemod.SomeComponent. It was easy to accidentally `dependsOn` the awsx component and have that be meaningless. Now, that will do "the right thing" and cause you to actually depend on the underlying aws resource.
2. it's fairly common for a component resource to represent an aggregation of resources, all of which you want to be complete before you can move forward. These include security groups, roles, loadbalancers, and the like.
Now, you can just point at the component and it will properly wait for all of these.
3. you are resilient to components changing in the future. say a component added a new child in the future, and you needed to `dependsOn` that. you'd previously have had to know that we did that, and manually update your dependsOn clause. Now it happens automagically by you just stating your top-level component dependency
i hope that helps @important-leather-28796
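The transitive behavior described above can be sketched with a simplified model. This is not Pulumi's implementation; the `Resource` type, `expandDependencies`, and the resource names are hypothetical and only illustrate how depending on a top-level component could resolve to its leaf custom resources through parent/child relationships.

```typescript
// Simplified model of a resource tree; NOT Pulumi's real data structures.
type Resource =
  | { kind: 'custom'; name: string }
  | { kind: 'component'; name: string; children: Resource[] };

// Expand a dependsOn list into the leaf custom resources it implies.
function expandDependencies(dependsOn: Resource[]): string[] {
  const leaves: string[] = [];
  const visit = (r: Resource): void => {
    if (r.kind === 'custom') {
      // Record each leaf once, even if it is reachable twice.
      if (leaves.indexOf(r.name) === -1) leaves.push(r.name);
    } else {
      // Components carry no state of their own; recurse into children.
      r.children.forEach(visit);
    }
  };
  dependsOn.forEach(visit);
  return leaves;
}

// Hypothetical shape of a component like `Api`, with one nested component.
const apiComponent: Resource = {
  kind: 'component',
  name: 'api',
  children: [
    { kind: 'custom', name: 'api-deployment' },
    { kind: 'custom', name: 'api-service' },
    {
      kind: 'component',
      name: 'api-ingress',
      children: [{ kind: 'custom', name: 'api-ingress-rule' }],
    },
  ],
};

const webWaitsOn = expandDependencies([apiComponent]);
```

Note that in this model a child that is not parented to the component is never visited, which is why the parenting question matters later in the thread.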
i
Thanks @lemon-spoon-91807
"specifically, if you have a component resource built out of several other components and several other custom resources, then you can now just depend on that top-level component resource, and it will transitively pick up (through parent/child relationships) all the leaf custom resources"
This is what did not happen. Pulumi updated the `api` spec deployment, the replica set was created but did not successfully achieve ready/live, and the `update` moved on to the rest of the components that depend on `api`. The cli returned 0. The system was stuck with the old replicaset running, the new replicaset failing, but no indication to us through pulumi.
l
can you clarify what you mean by "`api` spec deployment"?
what sort of resource is this?
l
where is the 'replica set' you're referring to here?
thanks!
i
the replica set generated by k8s from the `api` `Deployment` rolling update
l
in this code sample, i would expect that someone depending on Api would dependsOn deployment/service/ingress, since they're all parented to Api
however it would also be necessary for the k8s lib to be using the updated pulumi/pulumi in case those types are Components as well
so that their underlying custom resources would be found and depended on.
also (since i don't know the k8s package) it's necessary in that package for those Components (if they are components) to properly parent their underlying children to themselves
i
so `web` `dependsOn` `api`, top comment
l
(like you have in your code)
i
are you talking about my code or pulumi-kubernetes?
l
pulumi-kubernetes
i
ok
l
note:
i
definitely did not work as expected, and didn’t fail which is very strange. We had no idea
l
"so `web` `dependsOn` `api`, top comment"
can you link me to the link you mean?
oh, i can totally expect you ran into a problem, and that it was 0% your fault
i
l
gotcha
what version of pulumi-kubernetes are you referencing?
i
"@pulumi/kubernetes": "^0.22.2",
"@pulumi/pulumi": "^0.17.8"
l
thanks, looking
so i don't know if that pulumi/k release would have the right pulumi/pulumi ref, but i would guess it would
is your repro self contained?
i.e. i can just pull down all this code and try this out?
if so, i can file bug on this and look into this on monday
if it isn't selfcontained, that def makes it a bit more annoying, but still doable 🙂
if you can give the full repro, i'd def appreciate it
i
not so much. I can get a repo that is closer
l
it would not be hard on my end to at least see if we're collecting the full set of custom resources
if not, that's a bug on our side
if so, and it's still not working, that may be a bug elsewhere
i
I’ll work on the repro and ping you
l
ok wait...
need clarity on something.
"the replica set generated by k8s from the `api` `Deployment` rolling update"
so... when i look at Deployment:
it's a CustomResource
and doesn't have any child resources of its own
if you change your code to directly depend on the Deployment is everything ok?
if so, this is def a bug with us (and more specifically me)
so yaay for hitting the right person if so
i
didn’t try, this was a production revelation
l
yeah... if changing the dependson to the Deployment doesn't help, then this is an aspect that goes back to the K guys (i.e. not me)
i have to run. let me know if/when you have a repro
and i can look this weekend
i
ok thanks
l
er, next week
i
it will be Monday
l
works for me
have a good weekend!
i
you too! thanks
@lemon-spoon-91807 - I have reproduced my situation, though I am unclear if this is a pulumi bug
l
Thanks!
i
If I use my `2.0` sample image, it exits 1 on start and pulumi behavior is as expected: api fails and web fails
If I rely on liveness/readiness instead, api and web both deploy
l
so question from last week: if you manually change the dependsOn to the specific underlying resource, does it then work?
i
I’ll try
l
ok. so this isn't really related to the 'dependsOn' work that i did 🙂
this sounds like the underlying resource is getting created as far as its underlying 'provider' is concerned
i
no, seems that dependsOn is wired up.
l
but it's not in a 'healthy' enough state.
i'm not a K expert
i
Either the k8s provider should not be expected to behave this way, or the k8s provider has a bug when relying on liveness/readiness to indicate it is ready to move along and update the next resource
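To make the gap between "created as far as the provider is concerned" and "healthy" concrete, here is a minimal sketch of a rollout-completeness check. It is loosely modeled on the conditions `kubectl rollout status` looks at and says nothing about what the Pulumi Kubernetes provider actually does. The `DeploymentSnapshot` interface and `isRolloutComplete` are hypothetical names; the comments map each field to the Kubernetes Deployment field it is assumed to come from.

```typescript
// Hypothetical snapshot of the Deployment fields relevant to a rollout.
interface DeploymentSnapshot {
  generation: number;         // metadata.generation
  desiredReplicas: number;    // spec.replicas
  observedGeneration: number; // status.observedGeneration
  updatedReplicas: number;    // status.updatedReplicas
  availableReplicas: number;  // status.availableReplicas
  totalReplicas: number;      // status.replicas
}

// True only when every desired pod runs the new spec and is available.
function isRolloutComplete(d: DeploymentSnapshot): boolean {
  const controllerSawSpec = d.observedGeneration >= d.generation;
  const allUpdated = d.updatedReplicas === d.desiredReplicas;
  const noOldPods = d.totalReplicas === d.updatedReplicas;
  const allAvailable = d.availableReplicas === d.desiredReplicas;
  return controllerSawSpec && allUpdated && noOldPods && allAvailable;
}

// The situation from the thread: spec accepted, new pod never becomes ready,
// old replica set keeps serving traffic.
const stuck: DeploymentSnapshot = {
  generation: 2,
  desiredReplicas: 3,
  observedGeneration: 2,
  updatedReplicas: 1,
  availableReplicas: 3,
  totalReplicas: 4,
};

// A finished rollout: all pods updated and available, no old pods left.
const healthy: DeploymentSnapshot = {
  generation: 2,
  desiredReplicas: 3,
  observedGeneration: 2,
  updatedReplicas: 3,
  availableReplicas: 3,
  totalReplicas: 3,
};

const stuckDone = isRolloutComplete(stuck);
const healthyDone = isRolloutComplete(healthy);
```

In the `stuck` case the spec update itself succeeded, so anything that only waits for the spec to be accepted would move on to `web`, which matches the symptom reported here.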
l
let's start a conversation with Alex.
i
I’m not sure it is a bug or not but I do have to solve it