Hi, using pulumi (and .net) and resource resolves ...
# general
d
Hi, using Pulumi (with .NET) and resolving resources has been quite a pain for us. We are now seeing issues in code that hasn't been changed in about 1½ years: all of a sudden some resources are not resolved properly, even within Output.All/Output.Tuple. We've even tried to provoke it by slowing the code down with random sleeps, console output etc., and that does indeed seem to work. So there's some sort of "race condition" in how resources are resolved. Would love some feedback on the issue.
Essentially, we are creating an EKS cluster with a custom class extending ComponentResource. Within the constructor, the policies and the cluster are created. A kubeconfig is also created there, and it's this kubeconfig, which pulls in cluster variables and some ARNs, that ends up with null values.
The reason we are revisiting the code is that we needed to upgrade the EKS cluster to a supported cluster version, and it seems (?) that some resources are not resolved properly now. The original developer has apparently been lucky that the resources were indeed resolved.
We do plan on rewriting this setup entirely to use the actual EKS cluster provider instead of this custom component, but for now we are in a jam and need some advice on what we can do here.
[Attached snippet: constructor + some global vars]
[Attached snippet: lookuproles function]
[Attached snippet: the function in which resources must be resolved to generate the kubeconfigs]
e
> that some resources are not resolved properly now.
Not totally sure what you mean by this. Are you saying that the Outputs from resources are coming back null?
d
yup
e
Is the Output<T> object itself null, or is the value inside it, as seen during Apply, null?
d
Sometimes it's null (NullReferenceException, and Pulumi halts), sometimes it goes through.
Let me get the last stack trace, one sec..
2022-06-09T07:18:01.5820701Z Diagnostics:
2022-06-09T07:18:01.5834051Z   pulumi:pulumi:Stack (DevSecOps.NgdpPlatform-ci):
2022-06-09T07:18:01.5835107Z     error: Running program '/work/src/ApplicationPlatform/bin/Debug/netcoreapp3.1/ApplicationPlatform.dll' failed with an unhandled exception:
2022-06-09T07:18:01.5835788Z     System.NullReferenceException: Object reference not set to an instance of an object.
2022-06-09T07:18:01.5836668Z        at void ApplicationPlatform.CaaS.Cluster.EksCluster.CreateKubeConfig(string name)+(ValueTuple<ClusterCertificateAuthority, string, string, GetRoleResult> c) => { } [5] in /work/src/ApplicationPlatform/CaaS/Cluster/EksCluster.cs:line 352
2022-06-09T07:18:01.5837530Z        at Output<U> Pulumi.Output<T>.Apply<U>(Func<T, U> func)+(T t) => { }
2022-06-09T07:18:01.5840301Z        at async Task<OutputData<U>> Pulumi.Output<T>.ApplyHelperAsync<U>(Task<OutputData<T>> dataTask, Func<T, Output<U>> func)
2022-06-09T07:18:01.5841455Z     error: Running program '/work/src/ApplicationPlatform/bin/Debug/netcoreapp3.1/ApplicationPlatform.dll' failed with an unhandled exception:
2022-06-09T07:18:01.5842151Z     System.NullReferenceException: Object reference not set to an instance of an object.
2022-06-09T07:18:01.5842976Z        at void ApplicationPlatform.CaaS.Cluster.EksCluster.CreateKubeConfig(string name)+(ValueTuple<ClusterCertificateAuthority, string, string> c) => { } [4] in /work/src/ApplicationPlatform/CaaS/Cluster/EksCluster.cs:line 294
2022-06-09T07:18:01.5843783Z        at Output<U> Pulumi.Output<T>.Apply<U>(Func<T, U> func)+(T t) => { }
2022-06-09T07:18:01.5844350Z        at async Task<OutputData<U>> Pulumi.Output<T>.ApplyHelperAsync<U>(Task<OutputData<T>> dataTask, Func<T, Output<U>> func)
2022-06-09T07:18:01.5844970Z        at async Task<OutputData<object>> Pulumi.Output<T>.Pulumi.IOutput.GetDataAsync()
2022-06-09T07:18:01.5845619Z        at async Task<object> Pulumi.Serialization.Serializer.SerializeAsync(string ctx, object prop, bool keepResources, bool keepOutputValues) x 2
2022-06-09T07:18:01.5846550Z        at async Task<RawSerializationResult> Pulumi.Deployment.SerializeFilteredPropertiesRawAsync(string label, IDictionary<string, object> args, Predicate<string> acceptKey, bool keepResources, bool keepOutputValues)
2022-06-09T07:18:01.5847615Z        at async Task<SerializationResult> Pulumi.Deployment.SerializeFilteredPropertiesAsync(string label, IDictionary<string, object> args, Predicate<string> acceptKey, bool keepResources, bool keepOutputValues)
2022-06-09T07:18:01.5848839Z        at async Task<PrepareResult> Pulumi.Deployment.PrepareResourceAsync(string label, Resource res, bool custom, bool remote, ResourceArgs args, ResourceOptions options)
2022-06-09T07:18:01.5850085Z        at async Task<(string urn, string id, Struct data, ImmutableDictionary<string, ImmutableHashSet<Resource>> dependencies)> Pulumi.Deployment.RegisterResourceAsync(Resource resource, bool remote, Func<string, Resource> newDependency, ResourceArgs args, ResourceOptions options)
2022-06-09T07:18:01.5851411Z        at async Task<(string urn, string id, Struct data, ImmutableDictionary<string, ImmutableHashSet<Resource>> dependencies)> Pulumi.Deployment.ReadOrRegisterResourceAsync(Resource resource, bool remote, Func<string, Resource> newDependency, ResourceArgs args, ResourceOptions options)
2022-06-09T07:18:01.5852655Z        at async Task Pulumi.Deployment.CompleteResourceAsync(Resource resource, bool remote, Func<string, Resource> newDependency, ResourceArgs args, ResourceOptions options, ImmutableDictionary<string, IOutputCompletionSource> completionSources)
2022-06-09T07:18:01.5853500Z        at async Task<T> Pulumi.Output<T>.GetValueAsync(T whenUnknown)
2022-06-09T07:18:01.5854100Z        at async Task<HashSet<string>> Pulumi.Deployment.GetAllTransitivelyReferencedResourceUrnsAsync(HashSet<Resource> resources)
2022-06-09T07:18:01.5854886Z        at async Task<PrepareResult> Pulumi.Deployment.PrepareResourceAsync(string label, Resource res, bool custom, bool remote, ResourceArgs args, ResourceOptions options)
2022-06-09T07:18:01.5855982Z        at async Task<(string urn, string id, Struct data, ImmutableDictionary<string, ImmutableHashSet<Resource>> dependencies)> Pulumi.Deployment.RegisterResourceAsync(Resource resource, bool remote, Func<string, Resource> newDependency, ResourceArgs args, ResourceOptions options)
2022-06-09T07:18:01.5857286Z        at async Task<(string urn, string id, Struct data, ImmutableDictionary<string, ImmutableHashSet<Resource>> dependencies)> Pulumi.Deployment.ReadOrRegisterResourceAsync(Resource resource, bool remote, Func<string, Resource> newDependency, ResourceArgs args, ResourceOptions options)
2022-06-09T07:18:01.5858554Z        at async Task Pulumi.Deployment.CompleteResourceAsync(Resource resource, bool remote, Func<string, Resource> newDependency, ResourceArgs args, ResourceOptions options, ImmutableDictionary<string, IOutputCompletionSource> completionSources)
----- SNIP ------
2022-06-09T07:18:01.5903204Z        at async Task<T> Pulumi.Output<T>.GetValueAsync(T whenUnknown)
2022-06-09T07:18:01.5903733Z        at async Task<string> Pulumi.Deployment+EngineLogger.TryGetResourceUrnAsync(Resource resource)
2022-06-09T07:18:01.5904150Z
e
Odd. Why are you wrapping single outputs in Output.All, though? That seems pointless.
d
Indeed, but it seemed to work for a bit.
Alas, the error is sporadic,
i.e. some sort of race condition... or we are doing something wrong here.
Funnily enough, while debugging I saw that if I put in a Console.WriteLine("whateverdebugtext") (because I was getting desperate), that also seemed to make the error happen less often,
i.e. again pointing to some parallel resolve not going through properly.
e
Well, Output.All for one output is odd, but it should still work.
d
hence why 🙂
e
I'd remove the Output.All, and if you see the error again, raise an issue with the stack trace at github.com/pulumi/pulumi
I'll have a pass through the SDK code and see if I can find anything suspect
d
Running it without the initial Output.All calls does nothing; it just brings us back to how the code was written before 🙂
So in essence they DID do something, but the problem is still there and they ultimately didn't solve it. I guess it worked briefly because some timing got shifted around a bit.
Could it somehow be related to the fact that the code is attempting to resolve outputs inside a ComponentResource?
e
> Could it somehow be related to the fact that the code is attempting to resolve outputs inside a ComponentResource?
I don't think that should matter.
d
Do we have some sort of Promise().resolve(component1, component2, ...) option to ensure resources are indeed resolved, i.e. to wait for a component to be populated?
Or would it make sense to just write an ugly try/catch retry/wait loop 😞 ...
e
So I think there are two things here: 1. what exactly you mean by "resolved", and 2. whether you mean custom or component resources.
1. To Pulumi, "resolved" just means that it has a value for that property. But, for example, some resources will return their ID and other outputs without actually being ready for use yet, so you might get Pulumi to create a resource, then look up its ID using a cloud SDK and not find it.
2. Custom resources should always resolve correctly (in the sense above), because setting their properties is part of Pulumi's core SDK and engine. Component resources will only resolve correctly if the programmer who wrote them wrote them correctly; their correctness is outside of Pulumi's control, and there isn't anything we can do to help with that.
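To make point 2 concrete, here is a minimal plain C# sketch of one way a component can be written incorrectly: a property is read before the constructor has assigned it. This is an analogy under stated assumptions, not Pulumi code and not a diagnosis of the code in this thread: Task<string> stands in for Output<string>, ContinueWith stands in for Apply, and clusterEndpoint and CreateKubeConfig are hypothetical names. It uses C# 9 top-level statements.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical stand-in for a component's Output<string> property:
// it stays null until the "constructor" below assigns it.
Task<string>? clusterEndpoint = null;

// Hypothetical stand-in for CreateKubeConfig: chains a callback onto the
// deferred endpoint, the way Apply chains onto an Output<T>.
Task<string> CreateKubeConfig() =>
    clusterEndpoint!.ContinueWith(t => $"server: {t.Result}");

// Wrong order: the kubeconfig is derived before the property is assigned,
// so chaining onto the null object throws right away.
bool wrongOrderThrew = false;
try
{
    _ = CreateKubeConfig();
}
catch (NullReferenceException)
{
    wrongOrderThrew = true;
}

// Right order: assign the property (here from a simulated child resource)
// first, then derive the kubeconfig from it.
clusterEndpoint = Task.FromResult("https://example.invalid");
string kubeConfig = await CreateKubeConfig();

Console.WriteLine($"wrong order threw: {wrongOrderThrew}; kubeconfig: {kubeConfig}");
```

In this analogy the wrong-order call fails immediately, before any callback is scheduled, which corresponds to the "Output<T> object itself is null" case rather than the "value inside it is null" case.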
d
So basically the resource is not resolving; that's the whole problem.
And yeah, we get that the original developer wasn't aware that the resources are actually not being resolved when accessed (or that there's some sort of race condition).
But it is using Pulumi's recommended way of writing resources, unless there's a specific issue with the way we are doing it.
So in the situation where a resource is not resolved, do you have a workaround for testing, or for making sure the resource is indeed resolved, before accessing its property?
Or is it simply considered a "bug", and we will have to find a way around it ourselves by hacking around it?
e
> Or is it simply considered a "bug", and we will have to find a way around it ourselves by hacking around it?
Yup, unfortunately. I mean, the issue is that an object property isn't set when you expect it to be set. I think you could write a loop that keeps trying to read the property until it comes back non-null, and then use that, but I don't think there's much value in trying to write a helper function in the Pulumi SDK to do that.
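A minimal sketch of the polling idea from the message above, in plain C# (C# 9 top-level statements). WaitForNonNullAsync and its parameters are hypothetical and not part of the Pulumi SDK; the read delegate stands in for reading a property that another part of the program assigns later. It waits with Task.Delay instead of blocking the thread, and gives up after a fixed number of attempts.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Hypothetical helper: calls `read` until it returns a non-null value,
// waiting `delay` between attempts and giving up after `maxAttempts`.
async Task<T> WaitForNonNullAsync<T>(Func<T?> read, int maxAttempts, TimeSpan delay)
    where T : class
{
    for (int attempt = 1; attempt <= maxAttempts; attempt++)
    {
        T? value = read();
        if (value is not null)
        {
            return value;
        }

        if (attempt < maxAttempts)
        {
            await Task.Delay(delay); // yields instead of blocking like Thread.Sleep
        }
    }

    throw new TimeoutException($"Value was still null after {maxAttempts} attempts.");
}

// Usage: simulate a property that only gets assigned on the third read.
int reads = 0;
string? endpointProperty = null;

string endpoint = await WaitForNonNullAsync<string>(() =>
{
    reads++;
    if (reads == 3)
    {
        endpointProperty = "https://example.invalid";
    }
    return endpointProperty;
}, maxAttempts: 5, delay: TimeSpan.FromMilliseconds(10));

Console.WriteLine($"resolved after {reads} reads: {endpoint}");
```

Like the suggestion it sketches, this is a workaround: it waits for the value to appear but does not address why it is missing in the first place.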
d
OK, I'll try some hack/workaround, and if that does indeed work, I'll create a ticket with the issue and the workaround/fix for someone to dig into.
e
👍 And if you do find any instances of properties being null on a Pulumi resource (like our AWS/Azure resources), do raise a bug with us for it. Properties should always be set to a real Output<T> object; you shouldn't ever be seeing null outputs.
d
Yup, a horrible try/catch retry loop seems to fix the issue 😕
            int retrymax=10;
            int retrycounter=0;
            bool annoyingresourcehack=true;
            while(annoyingresourcehack){
                try{
                    UserKubeConfig = Output.Tuple(certauth, endpoint, arn).Apply(c =>
                    {
                        var k8SConfiguration = new K8SConfiguration
                        {
                            Clusters =
                                new List<k8s.KubeConfigModels.Cluster>
                                {
                                    new k8s.KubeConfigModels.Cluster
                                    {
                                        Name = c.Item3,
                                        ClusterEndpoint = new ClusterEndpoint
                                        {
                                            Server = c.Item2, CertificateAuthorityData = c.Item1.Data
                                        }
                                    }
                                },
                            Users = new List<User>
                            {
                                new User
                                {
                                    Name = c.Item3,
                                    UserCredentials = new UserCredentials
                                    {
                                        ExternalExecution = new ExternalExecution
                                        {
                                            ApiVersion = "client.authentication.k8s.io/v1alpha1",
                                            Command = "aws",
                                            Arguments = new[]
                                            {
                                                "--region", Config.Region,
                                                "eks", "get-token", "--cluster-name", name
                                            }
                                        }
                                    }
                                }
                            },
                            Contexts = new List<Context>
                            {
                                new Context
                                {
                                    Name = c.Item3,
                                    ContextDetails = new ContextDetails
                                    {
                                        User = c.Item3,
                                        Cluster = c.Item3
                                    }
                                }
                            },
                            ApiVersion = "v1",
                            CurrentContext = c.Item3
                        };

                        var serializer = new Serializer();

                        return serializer.Serialize(k8SConfiguration);
                    });

                    annoyingresourcehack=false;
                }
                catch(NullReferenceException){
                    if(retrycounter==retrymax){
                        Console.WriteLine("retries exhausted while attempting to fetch UserKubeConfig in CreateKubeConfig(str)");
                        throw; // rethrow without resetting the original stack trace
                    }
                    }

                    Console.WriteLine("unable to resolve resource for UserKubeConfig in CreateKubeConfig(str), retrying..");
                    Thread.Sleep(5000);
                    retrycounter++;
                }
            }
It's ugly, and I felt dirty after I wrote this "fix", but evidently our pipelines can now run without failing 😞
e
That's really odd, because that try/catch won't catch anything thrown inside an Apply, which suggests it's Output.Tuple(certauth, endpoint, arn) itself that is throwing? But also, that is a hot loop: Apply functions don't run synchronously, so you're probably creating hundreds or thousands of Apply objects doing this!
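The point about try/catch can be illustrated with a plain C# analogy (C# 9 top-level statements), assuming Task<T> as a stand-in for Output<T> and ContinueWith as a stand-in for Apply; this is not how Pulumi implements Apply internally. Registering the callback returns immediately, so a try/catch around the registration never sees an exception thrown inside the callback. The exception only surfaces when something later awaits the result, which in the stack trace above is the Pulumi engine serializing resource inputs.

```csharp
#nullable enable
using System;
using System.Threading.Tasks;

// Stand-in for an Output<string?> that resolved, but to a null value.
Task<string?> arn = Task.FromResult<string?>(null);

bool caughtAtRegistration = false;
Task<int> arnLength = Task.FromResult(-1);
try
{
    // Like Apply: this only registers the callback and returns a new deferred value.
    // The NullReferenceException thrown inside the callback is stored in the returned task.
    arnLength = arn.ContinueWith(t => t.Result!.Length);
}
catch (NullReferenceException)
{
    caughtAtRegistration = true; // never reached
}

bool caughtWhenAwaited = false;
try
{
    await arnLength; // the stored exception is rethrown here
}
catch (NullReferenceException)
{
    caughtWhenAwaited = true;
}

Console.WriteLine($"caught at registration: {caughtAtRegistration}, caught when awaited: {caughtWhenAwaited}");
```

The first flag stays false and the second becomes true, which is why a try/catch that does catch something in the retry loop points at code that throws synchronously, such as the Output.Tuple call, rather than at the Apply callback.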