# getting-started
e
Hmm, still fighting my Helm template issue here šŸ§µ. Working for helm but not the .release API.
Confirmed with a basic values.yml that builds in about 3 mins running with helm install/upgrade, but running as part of Pulumi it freezes for about 5 minutes then times out.
I can see resources being created by pulumi but they get rolled back by the --atomic flag at the end of it.
There are subcharts within here that have been preinstalled with helm dependencies update.
pulumi logs and pulumi up -v=9 don't yield anything useful.
Could it be that I just need to add more time to the deployment timeout?
---
trying a run now with skipAwait and timeout 6000
const temporal = new k8s.helm.v3.Release("temp", {
    chart: "./temporal",
    version: "1.0.0",
    atomic: true,
    skipAwait: true,
    timeout: 6000,
    values: {
      server: {
        replicaCount:1
      },
      cassandra: {
        config: {
          cluster_size: 1,
        }
      },
      prometheus: {
        enabled: true
      },
      grafana: { enabled: true},
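      // note: this key looks misspelled; the chart expects "elasticsearch", so Elasticsearch is not actually disabled here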
      elasticearch: { enabled: false},
    }
});
g
Hi Zach! Good to see you on here! Regarding the log situation, I'm in the middle of fixing those docs. Short version is the verbose flag only gives you logs from the engine, not the provider. You'll need to add different flags to get output to bubble up from those. Overall, the exact flags you want depend on whether any providers you're using are on the Terraform bridge. But here's a short snippet that should get you all of the diagnostic data that's possible if you can run it locally:
TF_LOG=TRACE pulumi up -v=11 --logflow --logtostderr 2>&1 | tee -a pulumi_log.txt
The environment variable asks the classic provider to bubble up data. The --logflow flag bubbles up data from providers that aren't on the Terraform bridge.
It looks like you solved the main problem you were running into, though?
e
Thanks for the logging tip, I'll give that a shot and see if I can get more info. I'm able to launch operations via the helm.release API, however it seems to be getting stuck along the way. What takes about 3 mins on helm sits running on pulumi deploy until a SIGINT or timeout.
let me see if I get anything out of the increased verbosity here
My current theory is that it has to do with one of the helm hooks/dependencies/subcharts being used by the temporal chart.
g
ahh, got it. If you want to dump your output for me (and redact/send me a private gist if there's confidential stuff in there), I'll go digging if you don't find a good bit of info
e
thx
building the stack now...
hmm interesting, running with the increased verbosity looks to panic on some IAM user policies being generated:
I'll run this normally, then turn on the logging just for the helm bits
hmm, I had some older versions of the providers loaded, but upgrading them here didn't seem to help.
b
Set a reminder to come back to this tomorrow
šŸ™ 1
g
@echoing-smartphone-60420, Lee's been here longer and knows way more than me about our Helm stuff as I'm still learning it. We'll figure this out :)
e
Thanks. We're including this in a next-gen tech stack proposal for a client shipping in the next month or so, so we've got a little time. Lmk if you want to jump on a session to kick the tires.
b
I'm in transit tomorrow, should definitely be able to help on Monday
@echoing-smartphone-60420 following up here: when you're running the helm release from pulumi and it gets stuck, what's actually happening in the cluster?
e
I'm seeing some resources provision from kubectl but not seeing a release installed on the helm CLI
let me see where it's getting stuck
b
I can try a repro when I'm not flying. The local helm chart you're using is the same as the upstream one?
e
alright I got the infra up
I'll run the release resource here with the log level turned up
Do you want to perform this update? details
  pulumi:pulumi:Stack: (same)
    [urn=urn:pulumi:main::grand-infra::pulumi:pulumi:Stack::grand-infra-main]
    + kubernetes:helm.sh/v3:Release: (create)
        [urn=urn:pulumi:main::grand-infra::kubernetes:helm.sh/v3:Release::temp]
        [provider=urn:pulumi:main::grand-infra::pulumi:providers:kubernetes::default_3_18_2::04da6b54-80e4-46f7-96ec-b56ff0331ba9]
        atomic                  : true
        chart                   : "./temporal"
        cleanupOnFail           : false
        createNamespace         : false
        dependencyUpdate        : false
        devel                   : false
        disableCRDHooks         : false
        disableOpenapiValidation: false
        disableWebhooks         : false
        forceUpdate             : false
        lint                    : false
        name                    : "temp-b1f36b40"
        namespace               : "default"
        recreatePods            : false
        renderSubchartNotes     : false
        replace                 : false
        resetValues             : false
        resourceNames           : {
            ClusterRole.rbac.authorization.k8s.io/rbac.authorization.k8s.io/v1       : [
                [0]: "temp-b1f36b40-kube-state-metrics"
                [1]: "temp-b1f36b40-prometheus-alertmanager"
                [2]: "temp-b1f36b40-prometheus-pushgateway"
                [3]: "temp-b1f36b40-prometheus-server"
            ]
            ClusterRoleBinding.rbac.authorization.k8s.io/rbac.authorization.k8s.io/v1: [
                [0]: "temp-b1f36b40-kube-state-metrics"
                [1]: "temp-b1f36b40-prometheus-alertmanager"
                [2]: "temp-b1f36b40-prometheus-pushgateway"
                [3]: "temp-b1f36b40-prometheus-server"
            ]
            ConfigMap/v1                                                             : [
                [0]: "default/temp-b1f36b40-grafana"
                [1]: "default/temp-b1f36b40-grafana-dashboards-default"
                [2]: "default/temp-b1f36b40-prometheus-alertmanager"
                [3]: "default/temp-b1f36b40-prometheus-server"
                [4]: "temp-b1f36b40-temporal-dynamic-config"
                [5]: "temp-b1f36b40-temporal-frontend-config"
                [6]: "temp-b1f36b40-temporal-history-config"
                [7]: "temp-b1f36b40-temporal-matching-config"
                [8]: "temp-b1f36b40-temporal-web-config"
                [9]: "temp-b1f36b40-temporal-worker-config"
            ]
            Deployment.apps/apps/v1                                                  : [
                [0]: "default/temp-b1f36b40-grafana"
                [1]: "default/temp-b1f36b40-kube-state-metrics"
                [2]: "default/temp-b1f36b40-prometheus-alertmanager"
                [3]: "default/temp-b1f36b40-prometheus-pushgateway"
                [4]: "default/temp-b1f36b40-prometheus-server"
                [5]: "temp-b1f36b40-temporal-admintools"
                [6]: "temp-b1f36b40-temporal-frontend"
                [7]: "temp-b1f36b40-temporal-history"
                [8]: "temp-b1f36b40-temporal-matching"
                [9]: "temp-b1f36b40-temporal-web"
                [10]: "temp-b1f36b40-temporal-worker"
            ]
            Job.batch/batch/v1                                                       : [
                [0]: "temp-b1f36b40-temporal-es-index-setup"
                [1]: "temp-b1f36b40-temporal-schema-setup"
                [2]: "temp-b1f36b40-temporal-schema-update"
            ]
            PersistentVolumeClaim/v1                                                 : [
                [0]: "default/temp-b1f36b40-prometheus-alertmanager"
                [1]: "default/temp-b1f36b40-prometheus-server"
            ]
            Pod/v1                                                                   : [
                [0]: "temp-b1f36b40-wismc-test"
            ]
            PodDisruptionBudget.policy/policy/v1beta1                                : [
                [0]: "elasticsearch-master-pdb"
            ]
            Secret/v1                                                                : [
                [0]: "default/temp-b1f36b40-grafana"
                [1]: "temp-b1f36b40-temporal-default-store"
                [2]: "temp-b1f36b40-temporal-visibility-store"
            ]
            Service/v1                                                               : [
                [0]: "default/temp-b1f36b40-grafana"
                [1]: "default/temp-b1f36b40-kube-state-metrics"
                [2]: "default/temp-b1f36b40-prometheus-alertmanager"
                [3]: "default/temp-b1f36b40-prometheus-pushgateway"
                [4]: "default/temp-b1f36b40-prometheus-server"
                [5]: "elasticsearch-master"
                [6]: "elasticsearch-master-headless"
                [7]: "temp-b1f36b40-cassandra"
                [8]: "temp-b1f36b40-temporal-admintools"
                [9]: "temp-b1f36b40-temporal-frontend"
                [10]: "temp-b1f36b40-temporal-frontend-headless"
                [11]: "temp-b1f36b40-temporal-history-headless"
                [12]: "temp-b1f36b40-temporal-matching-headless"
                [13]: "temp-b1f36b40-temporal-web"
                [14]: "temp-b1f36b40-temporal-worker-headless"
            ]
            ServiceAccount/v1                                                        : [
                [0]: "default/temp-b1f36b40-grafana"
                [1]: "default/temp-b1f36b40-kube-state-metrics"
                [2]: "default/temp-b1f36b40-prometheus-alertmanager"
                [3]: "default/temp-b1f36b40-prometheus-pushgateway"
                [4]: "default/temp-b1f36b40-prometheus-server"
                [5]: "temporaladmin"
            ]
            StatefulSet.apps/apps/v1                                                 : [
                [0]: "elasticsearch-master"
                [1]: "temp-b1f36b40-cassandra"
            ]
        }
        reuseValues             : false
        skipAwait               : true
        skipCrds                : false
        timeout                 : 6000
        values                  : {
            cassandra   : {
                config: {
                    cluster_size: 1
                }
            }
            elasticearch: {
                enabled: false
            }
            grafana     : {
                enabled: true
            }
            prometheus  : {
                enabled: true
            }
            server      : {
                replicaCount: 1
            }
        }
        verify                  : false
        version                 : "1.0.0"
        waitForJobs             : false
        ~ pulumi:providers:kubernetes: (update)
            [id=b62bfb25-6546-424a-92b1-384e58dfcc80]
            [urn=urn:pulumi:main::grand-infra::eks:index:Cluster$pulumi:providers:kubernetes::grand-eks-eks-k8s]
          ~ version: "3.17.0" => "3.18.2"
        ~ pulumi:providers:kubernetes: (update)
            [id=b69eea56-b079-4318-8853-c6f2cbd7e146]
            [urn=urn:pulumi:main::grand-infra::eks:index:Cluster$pulumi:providers:kubernetes::grand-eks-provider]
          ~ version: "3.17.0" => "3.18.2"
ā˜ļø there's the
.Values
its a super vanila deploy
trying now with 3.18.2 to as well
b
alright I got the infra up
@echoing-smartphone-60420 does this mean you got it running? anything else you need help debugging with here?
e
No, I meant the cluster itself.
After upgrading to 3.18.2 I'm still running into an error when trying to run the .release
b
okay, trying to deploy it myself now
@echoing-smartphone-60420 one thing I noticed: you aren't passing a provider to the helm template at all, so where is the helm chart going?
also, did you pull the subcharts locally? I'm having a hard time getting those running
e
ah, that's a good point
let me take a look here
I also did pull the subcharts in
b
yeah I just ran helm dependency update and it pulled them
e
interesting, I didn't even see the k8s.provider, but I assume it's using the default or active context from kubeconfig
b
yeah, it uses the "ambient" provider
e
Though not sure how it knows what namespace to use
guess default
b
you can disable it on the stack to prevent it doing weird things
yeah it'll use default
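For reference, a minimal sketch of passing an explicit provider and namespace to the Release instead of relying on the ambient kubeconfig; the cluster, namespace, and variable names here are illustrative assumptions, not taken from this thread:
import * as eks from "@pulumi/eks";
import * as k8s from "@pulumi/kubernetes";

// Assumed: the stack already creates an EKS cluster; "grand-eks" is illustrative.
const cluster = new eks.Cluster("grand-eks");

// Build a provider from the cluster's kubeconfig instead of relying on the
// ambient context that `aws eks update-kubeconfig` sets up.
const k8sProvider = new k8s.Provider("grand-eks-provider", {
    kubeconfig: cluster.kubeconfig.apply(JSON.stringify),
});

const temporal = new k8s.helm.v3.Release("temp", {
    chart: "./temporal",
    namespace: "temporal",     // hypothetical namespace; the Release defaults to "default" if omitted
    createNamespace: true,
    // ...values as in the snippet above...
}, { provider: k8sProvider }); // explicit provider instead of the default/ambient one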
e
So to answer your question:
@Zach Gates one thing I noticed:
you aren't passing a provider to the helm template at all, so where is the helm chart going?
I'm running aws eks update-kubeconfig --name <name of cluster> after completing the cluster creation earlier in the script
b
got it, so it's going to the right place
e
probably not relevant but I'm using aws-vault to wrap the pulumi call with AWS environment variables
b
@echoing-smartphone-60420 this provisioned correctly for me: https://gist.github.com/jaxxstorm/8880143ed115e95b751694397b1cdbbe Note, I added a dependsOn for the helm chart to the node group; what I found was that the pods were stuck pending:
NAME                                                     READY   STATUS    RESTARTS   AGE
elasticsearch-master-0                                   0/1     Pending   0          38m
elasticsearch-master-1                                   0/1     Pending   0          38m
elasticsearch-master-2                                   0/1     Pending   0          38m
temp-dd4cb578-cassandra-0                                0/1     Pending   0          38m
temp-dd4cb578-grafana-dd58f468d-d2pg5                    0/1     Pending   0          38m
temp-dd4cb578-kube-state-metrics-548559c9cd-c6m9s        0/1     Pending   0          38m
temp-dd4cb578-prometheus-alertmanager-6b4b6cb5b7-m5vv8   0/2     Pending   0          38m
temp-dd4cb578-prometheus-pushgateway-b7d4f96d6-tvm7z     0/1     Pending   0          38m
temp-dd4cb578-prometheus-server-5d9674bd4c-qs4vg         0/2     Pending   0          38m
temp-dd4cb578-temporal-admintools-6b8bf64cb5-7zhbs       0/1     Pending   0          38m
temp-dd4cb578-temporal-frontend-5db9b4d79-6ld72          0/1     Pending   0          38m
temp-dd4cb578-temporal-history-666dc56744-jgqnn          0/1     Pending   0          38m
temp-dd4cb578-temporal-matching-5d468745c6-s8t9l         0/1     Pending   0          38m
temp-dd4cb578-temporal-web-7b77d477c8-gllhf              0/1     Pending   0          38m
temp-dd4cb578-temporal-worker-758b668454-tl6pp           0/1     Pending   0          38m
the node groups take a little while to provision because of the fargate/spot bid, and then the helm release times out
adding a dependsOn waits for the workload to provision, and then starts the helm release
šŸŽ‰ 1
I ran it twice and it worked both times, but it takes between 20m and 30m
my last attempt failed because the spot bid wasn't successful šŸ˜„
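The dependsOn shape described above looks roughly like this; nodeGroup and k8sProvider stand in for whatever node-group and provider resources the stack already defines (illustrative, not copied from the gist):
// Rough sketch, assuming `nodeGroup` and `k8sProvider` are defined elsewhere in the stack.
const temporal = new k8s.helm.v3.Release("temp", {
    chart: "./temporal",
    atomic: true,
    timeout: 6000,
    // ...values as above...
}, {
    provider: k8sProvider,
    // Wait for worker nodes to exist before installing the chart, so the
    // release doesn't sit with every pod Pending until the timeout hits.
    dependsOn: [nodeGroup],
});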
e
ah yes
sweet, lemme give it a shot here
appreciate your time/effort