# general
i
Any strategies for running/cleaning up a k8s Job as a `dependsOn` for the remainder of a deployment? I’m expecting one Job to be run per deployment, and I know I’ve run into unique Job name problems, as well as cleanup problems, in past attempts. Another concern is that the job (a DB migration) may end up taking some time in the future - thinking about bigger column or table migrations and timeouts. Any thoughts/strategies that are working well?
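(For context, a minimal sketch of the pattern being asked about, assuming a pulumi-kubernetes TypeScript program; the image name and migration command are placeholders, not anything from this thread.)

```typescript
import * as k8s from "@pulumi/kubernetes";

// Run the migration as a Job, then hang the Deployment off it with dependsOn.
// Note: as discussed below, Pulumi has no await logic for Jobs, so dependsOn
// only orders resource creation; it does not wait for the migration to finish.
const migrate = new k8s.batch.v1.Job("db-migrate", {
    spec: {
        backoffLimit: 0,
        template: {
            spec: {
                restartPolicy: "Never",
                containers: [{
                    name: "migrate",
                    image: "registry.example.com/rails-api:latest", // placeholder image
                    command: ["bundle", "exec", "rake", "db:migrate"],
                }],
            },
        },
    },
});

const api = new k8s.apps.v1.Deployment("rails-api", {
    spec: {
        replicas: 3,
        selector: { matchLabels: { app: "rails-api" } },
        template: {
            metadata: { labels: { app: "rails-api" } },
            spec: {
                containers: [{
                    name: "api",
                    image: "registry.example.com/rails-api:latest", // placeholder image
                }],
            },
        },
    },
}, { dependsOn: [migrate] });
```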
g
We’ve discussed the Job resource a bit before, but haven’t settled on semantics that would be useful for Pulumi users. The main problem is that Jobs can encompass all sorts of behavior, and we don’t have any way of knowing what the expected behavior is. Currently, we don’t have any await logic for Jobs; Pulumi creates the resource and doesn’t take further action. That being said, we’re very interested in hearing about your use cases and figuring out how we can support those workflows in Pulumi. See also: https://github.com/pulumi/pulumi-kubernetes/issues/449
/cc @creamy-potato-29402 @breezy-hamburger-69619
i
This is currently just a Rails API DB migration. As-is, these will mostly run quickly. We have multiple replicas, so we don’t want to just add this to the start script like the Pulumi Rails example does. Running it as a Job, we can guarantee parallelism/completion.
If I can’t await a migration, that’s a big production problem.
…well
We do check that the DB is migrated in our readiness check, so I suppose that is one way we’re making sure the old deployment is not taken down (I think). I’m sure about the readiness part, at least.
g
Yeah, DB migrations are the most common use case we’ve seen for Jobs. Pulumi will properly wait on readiness checks in the Pods, so that might be good enough for now.
i
I’m guessing I’ll still have to add random Job names and deal with cleanup separately as well.
b
Your use case is a common one. The reality is that a k8s Job can mean a lot of different things to different people, so capturing, let alone generalizing, a Job’s “completeness” is subjective. Specifically, Jobs only run once to completion and never again. To force a Job to always be created, you have to work around this by forcing spec changes, e.g. inserting a random string or a commit hash into the labels. And as you’ve pointed out, Job Pods don’t get cleaned up by design [1]. The TTL feature you pointed out is new as of 1.12 and still in an alpha state.
1 - https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/#job-termination-and-cleanup
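(A rough sketch of that workaround in a Pulumi TypeScript program, under a few assumptions: the GIT_SHA environment variable, label key, and image tag are made up for illustration, and ttlSecondsAfterFinished only takes effect on 1.12+ clusters with the alpha TTLAfterFinished feature gate enabled.)

```typescript
import * as k8s from "@pulumi/kubernetes";

// Any per-deployment value (commit hash, random string) baked into the spec
// forces Kubernetes to treat this as a new Job on every update.
const gitSha = process.env.GIT_SHA || "dev";

const migrate = new k8s.batch.v1.Job(`db-migrate-${gitSha}`, {
    metadata: { labels: { "app.kubernetes.io/version": gitSha } },
    spec: {
        // Alpha as of 1.12 (TTLAfterFinished feature gate): garbage-collect the
        // Job and its Pods shortly after it finishes, covering the cleanup side.
        ttlSecondsAfterFinished: 300,
        backoffLimit: 0,
        template: {
            metadata: { labels: { "app.kubernetes.io/version": gitSha } },
            spec: {
                restartPolicy: "Never",
                containers: [{
                    name: "migrate",
                    image: `registry.example.com/rails-api:${gitSha}`, // placeholder image
                    command: ["bundle", "exec", "rake", "db:migrate"],
                }],
            },
        },
    },
});
```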
In your case, the deployment’s readiness and liveness checks are where you should bake in any requirements for the deployment, e.g. ensuring the migration occurred successfully.
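(A minimal sketch of that, assuming a Rails health endpoint that returns non-200 until migrations have been applied; the endpoint paths, port, and image are assumptions.)

```typescript
import * as k8s from "@pulumi/kubernetes";

const api = new k8s.apps.v1.Deployment("rails-api", {
    spec: {
        replicas: 3,
        selector: { matchLabels: { app: "rails-api" } },
        template: {
            metadata: { labels: { app: "rails-api" } },
            spec: {
                containers: [{
                    name: "api",
                    image: "registry.example.com/rails-api:latest", // placeholder image
                    ports: [{ containerPort: 3000 }],
                    // New Pods stay unready until migrations have been applied,
                    // so Pulumi's readiness-aware await keeps the old ReplicaSet
                    // serving traffic in the meantime.
                    readinessProbe: {
                        httpGet: { path: "/healthz/migrations", port: 3000 }, // hypothetical endpoint
                        periodSeconds: 5,
                        failureThreshold: 12,
                    },
                    livenessProbe: {
                        httpGet: { path: "/healthz", port: 3000 }, // hypothetical endpoint
                        periodSeconds: 10,
                    },
                }],
            },
        },
    },
});
```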
To dictate whether a force-created Job actually runs at update time, we’ve gated this for a client with an `initContainer` that examines the pod’s annotation to determine if the Job should actually run after it’s been created. This gives us a bit of a lever: 1) the Job is guaranteed to get created in k8s at `pulumi update` time, and 2) we have a means, via the annotation, to allow or block the Job from actually running.
👍 1
Something similar could apply in your case
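(A rough sketch of what such a gate could look like, not the actual client setup: the annotation key, gate image, and polling behavior are assumptions. The downward API volume is refreshed when the Pod’s annotations change, so flipping the annotation on the running Pod releases the gate.)

```typescript
import * as k8s from "@pulumi/kubernetes";

const migrate = new k8s.batch.v1.Job("db-migrate", {
    spec: {
        template: {
            metadata: {
                // The lever: set this to "true" on the running Pod
                // (kubectl annotate pod ... --overwrite) to let the migration proceed.
                annotations: { "example.com/run-migration": "false" },
            },
            spec: {
                restartPolicy: "Never",
                volumes: [{
                    name: "podinfo",
                    downwardAPI: {
                        items: [{ path: "annotations", fieldRef: { fieldPath: "metadata.annotations" } }],
                    },
                }],
                initContainers: [{
                    name: "migration-gate",
                    image: "busybox:1.31", // placeholder image
                    volumeMounts: [{ name: "podinfo", mountPath: "/etc/podinfo" }],
                    // Block until the annotation is flipped to "true"; the main
                    // container (the migration itself) only starts after this exits.
                    command: ["sh", "-c",
                        `until grep -q 'run-migration="true"' /etc/podinfo/annotations; do sleep 5; done`],
                }],
                containers: [{
                    name: "migrate",
                    image: "registry.example.com/rails-api:latest", // placeholder image
                    command: ["bundle", "exec", "rake", "db:migrate"],
                }],
            },
        },
    },
});
```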