# general
g
@lemon-spoon-91807 I noticed that you recently made a change to how docker images are tagged in pulumi-docker (https://github.com/pulumi/pulumi-docker/pull/31). Basically it seems they’re now always tagged with the image ID, which in my understanding changes whenever any layer changes. The problem I’m facing now is that this kind of makes using `cacheFrom` almost impossible. In order to effectively use `cacheFrom` I need some stable tag (i.e. `latest`) or some other identifier that can be derived from logic in the code before the build runs. Prior to your change I was adding `latest` as a tag to all images, which in conjunction with `cacheFrom: true` enabled quite good caching. Any thoughts on how I can achieve good caching now?
l
Hey!
I'm actually looking at this right now
but i'm trying to figure out the right thing to do here.
i could definitely use some info from you
the part i'm trying to understand is this:
if your layer changes... why would you not want the ID to change?
g
well I’m ok if the ID changes
but at the same time I need some stable tag to use as the `--cache-from` source
Before using pulumi we did the following in our docker build bash scripts:
Whenever we built an image, we tagged it with 2 tags: `latest` and `<git_sha>`, and then we pushed both tags. In our k8s deployment we would reference the docker image with the `<git_sha>` tag, but during docker build we would pull and `--cache-from=<image_name>:latest`.
So I think if it’s possible to push two tags, where one of them is the image ID and the other one is some stable identifier like `latest`, that somewhat solves my problem.
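A rough sketch of that pre-pulumi workflow, for reference. The image name is a made-up placeholder, and the script prints the docker commands instead of executing them, so it can be read (and dry-run) without a docker daemon:

```shell
#!/bin/sh
# Sketch of the two-tag workflow described above: build with
# --cache-from latest, tag with both 'latest' and the git sha, push both.
# IMAGE is a hypothetical placeholder; commands are echoed, not run.
IMAGE="registry.example.com/myapp"
GIT_SHA="$(git rev-parse --short=12 HEAD 2>/dev/null || echo 0123abcd)"

BUILD_CMD="docker build --cache-from $IMAGE:latest -t $IMAGE:latest -t $IMAGE:$GIT_SHA ."

echo "docker pull $IMAGE:latest || true"   # warm the local layer cache
echo "$BUILD_CMD"                          # reuse cached layers where possible
echo "docker push $IMAGE:latest"           # stable tag: the cache source
echo "docker push $IMAGE:$GIT_SHA"         # immutable tag: referenced by k8s
```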
l
ok. let me take a look
i admit the cache-from code confuses me greatly
but i thought that's what it was trying to do
specifically, if you did something like:
cacheFrom: {stages: ["some_id"]}
g
nah unfortunately not
l
could you clarify? 🙂
g
`stages` are named stages in a multi-stage Dockerfile. A multi-stage Dockerfile contains multiple `FROM` clauses, i.e. check those docs: https://docs.docker.com/develop/develop-images/multistage-build/#use-multi-stage-builds . If I specified `latest` as a stage name, pulumi’s docker code would attempt to build the Dockerfile with `docker build . --target latest`, which would fail unless the Dockerfile contains a stage named `latest`. Or to cut a long story short: stages have nothing to do with tags.
I basically want multiple tags, but not multiple stages 🙂
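To illustrate the distinction for anyone reading along (the file name and its contents are made up): stage names are values for `docker build --target`, and are unrelated to what the resulting image gets tagged as.

```shell
#!/bin/sh
# A hypothetical two-stage Dockerfile: "build" and "runtime" are stage
# names usable with --target; they have nothing to do with image tags.
cat > Dockerfile.example <<'EOF'
# stage named "build"
FROM node:10 AS build
COPY . /src
RUN cd /src && npm ci && npm run build

# stage named "runtime"
FROM nginx:alpine AS runtime
COPY --from=build /src/dist /usr/share/nginx/html
EOF

# --target must name a stage that exists in the Dockerfile:
echo "docker build . -f Dockerfile.example --target build"
echo "docker build . -f Dockerfile.example --target runtime"
# 'docker build . -f Dockerfile.example --target latest' would fail:
# there is no stage named "latest".
```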
l
oh. are you making a feature request effectively?
(sorry, trying to distinguish that from this being a report about a bug i may have introduced :))
g
aehm
Well, before your change I was able to always tag the images pulumi built with the stable tag `latest`.
After your change it will always tag things with something like `<image_id>` or `latest-<image_id>`. So basically the tag becomes unstable, which makes it unsuitable for caching.
l
i see
g
And every time I make a code change now it will build the entire image completely from scratch.
l
i think
i need to talk to someone
g
which takes a LOOOT of time 😛
l
> And everytime I make a code change now it will build the entire image completely from scratch.
g
yeah docker is confusing
FYI, always tagging every build and push with `latest` and using the `latest` tag as `--cache-from` is a very simple strategy that speeds up builds in many cases, but a slightly more sophisticated and better strategy would probably be this: 1. tag and push each image with the git sha of HEAD; 2. before the build, pulumi should attempt to pull `<image_name>:<git_sha_HEAD>`. If it could successfully pull this, use it as `--cache-from`. If not, attempt to pull `<image_name>:<git_sha_HEAD~1>` and use this as `--cache-from`. Repeat this process until an image could be pulled successfully (maybe stop after 5 iterations, or at HEAD~5).
This strategy should make it possible to reuse a lot of cached layers for most builds.
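The ancestor-walk part of that strategy could look roughly like this. Everything here is a sketch: `pull_image` would wrap `docker pull` in real use, but it (and the git history) is stubbed out so the control flow can be followed without a daemon:

```shell
#!/bin/sh
# Sketch: try the image tagged with HEAD's sha, then HEAD~1, ... up to
# HEAD~5, and use the first tag that pulls as the --cache-from source.
IMAGE="registry.example.com/myapp"        # hypothetical image name

pull_image() {                            # stub; real use: docker pull "$1"
  case "$1" in *:sha2222) return 0 ;; *) return 1 ;; esac
}

find_cache_tag() {
  i=0
  while [ "$i" -le 5 ]; do
    # In real use: sha="$(git rev-parse "HEAD~$i")"
    sha="$(eval echo "\$SHA_$i")"         # stubbed history for the demo
    [ -n "$sha" ] || return 1
    if pull_image "$IMAGE:$sha"; then
      echo "$IMAGE:$sha"                  # first ancestor that was pushed
      return 0
    fi
    i=$((i + 1))
  done
  return 1                                # no cache source found
}

# Fake history: HEAD and HEAD~1 were never pushed, HEAD~2 was.
SHA_0=sha0000; SHA_1=sha1111; SHA_2=sha2222; SHA_3=sha3333

if CACHE_FROM="$(find_cache_tag)"; then
  echo "docker build --cache-from $CACHE_FROM ."
fi
```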
l
could you potentially open an issue with that suggestion?
g
ok sure
fyi in https://github.com/pulumi/pulumi-docker/issues/32 @white-balloon-205 discusses the same thing on an abstract level 🙂: intelligently tagging and using `--cache-from` to speed up builds.
l
question that isn't quite clear to me
wouldn't this part be incorrect:
> If not, attempt to pull <image_name>:<git_sha_HEAD~1>
wouldn't that discount any changes you made yourself?
This probably needs some additional thought for determining what tag to push if the git working directory was dirty at the time of the build.
one thing we're considering is not to use a git hash, but our own hash of the file system contents. we already have that concept for other parts of our system (for example, it's how we know what to update when node_modules changes)
however, for docker, we feel like it would need to be opt-in. because, after all, any docker build could produce a different output, even if the contents on disk stayed the same.
g
> wouldn’t that discount any changes you made yourself?
Nope. If I use `<git_sha_HEAD~1>` as the cache source, docker will attempt to reuse as many layers as possible from the cached image, but it will still detect local source code changes or local Dockerfile changes (anything different from the cached image) and build the corresponding layers from scratch.
l
ok. so just so i understand as well, what is the reason for not using cache-from stages, like: `cacheFrom: {stages: ["some_id"]}`?
g
Well, first of all, in order to use `cacheFrom: {stages: ...}` I would have to add multiple stages to each Dockerfile, which is some additional work (and shouldn’t be required to get good caching). Second, I think that after your change, even when using `cacheFrom: {stages: ...}`, each stage will be tagged with something like `<stageName>-<image_id>`. So none of the stages has a predictable tag to pull from either.
In general, stages are a feature that imho shouldn’t need to be involved to get good remote caching. It’s an unrelated docker feature that was created for a different purpose, not to improve caching.
l
ok. i think i have a lot to learn about this
i'm really hesitant about pulumi having any knowledge of things like git hashes
g
If anything, multi-stage docker builds are even harder to optimize for good remote caching.
The problem is: if you don’t use git hashes, but instead your own file hashing, how will you determine the hash of an ancestor commit or ancestor build?
l
I honestly don't know 🙂 i don't have any good answers as i don't really understand this space well enough.
g
Ok, now that I think about it: if you store a pulumi-specific hash in the checkpoint, you could read it from there. The disadvantage would be that you probably can’t read the hash across stacks. This means that each stack’s docker images would be cached separately, which is suboptimal when you have a large number of stacks of the same project, or when attempting to quickly create ephemeral stacks like we do in some cases.