Optimizing your build
Ordering steps
The initial cookbooks in this series cached build steps automatically and performed predictably: make a change to the Dockerfile and the build picks up from the point of the change.
In such cases, when steps can be executed in any order, it is a good idea to place the steps that take longest or are least likely to change before quick steps that may change more frequently.
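As a rough illustration of that ordering rule, the fragment below puts a slow, rarely changed package install ahead of quick settings that are edited more often. The base image, package, and environment variable are assumptions chosen for the sketch, not part of the cookbooks themselves.

```dockerfile
# Hypothetical base image and packages, for illustration only.
FROM ruby:slim

# Slow and rarely changed: install system packages first.
RUN apt-get update && \
    apt-get install -y build-essential

# Quick and changed more often: settings come last, so editing them
# does not invalidate the cached package install above.
ENV RAILS_ENV=production
WORKDIR /demo
```

Editing the ENV or WORKDIR lines here should leave the package layer cached, whereas the reverse order would rerun the install after every settings change.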
This works well until you reach steps that must run in a particular order.
Now that you are copying sources from outside of the Dockerfile, a change to any source will trigger a rebuild from the point of the COPY statement on, including time-consuming steps like bundle install and yarn install.
The relevant portion of the Dockerfile looks something like the following:
COPY . .
RUN bundle install
Splitting that into two copy statements can make a dramatic improvement in build times:
COPY Gemfile* ./
RUN bundle install
COPY . .
As the first statement will only copy Gemfile and Gemfile.lock, the time-consuming bundle install step will only be run if these two specific files have changed.
This can result in a dramatic reduction in build times for cases where the Gemfile did not change.
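The same split can be sketched for yarn install. This fragment assumes the project has both a package.json and a yarn.lock in its root (COPY fails if a named file is missing), and that a base image with yarn available is already in place.

```dockerfile
# Copy only the files that determine which packages get installed.
# Assumes yarn.lock exists next to package.json.
COPY package.json yarn.lock ./
RUN yarn install

# Copy the rest of the application afterwards; changes to other
# sources no longer invalidate the cached yarn install layer.
COPY . .
```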
Multi-stage builds
Now consider bundle install and yarn install. Both can be time consuming. They can be run in either order. If run sequentially, you will be faced with a choice: should a change to the Gemfile result in an unnecessary reinstall of node modules, or should a change to package.json result in an unnecessary reinstall of gems?
We’ve seen how multi-stage builds can reduce image size. They also can be used to reduce build times. An example to illustrate:
FROM ruby:slim as base
# apt-get update is needed first: slim images ship without package lists.
# volta is assumed to be installed already; that step is not shown here.
RUN apt-get update && \
    apt-get install -y build-essential && \
    volta install node@lts yarn@latest
WORKDIR /demo

FROM base as gems
COPY Gemfile* ./
RUN bundle install

FROM base as node
COPY package*.json ./
RUN yarn install

FROM base
RUN apt-get install -y postgresql-client
COPY . .
COPY --from=gems /usr/local/bundle /usr/local/bundle
COPY --from=node /demo/node_modules /demo/node_modules
Such a Dockerfile will only run bundle install if the Gemfile has changed, and only run yarn install if package.json changed.
Even better, if both changed, they will be run concurrently.
In fact, on the first run they will be run concurrently with the installation of postgresql-client.
Caching
We’ve seen how splitting COPY statements can reduce the number of steps that are rerun, but it still remains the case that any change to the Gemfile will result in reinstalling all gems.
This can be improved by using a RUN cache mount.
Applied to bundle installs, the resulting build instructions would look something like the following:
RUN --mount=type=cache,id=dev-gem-cache,sharing=locked,target=/srv/vendor \
    bundle config set app_config .bundle && \
    bundle config set without 'development test' && \
    bundle config set path /srv/vendor && \
    bundle install && \
    bundle clean && \
    mkdir -p vendor && \
    bundle config set path vendor && \
    cp -ar /srv/vendor .
That’s a lot to unpack. Statement by statement:
- a cache with the id dev-gem-cache is mounted on /srv/vendor.
- the bundle config directory is set to be a .bundle subdirectory of the current application.
- gems marked as development or test in the Gemfile are not to be installed.
- the bundle directory is set to the /srv/vendor directory.
- the install is performed.
- unused gems are removed.
- a vendor subdirectory is created.
- the bundle directory is changed to be the vendor subdirectory.
- the contents of the /srv/vendor cache are copied to the vendor subdirectory.
The final build stage can copy the entire app directory from the build stage that included the above RUN statement to pick up the configuration as well as the gems.
With this in place, adding a single gem to your Gemfile will result in the installation of only that gem and any of its dependencies that are not already in the cache.
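A similar cache mount could be applied to yarn install. The following is a minimal sketch rather than anything taken from the cookbooks: it assumes Yarn 1 (classic), whose --cache-folder option sets where downloaded packages are kept, a yarn.lock file next to package.json, and a base stage like the one in the multi-stage example. The cache id dev-yarn-cache and the /srv/yarn-cache path are hypothetical names. Cache mounts require BuildKit.

```dockerfile
# Fragment: assumes an earlier "base" stage with yarn available.
FROM base as node
COPY package.json yarn.lock ./

# Hypothetical cache id and path. The mounted directory persists
# between builds, so packages that were already downloaded are reused
# instead of being fetched again.
RUN --mount=type=cache,id=dev-yarn-cache,sharing=locked,target=/srv/yarn-cache \
    yarn install --cache-folder /srv/yarn-cache
```

Because only the download cache lives in the mount, node_modules is still written into the image layer and no extra copy step is needed here, unlike the gem example where the install path itself is the mounted directory.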
Recap
Starting from a simple sequence of two statements (potentially three if yarn install is added) we have explored a number of techniques involving ordering, splitting, staging, and caching of statements and results.
We started with something that was simple and slow and ended with a solution that is considerably less simple but decidedly faster.
This results in a trade-off. For small projects it may make sense to only adopt some of these techniques to keep the Dockerfile maintainable. For other projects it may make sense to incorporate more.