Releasing Elixir/OTP applications to the World
Developing Elixir/OTP applications is an enlightening, mind-boggling, and ultimately enjoyable experience. There are so many features of the language that change the very way we as developers think about concurrency and program structure. From writing pure functional code, to using message passing to coordinate complex systems, it is one of the best languages for the SMP revolution that has been slowly boiling under our feet.
However, releasing Elixir and OTP applications is an entirely different and seemingly seldom discussed topic.
The distribution tool chain of Erlang and OTP is a complicated one. There's systools, reltool, rebar(3), and relx, just to name a few, all of which ultimately help in creating an Erlang/OTP "release".
Similar to rebar3, exrm takes the high-level abstraction approach to combining reltool and relx into a single tool chain for creating releases of Elixir projects. Of course, we can also borrow from the collection of autotools.
There are plenty of articles and posts discussing how and why to use exrm. I feel many of them, however, fail to truly discuss how to do this effectively. Most mention the surface of the issue, but never give it any real attention. For any developer who eventually wants to ship code, this is entirely too frustrating to leave alone.
There are "ways" of deploying OTP code relatively simply; however, these methods generally sidestep good continuous integration/continuous deployment practice, e.g., "build the OTP application on the target system" or simply use mix run, etc.
I cannot speak for everyone, but my general goal is to avoid such a manual step in my release pipeline. Moreover, having a possibly full autotools chain and an Erlang/Elixir stack on the production system is slightly unnerving for its own set of reasons.
Problem
Here are some selected quotes. I'm not trying to pick on anyone in particular or the community at large; rather, I'm trying to show why this very topic is an issue in the first place.
We need to be sure that the architectures for both our build and hosting environments are the same, e.g. 64-bit Linux -> 64-bit Linux. If the architectures don't match, our application might not run when deployed. Using a virtual machine that mirrors our hosting environment as our build environment is an easy way to avoid that problem. Phoenix Exrm Releases.
And another, similar quote:
One important thing to note, however: you must use the same architecture for building your release that the release is getting deployed to. If your development machine is OS X and you're deploying to a Linux server, you need a Linux machine to build your exrm release or it isn't going to work, or you can just build on the same server you're going to be running everything on. Brandon Richey.
Unfortunately, these miss a lot of the more subtle issues; dependency hell is real, and we're about to really dive into it.
There are a few examples where "same architecture" isn't enough, and this is where we will spend the majority of our time.
For these examples, we will assume our host machine is running GNU/Linux, specifically Arch Linux, and our target machine is running CentOS 7.2. Both machines run the AMD64 instruction set, so the architectures are the same.
Shared Objects
Let's start with the simplest issue: different versions of shared objects.
Arch Linux is a rolling release distribution that is generally right on the bleeding edge of packages; upstream is usually the development sources themselves. When ncurses moves to version 6, Arch isn't far behind in putting it in the stable package repository (and rebuilding a number of packages that depend on ncurses). CentOS, on the other hand, is not so aggressive.
Therefore, when using the default relx configuration with exrm, the Erlang runtime system (ERTS) bundled with the release will be incompatible with the target system.
When the OTP application is started, it will emit an obscure linking error complaining that ERTS cannot find a ncurses.so.6 file, and it will promptly fail. Worse, after possibly "fixing" this issue, ncurses is only one of a few shared objects Erlang needs to run, depending on what was enabled when Erlang was built or what features the deployed application needs.
Erlang Libraries
We may try to resolve this issue by adding a particular rel/relx.config file to our Elixir project. Specifically, we will not bundle ERTS, opting to use the target's ERTS instead.
{include_erts, false}.
This seems like a promising approach, until another error message is emitted at startup, namely, that ERTS cannot find stdlib-2.8 in the /usr/lib/erlang/lib folder.
Did I mention that our current build system is Arch and our target is CentOS? Arch may have the newest version of Erlang in its repository, while CentOS is still at whatever it was at before: R16B, unless the Erlang Solutions release is being used.
Since Erlang applications do (patch number) version locking, applications in the dependency tree will need to match exactly, and it's guaranteed that any and all OTP applications will depend at least on the Erlang kernel and the Erlang standard library. Those are at least two OTP applications our application is going to need that are no longer packaged when relx doesn't bundle ERTS.
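This exact locking is visible in the release resource (.rel) file that the release tooling generates. The sketch below is purely illustrative: the release name and version are made up, and the application versions are the ones shipped with an OTP 18.3 install.
{release, {"my_app", "0.1.0"},
 {erts, "7.3"},
 [{kernel, "4.2"},
  {stdlib, "2.8"}]}.
%% kernel and stdlib are pinned to exact versions; if the target's
%% /usr/lib/erlang/lib carries different versions, startup fails as described above.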
Even if we specify another option to relx, namely, {system_libs, true}., we are left with the same lack of Erlang system libraries.
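In other words, even a rel/relx.config combining both of the options discussed so far (shown together here only to make the attempted configuration concrete) does not get the standard libraries onto the target:
{include_erts, false}. % use the target's ERTS instead of bundling one
{system_libs, true}.   % per the behaviour described above, kernel, stdlib, and friends still are not bundled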
That's correct, and there are some sensible reasons for this. If we ask exrm, and therefore relx, not to include the build system's ERTS, we are also excluding the standard Erlang libraries from the release; asking to include the standard libraries of the build system's ERTS could run into the very same issues as above, for a whole host of other reasons.
We are left to attempt more solutions.
Docker or Virtualization
Next, since we do want to ultimately get our build running in a CI/CD environment, we may look toward virtualization/containerization. Being sensible people, we try to use a small image, maybe basing our image on Alpine Linux so as to be nice to our precious /var or SSD space. We may even go so far as to build Erlang and Elixir ourselves in these images to make sure we have as much control over them as we can. Furthermore, since we are building everything ourselves, shipping the built ERTS seems like a good idea too, so we can delete the rel/relx.config file.
This seems promising. However, we have shared object problems again. Since we are building Erlang and Elixir ourselves, we decided to disable termcap support, thus no longer requiring the ncurses library altogether. We hope that the openssl libraries are the same, so we don't have to worry about that mess, and we move on.
This time, when we attempt to deploy the application, we get a different, obscure error: something about our musl C library not being found on the target system. Right: because we are trying to create a small image, we opted to use the musl C library because of its size and because it is easily supported in the Alpine Linux container. Trying to use the GNU C library is too cumbersome and would only inflate the image beyond any gains we would achieve by using Alpine in the first place.
That's not going to work.
OTP as Project Dependency
Another option we might try is making Erlang a build dependency of our Elixir application, which could be achieved via the following structure:
{:otp, "~> 18.3.2", github: "erlang/otp", tag: "OTP-18.3.2", only: :prod,
 compile:
   "./otp_build autoconf; " <>
   "./configure --without-termcap --without-javac; " <>
   "make -j4; " <>
   "DISTDIR=/tmp/erlang make install"}
Then using rel/relx.config with:
{include_erts, "/tmp/erlang"}.
This may turn out to work, assuming the build server and the target system have the same shared objects for OpenSSL and others that may be enabled by default.
However, I didn't follow this idea all the way to the end, as I wasn't entirely happy with it and it would fall prey to some of the issues described later.
Notably, though, this will inflate production builds drastically, since our mix deps.get and mix deps.compile steps will now be stuck building Erlang itself.
However, again, we will likely run into issues with the C library used by the build system/container. Going this route doesn't allow us to use Alpine Linux either.
Worse, there's another issue that hasn't even shown itself but is lying in wait: native implemented (or interface) functions (NIFs).
If our project has a dependency that builds a NIF as part of its build (Elixir's comeonin is a good example of this), then unless the NIF is statically compiled, we are back to square one and shared objects are not our friends. Furthermore, if we are using a different standard library implementation, i.e., musl vs glibc, the dependency will likely complain about it as well.
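To make this concrete, a dependency entry as small as the mix.exs sketch below (the version requirement is illustrative) is enough to pull a NIF into the build:
defp deps do
  [
    # comeonin compiles a bcrypt NIF with the build machine's C toolchain, so the
    # resulting shared object is linked against the build machine's C library and
    # shared objects rather than the target's.
    {:comeonin, "~> 2.0"}
  ]
end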
Non-Solution Solutions
Of course, all of the above issues can be solved by "just building on the target machine" or by simply using mix run on the target instead. However, I personally find these solutions unacceptable.
I'm not overly fond of requiring my target hosts, my production machines, to run a full development tool chain. Before this is dismissed as a personal issue, remember that our dependency tree may contain NIFs outside of our control. Therefore, it's not just Erlang/Elixir that are required to be on the machine, but a C standard library and autotools too.
This solution also doesn't give the impression of an architecture that scales. If a new release needs to be deployed, each server will now need to spare some load for building the project and its dependencies before any real, actual upgrading can continue.
Solutions(?)
What are we to do? How are we to build Erlang/Elixir/OTP applications as part of our CI/CD pipeline? Particularly, how are we to build our applications on a CI/CD system and not the production box(es) themselves?
If any of the above problems tell us anything, it's that the build system must be either the exact same machine or a clone of it with build tools installed. Thankfully, we can achieve a "clone" without too much work using Docker and the official image registries.
By using the official CentOS image and a specific tag, we can match our target system almost exactly. Furthermore, building the Erlang/Elixir stack from source is a relatively small order for a Docker container too, putting precise version control completely within reach. Moreover, since the build host and the target host are nearly identical, bundling ERTS should be a non-issue.
This is the observed result of using docker-elixir-centos as a base image for CI builds.
Another possible solution is to ship Docker containers as the artifact of the build. However, to do this well requires a decent Docker-capable infrastructure and deployment process. Furthermore, going this route, it's unlikely that exrm is even necessary at all; it is likely more appropriate to simply use mix run or whatever the project's equivalent is. Another thing lost here is relups, which are essentially the whole reason for wanting to use exrm in the first place.
As such, if using exrm is desired, setting up a build server will be imperative to building reliably and without building on production. Scaling from a solid build foundation will be much easier than building and "deploying" on the production farm itself.
Moving Forward
Releasing software isn't in a particularly hard class of problems, but it does have its challenges. Some languages attempt to solve this challenge in their artifact/build result. Other languages, unfortunately, don't attempt to solve this problem at all. Though, I can see it being possible to eventually reach a goal of creating binary releases with steps as simple as ./configure && make && make install && tar.
But we aren't there yet.
But we are close.
The current way Erlang/OTP applications want to be deployed already includes shipping with the runtime; this is a great starting point.
To move to a better, easier release cycle, we need a few things:
- The ability to (natively) cross-compile to different architectures and different versions of ERTS and cross-compile Erlang code itself.
- The ability to easily statically compile ERTS and bundle the result for the specified architecture.
Cross-compiling to different versions of ERTS is likely a harder problem to tackle. But being able to cross-compile the ERTS itself is likely much easier since this is already a feature of GCC.
Thus, our problem now becomes: how do we add and/or expose the facility for customizing the appropriate build flags in our projects and dependencies, so that we can cross-compile a static ERTS and any NIFs and bundle them all into a solid OTP release?