Elixir/Erlang Hot Swapping Code

Hot Code Swapping: Basics
- Example
- Example: iex
Relups
Summary

One of the untold benefits of having a runtime is the ability for that runtime to enable loading and unloading code while the runtime is active. Since the runtime is itself, essentially, a virtual machine with its own operating system and process scheduling, it has the ability to start and stop, load and unload processes and code similar to how "real" operating systems do.

Warning, there be black magic here.

This enables some spectacular power in terms of creating deployments and rolling out those deployments. That is, if we can provide a particular artifact for the runtime to load and replace the running system with, we can instruct it to upgrade our system(s) without restarting them, without interrupting our services or affecting users of those systems. Furthermore, if we constrain the system and make a few particular assumptions, this can all happen nearly instantaneously. For example, Erlang releases happen in seconds because of the functional approach taken by the language, this compared to other systems like Docker and/or Kubernetes which may take several minutes or hours to transition a version because there is no safe assumptions to make about running code.

This post will be a small tour through how Elixir and Erlang can perform code hot swapping, and how this can be useful for deployments.

Hot Code Swapping: Basics

There are several functions defined in the :sys and :code modules that are required for this first example. Namely, the following functions:

:code.load_file/1
:sys.suspend/1
:sys.change_code/4
:sys.resume/1

The :sys.suspend/1 function takes a single parameter, the Process ID (PID) of the process to suspend, similarly, :sys.resume also takes a PID of the process to resume. The :code.load_file/1 function, unfortunately named, takes a single parameter: the module to load into memory. Finally, the :sys.change_code function takes four parameters: name, module, old_version, and extra. The name is the PID or the registered atom of the process. The extra argument is a reserved parameter for each process, it's the same extra that will be passed to the restarted process's code_change/3 function.

Example

Let's assume we have a particularly simple module, say KV, similar to the following:

defmodule KV do
  use GenServer

  @vsn 0

  def start_link() do
    GenServer.start_link(__MODULE__, [], name: __MODULE__)
  end

  def init(_) do
    {:ok, %{}}
  end

  def get(key, default \\ nil) do
    GenServer.call(__MODULE__, {:get, key, default})
  end

  def put(key, value) do
    GenServer.call(__MODULE__, {:put, key, value})
  end

  def handle_call({:get, key, default}, _caller, state) do
    {:reply, Map.get(state, key, default), state}
  end

  def handle_call({:put, key, value}, _caller, state) do
    {:reply, :ok, Map.put(state, key, value)}
  end

end

Save this into a file, say, kv.ex. Next we will compile it and load it into an iex session:

% elixirc kv.ex
% iex
iex> l KV
{:module, KV}

We can start the process and try it out:

iex> KV.start_link
{:ok, #PID<0.84.0>}
iex> KV.get(:a)
nil
iex> KV.put(:a, 42)
:ok
iex> KV.get(:a)
42

Now, let's say we wish to add some logging to the handling of the :get and :put messages. We will apply a patch similar to the following:

--- a/kv.ex
+++ b/kv.ex
@@ -1,7 +1,8 @@
 defmodule KV do
+  require Logger
   use GenServer

-  @vsn 0
+  @vsn 1

   def start_link() do
     GenServer.start_link(__MODULE__, [], name: __MODULE__)
@@ -20,10 +21,12 @@ defmodule KV do
   end

   def handle_call({:get, key, default}, _caller, state) do
+    Logger.info("#{__MODULE__}: Handling get request for #{key}")
     {:reply, Map.get(state, key, default), state}
   end

   def handle_call({:put, key, value}, _caller, state) do
+    Logger.info("#{__MODULE__}: Handling put request for #{key}:#{value}")
     {:reply, :ok, Map.put(state, key, value)}
   end

Without closing the current iex session, apply the patch to the file and compile the module:

% patch kv.ex kv.ex.patch
% elixirc kv.ex

You may see a warning about redefining an existing module, this warning can be safely ignored.

Now, in the still open iex session, let's begin the black magic incantations:

iex> :code.load_file KV
{:module, KV}
iex> :sys.suspend(KV)
:ok
iex> :sys.change_code(KV, KV, 0, nil)
:ok
iex> :sys.resume(KV)
:ok

Now, we should be able to test it again:

iex> KV.get(:a)
21:28:47.989 [info]  Elixir.KV: Handling get request for a
42
iex> KV.put(:b, 2)
21:28:53.729 [info]  Elixir.KV: Handling put request for b:2
:ok

Thus, we are able to hot-swap running code, without stopping, losing state, or effecting processes waiting for that data!

But the above is merely an example of manually invoking the code reloading API, there are better ways to achieve the same result.

Example: `iex`

There are several functions available to us when using iex that essentially perform the above actions for us:

c/1: compile file
r/1: (recompile and) reload module

The r/1 helper takes an atom of the module to reload, c/1 takes a binary of the path to the module to compile. Check the documentation for more information.

Therefore, using these, we can simplify what we did in the previous example to simply a call to r/1:

iex> r KV
warning: redefining module KV (current version loaded from Elixir.KV.beam)
  kv.ex:1

{:reloaded, KV, [KV]}
iex> KV.get(:a)

21:52:47.829 [info]  Elixir.KV: Handling get request for a
42

In one function, we have done what previously took four functions. However, the story does not end here. This was only for a single module, one GenServer. What about when we want to upgrade more modules, or an entire application?

Although c/1 and r/1 are great for development. They are not recommended for production use. Do not depend on them to perform deployments.

Relups

Fortunately, there is another set of tooling that allows us to more easily deploy releases, and more pointedly, perform upgrades: Relups. Before we dive straight into relups, let's discuss a few other related concepts.

Erlang Applications

As part of Erlang "Applications", there is a related file, the .app file. This resource file describes the application: other applications that should be started and other metadata about the application. Using Elixir, this file can be found in the _build/{Mix.env}/lib/{app_name}/ebin/ folder.

Here's an example .app file from the octochat demo application:

± cat _build/dev/lib/octochat/ebin/octochat.app
{application,octochat,
         [{registered,[]},
          {description,"Demo Application for How Swapping Code"},
          {vsn,"0.3.3"},
          {modules,['Elixir.Octochat','Elixir.Octochat.Acceptor',
                    'Elixir.Octochat.Application','Elixir.Octochat.Echo',
                    'Elixir.Octochat.ServerSupervisor',
                    'Elixir.Octochat.Supervisor']},
          {applications,[kernel,stdlib,elixir,logger]},
          {mod,{'Elixir.Octochat.Application',[]}}]}.

This is a pretty good sized triple (3-tuple). By the first element of the triple, we can tell it is an application, the application's name is octochat given by the second element, and everything in the list that follows is a keyword list that describes more about the octochat application. Notably, we have the usual metadata found in the mix.exs file, the modules that make up the application, and the other OTP applications this application requires to run.

Erlang Releases

An Erlang "release", similar to Erlang application, is an entire system: the Erlang VM, the dependent set of applications, and arguments for the Erlang VM.

After building a release for the Octochat application with the distillery project, we get a .rel file similar to the following:

± cat rel/octochat/releases/0.3.3/octochat.rel
{release,{"octochat","0.3.3"},
     {erts,"8.1"},
     [{logger,"1.3.4"},
      {compiler,"7.0.2"},
      {elixir,"1.3.4"},
      {stdlib,"3.1"},
      {kernel,"5.1"},
      {octochat,"0.3.3"},
      {iex,"1.3.4"},
      {sasl,"3.0.1"}]}.

This is an Erlang 4-tuple; it's a release of the "0.0.3" version of octochat. It will use the "8.1" version of "erts" and it depends on the list of applications (and their versions) provided in the last element of the tuple.

Appups and Relups

As the naming might suggest, "appups" and "relups" are the "upgrade" versions of applications and releases, respectively. Appups describe how to take a single application and upgrade its modules, specifically, it will have instructions for upgrading modules that require "extras". or, if we are upgrading supervisors, for example, the Appup will have the correct instructions for adding and removing child processes.

Before we examine some examples of these files, let's first look at the type specification for each.

Here is the syntax structure for the appup resource file:

{Vsn,
  [{UpFromVsn, Instructions}, ...],
  [{DownToVsn, Instructions}, ...]}.

The first element of the triple is the version we are either upgrading to or downgrading from. The second element is a keyword list of upgrade instructions keyed by the version the application would be coming from. Similarly, the third element is a keyword list of downgrade instructions keyed by the version the application will downgrade to. For more information about the types themselves, see the SASL documentation.

Now that we have seen the syntax, let's look at an example of the appup resource file for the octochat application generated using distillery:

± cat rel/octochat/lib/octochat-0.2.1/ebin/octochat.appup
{"0.2.1",
 [{"0.2.0",[{load_module,'Elixir.Octochat.Echo',[]}]}],
 [{"0.2.0",[{load_module,'Elixir.Octochat.Echo',[]}]}]}.

Comparing this to the syntax structure above, we see that we have a Vsn element of "0.2.1", we have a {UpFromVsn, Instructions} pair: [{"0.2.0",[{load_module,'Elixir.Octochat.Echo',[]}]}], and we have a single {DownToVsn, Instructions} pair: [{"0.2.0",[{load_module,'Elixir.Octochat.Echo',[]}]}].

The instructions themselves tell us what exactly is required to go from one version to the another. Specifically, in this example, to upgrade, we need to "load" the Octochat.Echo module into the VM. Similarly, the instructions to downgrade are the same. For a semantically versioned project, this is an understandably small change.

It's worth noting the instructions found in the .appup files are usually high-level instructions, thus, load_module covers both the loading of object code into memory and the suspend, replace, resume process of upgrading applications.

Next, let's look at the syntax structure of a relup resource file:

{Vsn,
 [{UpFromVsn, Descr, Instructions}, ...],
 [{DownToVsn, Descr, Instructions}, ...]}.

This should look familiar. It's essentially the exact same as the .appup file. However, there's an extra term, Descr. The Descr field can be used as part of the version identification, but is optional. Otherwise, the syntax of this file is the same as the .appup.

Now, let's look at an example relup file for the same release of octochat:

± cat rel/octochat/releases/0.2.1/relup
{"0.2.1",
 [{"0.2.0",[],
   [{load_object_code,{octochat,"0.2.1",['Elixir.Octochat.Echo']}},
    point_of_no_return,
    {load,{'Elixir.Octochat.Echo',brutal_purge,brutal_purge}}]}],
 [{"0.2.0",[],
   [{load_object_code,{octochat,"0.2.0",['Elixir.Octochat.Echo']}},
    point_of_no_return,
    {load,{'Elixir.Octochat.Echo',brutal_purge,brutal_purge}}]}]}.

This file is a little more dense, but still adheres to the basic triple syntax we just examined. Let's take a closer look at the upgrade instructions:

[{load_object_code,{octochat,"0.2.1",['Elixir.Octochat.Echo']}},
 point_of_no_return,
 {load,{'Elixir.Octochat.Echo',brutal_purge,brutal_purge}}]

The first instruction, {load_object_code,{octochat,"0.2.1",['Elixir.Octochat.Echo']}}, tells the release handler to load into memory the new version of the "Octochat.Echo" module, specifically the one associated with version "0.2.1". But this instruction will not instruct the release handler to (re)start or replace the existing module yet. Next, point_of_no_return, tells the release handler that failure beyond this point is fatal, if the upgrade fails after this point, the system is restarted from the old release version (appup documentation). The final instruction, {load,{'Elixir.Octochat.Echo',brutal_purge,brutal_purge}}, tells the release handler to replace the running version of the module and use the newly loaded version.

For more information regarding burtal_purge, check out the "PrePurge" and "PostPurge" values in the appup documentation.

Similar to the .appup file, the third element in the triple describes to the release handler how to downgrade the release as well. The version numbers in this case make this a bit more obvious as well, however, the steps are essentially the same.

Generating Releases and Upgrades with Elixir

Now that we have some basic understanding of releases and upgrades, let's see how we can generate them with Elixir. We will generate the releases with the distillery project, however, the commands should also work with the soon to be deprecated exrm project.

This has been written for the 0.10.1 version of distillery. This is a fast moving project that is in beta, be prepared to update as necessary.

Add the distillery application to your deps list:

{:distillery, "~> 0.10"}

Perform the requisite dependency download:

± mix deps.get

Then, to build your first production release, you can use the following:

± MIX_ENV=prod mix release --env prod

For more information on why you must specify both environments, please read the FAQ of distillery. If the environments match, there's a small modification to the ./rel/config.exs that can be made so that specifying both is no longer necessary.

After this process is complete, there should be a new folder under the ./rel folder that contains the new release of the project. Within this directory, there will be several directories, namely, bin, erts-{version}, lib, and releases. The bin directory will contain the top level Erlang entry scripts, the erts-{version} folder will contain the requisite files for the Erlang runtime, the lib folder will contain the compiled beam files for the required applications for the release, and finally, the releases folder will contain the versions of the releases. Each folder for each version will have its own rel file, generated boot scripts, as per the OTP releases guide, and a tarball of the release for deployment.

Deploying the release is a little out of scope for this post and may be the subject of another. For more information about releases, see the System Principles guide. However, for Elixir, it may look similar to the following:

Copy the release tarball to the target system:

± scp rel/octochat/releases/0.3.2/octochat.tar.gz target_system:/opt/apps/.

On the target system, unpack the release:

± ssh target_system
(ts)# cd /opt/apps
(ts)# mkdir -p octochat
(ts)# tar -zxf octochat.tar.gz -C octochat

Start the system:

(ts)# cd octochat
(ts)# bin/octochat start

This will bring up the Erlang VM and the application tree on the target system.

Next, after making some applications changes and bumping the project version, we can generate an upgrade release using the following command:

± MIX_ENV=prod mix release --upgrade

Note, This will also generate a regular release.

Once this process finishes, checking the rel/{app_name}/releases folder, there should be a new folder for the new version, and a relup file for the upgrade:

± cat rel/octochat/releases/0.3.3/octochat.rel
{release,{"octochat","0.3.3"},
     {erts,"8.1"},
     [{logger,"1.3.4"},
      {compiler,"7.0.2"},
      {elixir,"1.3.4"},
      {stdlib,"3.1"},
      {kernel,"5.1"},
      {octochat,"0.3.3"},
      {iex,"1.3.4"},
      {sasl,"3.0.1"}]}.

± cat rel/octochat/releases/0.3.3/relup
{"0.3.3",
 [{"0.3.2",[],
   [{load_object_code,{octochat,"0.3.3",['Elixir.Octochat.Echo']}},
    point_of_no_return,
    {suspend,['Elixir.Octochat.Echo']},
    {load,{'Elixir.Octochat.Echo',brutal_purge,brutal_purge}},
    {code_change,up,[{'Elixir.Octochat.Echo',[]}]},
    {resume,['Elixir.Octochat.Echo']}]}],
 [{"0.3.2",[],
   [{load_object_code,{octochat,"0.3.1",['Elixir.Octochat.Echo']}},
    point_of_no_return,
    {suspend,['Elixir.Octochat.Echo']},
    {code_change,down,[{'Elixir.Octochat.Echo',[]}]},
    {load,{'Elixir.Octochat.Echo',brutal_purge,brutal_purge}},
    {resume,['Elixir.Octochat.Echo']}]}]}.

Similarly, to deploy this new upgrade, copy the tarball to the target system and unpack it into the same directory as before.

After it's unpacked, upgrading the release can be done via a stop and start, or we can issue the upgrade command:

(ts)# bin/octochat stop
(ts)# bin/octochat start

Or:

(ts)# bin/octochat upgrade "0.3.3"

When starting and stopping, the entry point script knows how to select the "newest" version.

When upgrading, it is required to specify the desired version, this is necessary since the upgrade process may require more than simply jumping to the "latest" version.

Summary

Release management is a complex topic, upgrading without restarting seemingly even more so. However, the process can be understood, and knowing how the process works will allow us to make more informed decisions regarding when to use it.

The tooling for performing hot upgrades has been around for a while, and while the tooling for Elixir is getting closer, we are not quite ready for prime time. But it won't remain this way for long. Soon, it will be common place for Elixir applications to be just as manageable as the Erlang counterparts.