1   Introduction

(1) Python is a wonderful scripting language: in particular, it is useful for controlling analytic operations in the Python data science toolkit (for example, numpy, scipy, pandas, dask, etc.). (2) Elixir is a wonderful scripting language: in particular, it is useful for controlling lightweight (Erlang) processes and for multiprocessing. This article gives an example of combining those two capabilities and describes how it works.

2   Elixir plus the Python data science toolset

For a little help with the Python data science tool set, and links to much more help, see my survey (a work in progress): http://www.davekuhlman.org/py-datasci-survey.html.

For help with Elixir and the Elixir tools discussed in this article, see:

The example -- This example Elixir application uses Poolboy to create a pool of Elixir processes. Each of those processes creates an OS process that runs a Python interpreter. The application then uses those processes to execute Python functions that (1) use dask to compute the mean of the values in a column of a dask dataframe and (2) use lxml to count the elements in an XML document.

The effect is that we can make multiple concurrent requests to evaluate Python code. These requests are distributed across a pool of Elixir processes and executed concurrently. Each process in the pool creates and holds its own OS process running the Python interpreter. And, because these are separate OS processes, the Python GIL (global interpreter lock) does not prevent our code from running concurrently across multiple CPU cores and multiple CPUs.
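The request pattern just described can be sketched without Poolboy or Python. This is a minimal, stdlib-only sketch. SlowWorker and FanOut are hypothetical names, and each worker sleeps for 200 ms instead of calling Python through Erlport. Each request is handled by a separate process, so the five requests overlap and the total time should be close to 200 ms rather than 1000 ms.

```elixir
defmodule SlowWorker do
  # Hypothetical stand-in for a pool worker: instead of calling Python
  # through Erlport, it sleeps to simulate a slow request.
  use GenServer

  def start_link(opts \\ []), do: GenServer.start_link(__MODULE__, :ok, opts)

  @impl true
  def init(:ok), do: {:ok, nil}

  @impl true
  def handle_call({:work, index, millis}, _from, state) do
    Process.sleep(millis)
    # Reply with the request index so the caller can match replies to requests.
    {:reply, {:result, index, self()}, state}
  end
end

defmodule FanOut do
  # Sends one request to each worker at once, then waits for every reply.
  def run(workers, millis) do
    workers
    |> Enum.with_index(1)
    |> Enum.map(fn {pid, index} ->
      Task.async(fn -> GenServer.call(pid, {:work, index, millis}) end)
    end)
    |> Enum.map(&Task.await(&1, 5_000))
  end
end

workers =
  for _ <- 1..5 do
    {:ok, pid} = SlowWorker.start_link()
    pid
  end

{micros, results} = :timer.tc(fn -> FanOut.run(workers, 200) end)
IO.puts("#{length(results)} replies in about #{div(micros, 1000)} ms")
```

Poolboy adds what this sketch leaves out. It checks a worker out for each request and checks it back in afterwards, and it queues requests when all workers are busy.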

Note that this example uses Poolboy to manage the pool of processes. There are, however, other pool managers for Erlang/Elixir, for example, Pooler: https://github.com/seth/pooler.

Also note that within each process managed by Poolboy, (1) we use Erlport to create an OS process that contains the Python interpreter, and (2) we use Erlport to make requests to (call functions in) that Python code.

3   The code

You can find an archive containing the code for this sample application here: elixir_example_test12.zip.

4   Testing

I did my testing at the Elixir interactive prompt: IEx. See: https://hexdocs.pm/iex/IEx.html. After you have created and built (compiled) your application, you can start IEx with $ iex -S mix:

$ iex -S mix
Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:2:2] [ds:2:2:10]
[async-threads:10] [hipe] [kernel-poll:false]

*** started python -- pid: #PID<0.162.0>
*** started python -- pid: #PID<0.164.0>
*** started python -- pid: #PID<0.166.0>
*** started python -- pid: #PID<0.168.0>
*** started python -- pid: #PID<0.170.0>
Interactive Elixir (1.6.5) - press Ctrl+C to exit (type h() ENTER for help)
iex(1)>

Notes:

  • The starred messages about Python, above, are debugging messages. They show that we successfully started five OS processes running the Python interpreter, one for each of the five workers in the pool.

5   Poolboy

Our goal is to use Poolboy to manage a pool of processes, then evaluate Python code in some process in that pool.

Why we want to do that -- To run Python code from Elixir (or Erlang), we use Erlport. But Erlport starts up an OS process in which to run Python, and that is a relatively heavyweight operation. So we'd like to create a pool of Elixir processes, each connected to its own Python OS process, and then reuse them rather than starting a new OS process for each request.
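The reuse idea can be shown in miniature. In this minimal, stdlib-only sketch, ReusingWorker and StartCounter are hypothetical names. The "expensive" setup step stands in for :python.start/1: it runs once in init/1 and is counted, and every later request reuses the value it produced.

```elixir
defmodule StartCounter do
  # Counts how many times the simulated expensive resource has been started.
  def start_link, do: Agent.start_link(fn -> 0 end, name: __MODULE__)
  def increment, do: Agent.update(__MODULE__, &(&1 + 1))
  def value, do: Agent.get(__MODULE__, & &1)
end

defmodule ReusingWorker do
  # Hypothetical worker: the expensive setup (standing in for :python.start/1)
  # runs once in init/1, and every request reuses what it created.
  use GenServer

  def start_link(arg \\ nil), do: GenServer.start_link(__MODULE__, arg)

  @impl true
  def init(_arg) do
    StartCounter.increment()
    # make_ref/0 stands in for the Python instance returned by Erlport.
    {:ok, [resource: make_ref()]}
  end

  @impl true
  def handle_call({:square, x}, _from, state) do
    {:reply, {x * x, state[:resource]}, state}
  end
end

{:ok, _counter} = StartCounter.start_link()
{:ok, worker} = ReusingWorker.start_link()

replies = for x <- 1..10, do: GenServer.call(worker, {:square, x})
# Prints "requests: 10, resource starts: 1"
IO.puts("requests: #{length(replies)}, resource starts: #{StartCounter.value()}")
```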

In order to set that up and to get Poolboy to manage that pool of processes for us, we need to do the following:

  1. Generate our Elixir application with a supervisor and a supervision tree:

    $ mix new test12 --sup
    

    This generates an application module that starts a supervisor (a module using the Supervisor behaviour). In our application, that supervisor starts the Poolboy pool. Poolboy in turn starts our worker processes (implemented with the GenServer behaviour), and each worker starts a Python OS process. If a worker fails, Poolboy replaces it with a new one.

  2. Add Poolboy and Erlport to our dependencies in mix.exs:

    defp deps do
      [
        {:poolboy, "~> 1.5.1"},
        {:erlport, git: "https://github.com/hdima/erlport.git"},
      ]
    end
    
  3. Configure Poolboy in our Elixir application module (lib/test12/application.ex):

    def start(_type, _args) do
      # List all child processes to be supervised
      children = [
        # Starts a worker by calling: Test12.Worker.start_link(arg)
        # {Test12.Worker, arg},
        :poolboy.child_spec(:worker, poolboy_config())
      ]
      # See https://hexdocs.pm/elixir/Supervisor.html
      # for other strategies and supported options
      opts = [strategy: :one_for_one, name: Test12.Supervisor]
      Supervisor.start_link(children, opts)
    end
    
    defp poolboy_config do
      [
        {:name, {:local, :worker}},
        {:worker_module, Test12.Worker},
        {:size, 5},
        {:max_overflow, 0}
      ]
    end
    
  4. Pass our requests through Poolboy (lib/test12/test.ex):

    defp async_call_mean(index, data_file_name, column_name, verbose) do
      Task.async(fn ->
        :poolboy.transaction(
          :worker,
          fn pid ->
            GenServer.call(
              pid,
              {:mean, index, data_file_name, column_name, verbose})
          end,
          @timeout
        )
      end)
    end
    

    Notes:

    • Erlang modules are referenced from Elixir as atoms, so we call Poolboy's transaction function as :poolboy.transaction/3 (Poolboy is implemented in Erlang).

  5. And, finally, we wait for and inspect the results of those asynchronous tasks (lib/test12/test.ex):

    defp await_and_inspect_mean(task) do
      {:result, value} = Task.await(task, @timeout)
      case value do
        {:not_found, msg} ->
          IO.puts(msg)
        {:invalid_column, msg} ->
          IO.puts(msg)
        {:ok, {index, data_file_name, column_name, mean}} ->
          IO.puts("index: #{index},  df: #{data_file_name}  col: #{column_name}  mean: #{mean}")
      end
    end
    

    Notes:

    • Each request passes its index, the data file name, and the column name to Python. The Python function includes those values in its result, so that we can match each response with the request that produced it.
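Step 1 relies on failed workers being replaced automatically. Poolboy does this for its own workers. The stdlib-only sketch below is not Poolboy's implementation. It shows the same idea with a plain :one_for_one Supervisor and a hypothetical PyWorker stand-in that does not start Python. The sketch kills the worker and checks that a new one has been started under the same registered name.

```elixir
defmodule PyWorker do
  # Hypothetical stand-in for the real worker module; it does not start Python.
  use GenServer

  def start_link(_arg), do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  @impl true
  def init(:ok), do: {:ok, %{}}

  @impl true
  def handle_call(:ping, _from, state), do: {:reply, :pong, state}
end

{:ok, _sup} = Supervisor.start_link([{PyWorker, []}], strategy: :one_for_one)

old_pid = Process.whereis(PyWorker)
ref = Process.monitor(old_pid)
Process.exit(old_pid, :kill)

# Wait until the old worker is really gone.
receive do
  {:DOWN, ^ref, :process, ^old_pid, :killed} -> :ok
after
  1_000 -> raise "worker did not exit"
end

# Poll briefly until the supervisor has registered a replacement.
new_pid =
  Enum.find_value(1..100, fn _ ->
    case Process.whereis(PyWorker) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        nil
    end
  end)

IO.puts("old: #{inspect(old_pid)}  new: #{inspect(new_pid)}")
```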

6   Erlport

Erlport enables us to start up OS processes and run the Python interpreter inside each of those OS processes. We can then use Erlport to call Python functions in that interpreter.

Ruby -- Some of you may be interested to know that Erlport supports Ruby as well as Python.

For example, I can start Python and call sys.path.__str__() using the following:

$ iex -S mix
Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:2:2] [ds:2:2:10]
[async-threads:10] [hipe] [kernel-poll:false]
Interactive Elixir (1.6.5) - press Ctrl+C to exit (type h() ENTER for help)

iex> {:ok, python_pid} = :python.start([])
{:ok, #PID<0.173.0>}

iex> :python.call(python_pid, :sys, :"path.__str__", [])
"['',
'/home/dkuhlman/b1/Erlang/Elixir/test12/_build/dev/lib/erlport/priv/python2',
'/usr/lib/python2.7', '/usr/lib/python2.7/plat-x86_64-linux-gnu',
'/usr/lib/python2.7/lib-tk', '/usr/lib/python2.7/lib-old',
'/usr/lib/python2.7/lib-dynload',
'/usr/local/lib/python2.7/dist-packages',
'/usr/lib/python2.7/dist-packages']"

And, I can (1) start a specific installed version of Python (in this case Python 3 in my Anaconda installation) and (2) call a specific function in a specific module in a specific directory. For example:

$ iex -S mix
Erlang/OTP 20 [erts-9.3] [source] [64-bit] [smp:2:2] [ds:2:2:10]
[async-threads:10] [hipe] [kernel-poll:false]
Interactive Elixir (1.6.5) - press Ctrl+C to exit (type h() ENTER for help)

iex> pypath = '/home/dkuhlman/b1/Erlang/Elixir/test12/pylib:'
'/home/dkuhlman/b1/Erlang/Elixir/test12/pylib:'

iex> pybinpath = '/home/dkuhlman/b1/Python/Anaconda/Anaconda3/bin/python'
'/home/dkuhlman/b1/Python/Anaconda/Anaconda3/bin/python'

iex> {:ok, python_pid} = :python.start([python_path: pypath, python: pybinpath])
{:ok, #PID<0.207.0>}

iex> :python.call(python_pid, :pymod_datasci01, :mean, [1, "tmp5.csv", "B", 1])
py version: 3.6.5 |Anaconda custom (64-bit)| (default, Apr 29 2018, 16:14:56)
[GCC 7.2.0]
idx: 1  ds: /home/dkuhlman/b1/Erlang/Elixir/test12/pylib/tmp5.csv cn: B mean: -0.0570
{:ok, {'  1', 'tmp5.csv', 'B', '-0.0570'}}

Notes:

  • In the call to :python.start/1, parameter python_path specifies the location of the directory containing my Python modules and code.
  • And, parameter python provides the location of the Python interpreter that I want to run.
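As the last line of that output shows, the values inside the returned tuple come back as Erlang character lists, for example '  1' and '-0.0570'. The index is also space-padded. This is a minimal sketch, assuming you want ordinary Elixir strings and numbers instead. It uses a hypothetical MeanResult.parse/1 helper. The {:not_found, msg} shape handled by the second clause is taken from the await_and_inspect_mean function in the Poolboy section.

```elixir
defmodule MeanResult do
  # Hypothetical helper: turns the charlist tuple returned by the Python
  # mean function into Elixir strings and numbers.
  def parse({:ok, {index, file, column, mean}}) do
    {:ok,
     %{
       index: index |> List.to_string() |> String.trim() |> String.to_integer(),
       file: List.to_string(file),
       column: List.to_string(column),
       mean: mean |> List.to_string() |> String.trim() |> String.to_float()
     }}
  end

  # Error tuples such as {:not_found, msg} pass through with msg as a string.
  def parse({tag, msg}) when is_atom(tag) and is_list(msg),
    do: {tag, List.to_string(msg)}
end

MeanResult.parse({:ok, {'  1', 'tmp5.csv', 'B', '-0.0570'}})
# => {:ok, %{column: "B", file: "tmp5.csv", index: 1, mean: -0.057}}
```

String.to_float/1 raises if the Python side ever returns something that is not a decimal number, such as "nan". Where that is possible, the helper would need another clause.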

7   Details -- installation, setup, and use

I used mix, the Elixir build tool, to create my application and build it. See: https://hexdocs.pm/mix/Mix.html

Here is what I did in order to set this up:

  • Created a new app with mix:

    $ mix new test12 --sup
    

    Notes:

    • Use the --sup option to create an OTP application with a supervisor. We want our worker processes to restart if and when they fail.
  • Added Erlport and Poolboy to dependencies in mix.exs:

    defp deps do
      [
        {:poolboy, "~> 1.5.1"},
        {:erlport, git: "https://github.com/hdima/erlport.git"},
      ]
    end
    
  • Retrieved dependencies and compiled:

    $ cd test12
    $ mix deps.get
    $ mix compile
    
  • Cloned the Erlport repository -- Even though the above steps built Erlport into this application, I also wanted it available independently, outside of this application:

    $ git clone https://github.com/hdima/erlport.git
    
  • Built erlport and created the release (packages).

    $ cd erlport
    $ make
    $ make release
    

    The above created the erlport package under erlport/packages/.

  • Copied the erlport release to my Erlang installation:

    $ cd /path/to/erlang/installation/20.3/lib
    $ cp -rp /path/to/git/repo/erlport/packages/erlport-0.9.8 .
    
  • Copied the erlport release to my Python Anaconda installation:

    $ cd /path/to/Anaconda/Anaconda3/lib/python3.6/site-packages
    $ cp -rp /path/to/git/repo/erlport/packages/erlport-0.9.8/priv/python3/erlport .
    
  • Used erlport to start a Python process from within Elixir. The module lib/test12/worker.ex implements each worker process in the pool managed by Poolboy. In the init/1 callback function, which runs each time one of these worker processes is started, we do this:

    def init(_) do
      pypath = '/path/to/my_apps/test12/pylib:'
      pybinpath = '/path/to/Anaconda/Anaconda3/bin/python'
      {:ok, python_pid} = :python.start([python_path: pypath, python: pybinpath])
      IO.puts("*** started python -- pid: #{inspect(python_pid)}")
      state = [pid: python_pid]
      {:ok, state}
    end
    

    Notes:

    • pypath contains the location of the directory containing our Python code; specifically, it contains modules pylib/pymod_datasci01.py and pylib/pymod_xml01.py.
    • pybinpath contains the location of the Python executable we want to run. There are several Python installations on my machine, and I want to use the Python 3 version in the Anaconda installation.
    • We pass our configuration to the Erlport :python.start function as values of a keyword list.
    • The values in this keyword list are Elixir character lists. For a description of erlport Python configuration parameters, see: http://erlport.org/docs/python.html#erlang-api.
    • We create a keyword list containing the Erlang process ID returned by :python.start/1. Erlport uses that process to communicate with the OS process that runs the Python interpreter. This value (state) will be passed to each of our handle_call/3 callbacks (see below).
  • Then, in handle_call/3, we call a function in a Python module like this:

    result = :python.call(
      python_pid,           # the Erlport instance (an Erlang PID) for our Python process
      :pymod_datasci01,     # the Python module containing our function
      :mean,                # the function to be called
      [index, data_file_name, column_name, verbose])     # arguments to the function
    

    There are two clauses that implement handle_call/3 in our worker module. An Elixir function can have several clauses, and the clause whose pattern matches the incoming message is selected. Here is one of those clauses:

    def handle_call(
      {:mean, index, data_file_name, column_name, verbose}, _from, state) do
        python_pid = state[:pid]
        if verbose do
          IO.puts("process #{inspect(self())} data file #{data_file_name}")
        end
        result = :python.call(
          python_pid,
          :pymod_datasci01,
          :mean,
          [index, data_file_name, column_name, verbose])
        {:reply, {:result, result}, state}
    end
    

    Notes:

    • The state argument contains the Erlang process ID (returned by :python.start/1) through which we talk to the OS process running our Python interpreter. Remember that this worker implements an Elixir process in the pool of processes managed by Poolboy. Each of those Elixir processes has created one Python OS process.
    • We call the mean function in the pymod_datasci01 module, passing it four arguments.
    • And, we return the result, along with the state (unchanged in this case).
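The paths in init/1 are hard-coded as charlists. You may prefer to keep them as ordinary Elixir strings, for example read from your application's configuration. In that case, a small helper can do the conversion. The sketch below is hypothetical: PythonOpts.build/2 is not part of Erlport. It produces the same python_path and python options used in init/1 above, and it joins several directories with ":" into a single value, as the original colon-terminated pypath suggests.

```elixir
defmodule PythonOpts do
  # Hypothetical helper: builds the keyword list passed to :python.start/1
  # from ordinary Elixir strings, converting each value to a charlist.
  def build(python_dirs, python_bin)
      when is_list(python_dirs) and is_binary(python_bin) do
    [
      # Several directories are joined with ":" into one search-path value.
      python_path: python_dirs |> Enum.join(":") |> String.to_charlist(),
      python: String.to_charlist(python_bin)
    ]
  end
end

opts =
  PythonOpts.build(
    ["/path/to/my_apps/test12/pylib"],
    "/path/to/Anaconda/Anaconda3/bin/python"
  )

IO.inspect(opts)
# In the worker's init/1 this would become:
#   {:ok, python_pid} = :python.start(opts)
```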

Category

elixir
