RPooli

RPooli is a pool of R nodes, exposed over RMI and managed over a RESTful API.

Overview

RPooli configuration files follow the .properties file format. You can read more about the format here.

RPooli is configured by three distinct configuration files that represent three configuration domains:

  • R/Rj nodes configuration, including the R code snippet to be run for each new node that is created,
  • Pool configuration, including size and time-outs,
  • Network configuration, including the host and port of the RMI registry.

RPooli loads these files from a detected configuration directory, which is the first file system path that exists and is a directory in the following list:

  • /etc/rpooli
  • ~/.rpooli
  • /WEB-INF, in the web-app work directory
  • the OS temporary directory

The last option is only a fallback and is not suitable for production use.
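
The lookup order above can be sketched in Java as follows. This is not RPooli's actual code: the class and method names are hypothetical, and the web-app work directory is passed in by the caller because locating it depends on the servlet container.

```java
import java.io.File;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper illustrating the configuration directory lookup. */
class ConfigDirectoryLocator {

    /** Returns the first candidate path that exists and is a directory. */
    static File detectConfigurationDirectory(File webAppWorkDir) {
        List<File> candidates = new ArrayList<>();
        candidates.add(new File("/etc/rpooli"));
        candidates.add(new File(System.getProperty("user.home"), ".rpooli"));
        if (webAppWorkDir != null) {
            // Only considered when running inside a web application.
            candidates.add(new File(webAppWorkDir, "WEB-INF"));
        }
        // Last resort: the OS temporary directory (not for production use).
        candidates.add(new File(System.getProperty("java.io.tmpdir")));

        for (File candidate : candidates) {
            // isDirectory() is false for paths that do not exist.
            if (candidate.isDirectory()) {
                return candidate;
            }
        }
        throw new IllegalStateException("No usable configuration directory found");
    }
}
```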

If any of these configuration files is missing, RPooli uses default values and still starts.
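
Falling back to defaults maps naturally onto java.util.Properties, whose getProperty lookups consult a defaults table when a key is absent. The sketch below assumes RPooli-like behaviour; ConfigLoader and any default values passed to it are hypothetical, since RPooli's actual default values are not listed in this document.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical loader: file values override defaults, a missing file is not an error. */
class ConfigLoader {

    static Properties loadWithDefaults(Path file, Properties defaults) throws IOException {
        // getProperty() on this object falls back to 'defaults' for absent keys.
        Properties props = new Properties(defaults);
        if (Files.isRegularFile(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                props.load(in);
            }
        }
        // If the file is missing, only the defaults are visible.
        return props;
    }
}
```

Note that the fallback applies to getProperty, not to the inherited Map methods such as get.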

All configuration domains of RPooli can be modified via its API. When a configuration domain is modified via the API, the corresponding .properties file is generated and persisted to the file system: at the location it was initially loaded from or, if no such file existed, in the detected configuration directory.
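
Persisting a modified domain can be sketched the same way. The hypothetical ConfigPersister below writes to the original file if there was one and otherwise into the detected configuration directory; it illustrates the rule described above and is not RPooli's implementation.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical helper for writing a modified configuration domain back to disk. */
class ConfigPersister {

    /** Original file if one was loaded, otherwise <configDir>/<fileName>. */
    static Path targetFile(Path loadedFrom, Path configDir, String fileName) {
        return (loadedFrom != null) ? loadedFrom : configDir.resolve(fileName);
    }

    static void persist(Properties props, Path target, String comment) throws IOException {
        try (OutputStream out = Files.newOutputStream(target)) {
            // store() escapes special characters and writes a timestamp comment line.
            props.store(out, comment);
        }
    }
}
```

Properties.store writes only the entries held in the object itself, not those in its defaults table, so values that were never set explicitly are not written out.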

Examples

Here are examples of the three configuration files as well as detailed information on the properties that can be set in these files.

R/Rj nodes configuration

The rconfig.properties file controls how nodes are started in the RPooli pool. An example configuration is given below and can be downloaded here.

    #
    # Rpooli R/Rj Configuration Example
    #
    r_home.path=/usr/lib/R
    java_cmd.args=-server
    node_cmd.args=
    r_startup.snippet=library(RSBXml)\r\nlibrary(RSBJson)
    debug_verbose.enabled=false
    debug_console.enabled=false
    startstop_timeout.millis=15000

The different fields that can be set in the rconfig.properties file are:

  • r_home.path: path to the R home directory; if you want to know what that directory is for a particular R installation you can issue the R.home() command in an interactive R session;
  • node_environment.variables.XYZ: environment variables to set inside the R node, where XYZ is replaced by the variable name (such as R_LIBS or LD_LIBRARY_PATH);
  • java_cmd.args: extra arguments for the Java runtime passed to the Java command that starts up a node;
  • node_cmd.args: extra arguments for the node (rj) passed to the command that starts up a node;
  • r_startup.snippet: R commands that will be executed whenever a node in the pool is started; this can be used, for example, to preload packages so that no time is spent loading them when actual requests are processed;
  • debug_verbose.enabled: boolean to indicate whether verbose debugging output needs to be generated by the node;
  • debug_console.enabled: boolean to indicate whether a debug console (StatET) can be connected to the node;
  • startstop_timeout.millis: period of time in milliseconds after which starting (or stopping) a node times out.
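
Assuming rconfig.properties is read with java.util.Properties, two details above become concrete: node_environment.variables.XYZ entries can be collected by their prefix, and the \r\n in the example r_startup.snippet is decoded into a real line break, so the two library() calls end up on separate lines. RConfigReader is hypothetical, and so is the R_LIBS value used to exercise it.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Map;
import java.util.Properties;
import java.util.TreeMap;

/** Hypothetical reader for the rconfig.properties fields described above. */
class RConfigReader {

    static final String ENV_PREFIX = "node_environment.variables.";

    static Properties parse(String text) throws IOException {
        Properties props = new Properties();
        // load() decodes escapes such as \r and \n into real CR/LF characters.
        props.load(new StringReader(text));
        return props;
    }

    /** Maps each node_environment.variables.XYZ entry to XYZ -> value. */
    static Map<String, String> nodeEnvironment(Properties props) {
        Map<String, String> env = new TreeMap<>();
        for (String key : props.stringPropertyNames()) {
            if (key.startsWith(ENV_PREFIX)) {
                env.put(key.substring(ENV_PREFIX.length()), props.getProperty(key));
            }
        }
        return env;
    }
}
```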

Pool configuration

The behaviour of the pool of R processes can be set in the poolconfig.properties file. An example configuration is given below and can be downloaded here.

    #
    # RPooli Pool Configuration Example
    #
    max_total.count=20
    max_idle.count=10
    min_idle.count=1
    min_idle.millis=600000
    max_usage.count=100
    max_wait.millis=3000
    eviction_timeout.millis=1800000

The poolconfig.properties file allows the following fields to be set:

  • max_total.count: the maximum total number of nodes that can be running in the pool;
  • max_idle.count: the maximum number of nodes that can be idling in the pool;
  • min_idle.count: the minimum number of idle nodes that need to be running in the pool;
  • min_idle.millis: minimum period of time in milliseconds a node must be idle in the pool before it becomes eligible for removal because the pool has more idle nodes than specified by min_idle.count;
  • max_usage.count: maximum number of times a node can be used before it is shut down and restarted;
  • max_wait.millis: maximum period of time in milliseconds RSB will wait for a node before it will timeout;
  • eviction_timeout.millis: period of time in milliseconds the pool will wait for return of allocated nodes when deleting a node (deleteNodesByNodeId with kill= false) or stopping the pool before the allocated nodes will be forcibly removed.t
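
A typed view of poolconfig.properties makes the units explicit by turning the .millis values into java.time.Duration. This is a hypothetical sketch: PoolConfig is not part of RPooli, and the ordering check min_idle.count <= max_idle.count <= max_total.count is our own assumption about sensible values, not documented RPooli validation.

```java
import java.time.Duration;
import java.util.Properties;

/** Hypothetical typed view of poolconfig.properties. */
record PoolConfig(int maxTotal, int maxIdle, int minIdle, Duration minIdleTime,
                  int maxUsage, Duration maxWait, Duration evictionTimeout) {

    static PoolConfig from(Properties p) {
        PoolConfig c = new PoolConfig(
                intProp(p, "max_total.count"),
                intProp(p, "max_idle.count"),
                intProp(p, "min_idle.count"),
                Duration.ofMillis(longProp(p, "min_idle.millis")),
                intProp(p, "max_usage.count"),
                Duration.ofMillis(longProp(p, "max_wait.millis")),
                Duration.ofMillis(longProp(p, "eviction_timeout.millis")));
        // Assumed sanity rule, not documented RPooli behaviour.
        if (c.minIdle() > c.maxIdle() || c.maxIdle() > c.maxTotal()) {
            throw new IllegalArgumentException(
                    "Expected min_idle.count <= max_idle.count <= max_total.count");
        }
        return c;
    }

    private static String required(Properties p, String key) {
        String v = p.getProperty(key);
        if (v == null) {
            throw new IllegalArgumentException("Missing property: " + key);
        }
        return v.trim();
    }

    private static int intProp(Properties p, String key) {
        return Integer.parseInt(required(p, key));
    }

    private static long longProp(Properties p, String key) {
        return Long.parseLong(required(p, key));
    }
}
```

With the example values above, min_idle.millis=600000 becomes a 10-minute Duration and eviction_timeout.millis=1800000 a 30-minute one.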

Network configuration

The network configuration for the RPooli pool can be set in the netconfig.properties file. An example configuration is given below and can be downloaded here.

    #
    # RPooli Network Configuration Example
    #
    host.address=localhost
    rmi_registry.address.port=1099
    rmi_registry.embed.enabled=true
    ssl.enabled=false

The netconfig.properties file allows the following fields to be set:

  • host.address: hostname or IP address to use for publishing the RPooli pool;
  • rmi_registry.address.port: the TCP port of the RMI registry used to publish the RPooli pool;
  • rmi_registry.embed.enabled: whether RPooli will start an embedded RMI registry itself, or will use an already running RMI registry;
  • ssl.enabled: whether to use SSL-secured RMI connections (requires keystores to be set up for the Java runtimes).
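
The two registry modes can be sketched with the standard java.rmi.registry.LocateRegistry API: createRegistry when rmi_registry.embed.enabled is true, getRegistry otherwise, using the javax.rmi.ssl socket factories when ssl.enabled is true. NetConfig is hypothetical, its fallback values simply mirror the example file above (RPooli's real defaults may differ), and the sketch does not show how RPooli actually exports the pool.

```java
import java.rmi.RemoteException;
import java.rmi.registry.LocateRegistry;
import java.rmi.registry.Registry;
import java.util.Properties;
import javax.rmi.ssl.SslRMIClientSocketFactory;
import javax.rmi.ssl.SslRMIServerSocketFactory;

/** Hypothetical typed view of netconfig.properties. */
record NetConfig(String hostAddress, int registryPort, boolean embedRegistry, boolean sslEnabled) {

    static NetConfig from(Properties p) {
        // Fallbacks copied from the example file; assumed, not RPooli's documented defaults.
        return new NetConfig(
                p.getProperty("host.address", "localhost").trim(),
                Integer.parseInt(p.getProperty("rmi_registry.address.port", "1099").trim()),
                Boolean.parseBoolean(p.getProperty("rmi_registry.embed.enabled", "true").trim()),
                Boolean.parseBoolean(p.getProperty("ssl.enabled", "false").trim()));
    }

    /** Starts an embedded registry, or returns a reference to an already running one. */
    Registry obtainRegistry() throws RemoteException {
        if (embedRegistry) {
            // Start a registry inside this JVM on the configured port.
            return sslEnabled
                    ? LocateRegistry.createRegistry(registryPort,
                            new SslRMIClientSocketFactory(), new SslRMIServerSocketFactory())
                    : LocateRegistry.createRegistry(registryPort);
        }
        // getRegistry() does not connect; it only creates a local reference.
        return sslEnabled
                ? LocateRegistry.getRegistry(hostAddress, registryPort, new SslRMIClientSocketFactory())
                : LocateRegistry.getRegistry(hostAddress, registryPort);
    }
}
```

With SSL enabled, the standard javax.net.ssl.keyStore and javax.net.ssl.trustStore system properties are one way to point the Java runtimes at their keystores.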