Scalability
Scaling Up
The workload of a single RSB instance is expected to be minimal: the bulk of the computation is performed by RPooli, while RSB only moves job requests and results between JMS, file system and e-mail destinations.
The processing throughput of a single RSB node can be increased by growing the RPooli node pool and the number of RSB workers. Alternatively, if a single RPooli instance reaches a limit that prevents adding more nodes, RSB can be configured to connect to multiple RPooli instances and spread the workload across them. A particular pool can then be selected either by assigning each independent RSB worker to a specific RPooli node or by using the configurable association between applications and RPooli pools (for example, to dispatch processing-intensive applications to a specific RPooli pool).
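The association between applications and pools can be pictured as a lookup with a default. The following is only a minimal sketch of that idea, not RSB's actual configuration mechanism: the class `PoolRouter`, its methods, and the pool identifiers `pool-a` and `pool-b` are all hypothetical names introduced for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: maps an application name to an RPooli pool identifier. */
final class PoolRouter {
    // Pool used by every application that has no dedicated association.
    private final String defaultPool;
    // Explicit application-to-pool associations.
    private final Map<String, String> poolByApplication = new LinkedHashMap<>();

    PoolRouter(String defaultPool) {
        this.defaultPool = defaultPool;
    }

    /** Associates an application with a dedicated pool; returns this for chaining. */
    PoolRouter assign(String applicationName, String poolId) {
        poolByApplication.put(applicationName, poolId);
        return this;
    }

    /** Returns the dedicated pool of the application, or the default pool if none is set. */
    String poolFor(String applicationName) {
        return poolByApplication.getOrDefault(applicationName, defaultPool);
    }

    public static void main(String[] args) {
        // "pool-a" and "pool-b" are placeholder identifiers, not real RPooli addresses.
        PoolRouter router = new PoolRouter("pool-a")
                .assign("heavy-simulation", "pool-b");
        System.out.println(router.poolFor("heavy-simulation"));
        System.out.println(router.poolFor("quick-report"));
    }
}
```

In this sketch only the processing-intensive application is pinned to its own pool, so lighter applications keep sharing the default pool and are not slowed down by long-running jobs.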
Scaling Out
Currently, RSB is designed to run as a single stand-alone node. Should it become necessary to run several nodes in parallel, the following points must be considered:
- RSB polls email resources: several instances concurrently consuming the same inboxes would cause problems, because some jobs could be retrieved several times (an email resource is not transactional). A possible mitigation is to configure each node differently so that the nodes do not compete for the same inboxes.
- RSB uses the local file system for handling multi-file jobs: if several RSB nodes are connected to a single JMS provider (instead of each using an embedded one), a JMS message may carry a pointer to a file that resides in the file system of another node. A possible mitigation is to carry the full multi-file job payloads in JMS messages instead of file references.
- Results for the REST API are stored in the local file system: for several RSB nodes to serve the same results, either the file system where the results are stored must be shared across machines (for example with an NFS mount), or an alternative implementation of [ResultStore](https://rsb-doc.openanalytics.eu/current/apidocs/index.html?eu/openanalytics/rsb/data/ResultStore.html) that allows sharing over the network must be used (for example a DB or Redis backed implementation).
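One way to derive a different inbox configuration for each node, as suggested in the first point above, is to assign every inbox to exactly one node deterministically. The sketch below is an assumption about how such a split could be computed outside RSB; the class `InboxPartitioner`, its methods and the example addresses are hypothetical and are not part of RSB.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: splits a list of inboxes across nodes so none is polled twice. */
final class InboxPartitioner {
    private InboxPartitioner() {
    }

    /** Returns the index (0-based) of the only node that should poll the given inbox. */
    static int ownerOf(String inbox, int nodeCount) {
        if (nodeCount <= 0) {
            throw new IllegalArgumentException("nodeCount must be positive");
        }
        // String.hashCode() is specified by the Java API, so every node computes the same owner.
        // floorMod keeps the result in [0, nodeCount) even for negative hash codes.
        return Math.floorMod(inbox.hashCode(), nodeCount);
    }

    /** Returns the subset of inboxes that the node with the given index should poll. */
    static List<String> inboxesFor(int nodeIndex, int nodeCount, List<String> allInboxes) {
        List<String> owned = new ArrayList<>();
        for (String inbox : allInboxes) {
            if (ownerOf(inbox, nodeCount) == nodeIndex) {
                owned.add(inbox);
            }
        }
        return owned;
    }

    public static void main(String[] args) {
        // Placeholder addresses used only for illustration.
        List<String> all = List.of("jobs-a@example.org", "jobs-b@example.org", "jobs-c@example.org");
        for (int node = 0; node < 2; node++) {
            System.out.println("node " + node + " polls " + inboxesFor(node, 2, all));
        }
    }
}
```

Because the assignment depends only on the inbox name and the node count, each node can compute its own share without coordinating with the others. The split changes when the node count changes, so every node's configuration would need to be regenerated at that point.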