SGA

SGA is a daemon for executing and monitoring CSBase jobs. This implementation is made of the following components:

sga-daemon

The core SGA daemon. To use it, you will also need a driver.

sga-driver-posix

An SGA daemon driver for running jobs locally on POSIX operating systems. Tested on Linux and Cygwin.

sga-driver-pbs

An SGA daemon driver for running jobs on Torque PBS clusters. This driver is written using sga-exec, so it can either run locally (on the cluster master machine) or manage the cluster remotely over SSH.

sga-driver-slurm

An SGA daemon driver for running jobs on Slurm clusters. This driver is written using sga-exec, so it can either run locally (on the cluster master machine) or manage the cluster remotely over SSH.

To enable SSH tunnelling to a Slurm server, see the sga-exec section below.

sga-exec

An extensible library for abstracting local and remote execution of commands, to be used by SGA drivers.

To enable SSH tunnelling in your sga.exec-powered SGA driver, add the following to your sgad.cfg:

driver_config = {
   exec_driver = "sga.exec.ssh",
   exec_config = {
      host = "username@hostname",
      port = 22,
   }
}

If the SSH tunnel requires key authentication, place your private key (id_rsa) in your local ~/.ssh directory and append the contents of your id_rsa.pub file to the .ssh/authorized_keys file of username on hostname.
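The key installation step can be sketched as a small shell helper. This is only an illustration: the function name authorize_key is hypothetical and not part of SGA, and on most systems ssh-copy-id does the same job over SSH. The helper appends a public key to an authorized_keys file only if it is not already listed, and tightens the permissions, since sshd with its default StrictModes setting ignores key files that other users can write to.

```shell
# Hypothetical helper (not part of SGA): append a public key to an
# authorized_keys file, skipping the append if the key is already present.
# usage: authorize_key PUBKEY_FILE AUTHORIZED_KEYS_FILE
authorize_key() {
   pubkey_file="$1"
   auth_file="$2"
   auth_dir=$(dirname "$auth_file")

   # Create the .ssh directory and the file with restrictive permissions.
   mkdir -p "$auth_dir"
   chmod 700 "$auth_dir"
   touch "$auth_file"
   chmod 600 "$auth_file"

   # -x matches whole lines, -F treats the key as a fixed string,
   # -f reads the pattern (the key) from the public key file.
   if ! grep -qxF -f "$pubkey_file" "$auth_file"; then
      cat "$pubkey_file" >> "$auth_file"
   fi
}
```

Run on the remote host after copying id_rsa.pub there, a call such as authorize_key id_rsa.pub ~/.ssh/authorized_keys would register the key once, no matter how often it is repeated.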

To use sga-exec when writing your own driver, the rule of thumb is to avoid Lua's standard io.* and os.* routines.

ssh-datatransfer

An SGA daemon can use an SSH data transfer mechanism to copy input files and executables to the sandbox on the remote host where they will run. SSH data transfer can be enabled with the POSIX driver. Add the following to your sgad.cfg:

driver = "sga.driver.posix"
extra_config = {
    csbase_transfer_name = "ssh-datatransfer",
    csbase_csfs_root_dir = "/tmp/csfs_sandbox",
    ssh_host = "localhost",
    ssh_port = 22,
    ssh_user_name = "csgrid",
    ssh_private_key_path = "/home/csgrid/.ssh/csgrid_id_rsa"
}

Note: Place the csgrid_id_rsa private key in the /home/csgrid/.ssh directory on the CSGrid server, and append the contents of csgrid_id_rsa.pub to the .ssh/authorized_keys file in the csgrid home directory on the SGA machine.
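Before starting the daemon, it can help to confirm that key-based login works with the same values used in the configuration above. The following is a minimal sketch, not part of SGA: the function name check_ssh_login and the SSH override variable are assumptions made for illustration. It runs the no-op command true on the remote side and relies on the standard OpenSSH option BatchMode=yes, which makes ssh fail instead of prompting for a password.

```shell
# Hypothetical helper (not part of SGA): verify non-interactive SSH login.
# usage: check_ssh_login USER HOST PORT PRIVATE_KEY_PATH
# The SSH variable is an illustrative hook to substitute another command
# (for example "echo" for a dry run); it defaults to the real ssh client.
check_ssh_login() {
   # A non-zero exit status means key authentication is not working.
   ${SSH:-ssh} -o BatchMode=yes -i "$4" -p "$3" "$1@$2" true
}
```

With the sample configuration, this would be called from the machine that holds the private key as check_ssh_login csgrid localhost 22 /home/csgrid/.ssh/csgrid_id_rsa.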

Multiple SGAs with the same installation directory

Multiple SGAs can run from the same installation directory, for example when that directory is shared among all SGA machines via NFS. In this setup, every SGA machine uses the same sgad.sh script and the same sgad.cfg configuration file. To run multiple SGAs in this environment, change the two files as follows:

sgad.sh

#!/bin/bash
export CSBASE_SERVER="http://localhost:40409"
export SGAD_HOST=$HOSTNAME
export SGAD_PORT=40100
export SGAD_NAME=$HOSTNAME
export SGAD_PLATFORM="Linux44_64"
export SERVER_DATA_DIR="/mnt/csgrid_data"

timestamp=$(date +%Y%m%d%H%M%S)
logsdir="logs"

[[ ! -e $logsdir ]] && echo "mkdir $logsdir" && mkdir $logsdir

logfile="logs/${HOSTNAME}_sgad_${timestamp}.log"

configfile=${1}
if [ -z "${configfile}" ]; then
   echo "Using default configuration file" > "${logfile}"
   configfile=sgad.cfg
fi
hostruntimedir="/tmp/${SGAD_NAME}"
sgadruntimedir="${hostruntimedir}/sgad"
runtimesandboxdir="${sgadruntimedir}/sandbox"

[[ ! -e $hostruntimedir ]] && echo "mkdir $hostruntimedir..." && mkdir $hostruntimedir
[[ ! -e $sgadruntimedir ]] && echo "mkdir $sgadruntimedir..." && mkdir $sgadruntimedir
[[ ! -e $runtimesandboxdir ]] && echo "mkdir $runtimesandboxdir..." && mkdir $runtimesandboxdir

eval $(luarocks path --bin)
sgad ${configfile} 2>&1 | tee -a "${logfile}"
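The script above gives each host its own runtime tree under /tmp, keyed by SGAD_NAME, so SGAs sharing one installation directory do not collide. As a sketch of that layout, the three directory checks could be collapsed into a single mkdir -p call, which creates missing parents and succeeds when the directories already exist. The function name make_runtime_dirs and its optional base directory argument are hypothetical; sgad.sh itself always uses /tmp.

```shell
# Hypothetical helper (not part of SGA): create the per-host runtime tree
# <base>/<name>/sgad/sandbox and print the sandbox path.
# usage: make_runtime_dirs SGA_NAME [BASE_DIR]
make_runtime_dirs() {
   base="${2:-/tmp}"   # assumption: overridable only for illustration
   sandbox="$base/$1/sgad/sandbox"
   # -p creates every missing parent and is a no-op if the path exists.
   mkdir -p "$sandbox" && echo "$sandbox"
}
```

Calling make_runtime_dirs "$SGAD_NAME" would produce the same directories that the three separate checks create.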

sgad.cfg

csbase_server = os.getenv("CSBASE_SERVER")
platform = os.getenv("SGAD_PLATFORM") or "Linux44_64"
sgad_host = os.getenv("SGAD_HOST")
sgad_port = tonumber(os.getenv("SGAD_PORT"))
sga_name = os.getenv("SGAD_NAME")
status_interval_s = 10
exec_polling_interval_s = 5
register_retry_s = 3
project_root_dir = os.getenv("SERVER_DATA_DIR") .. "/projects"
algorithm_root_dir = os.getenv("SERVER_DATA_DIR") .. "/algorithms"
runtime_data_dir = "/tmp/" .. os.getenv("SGAD_NAME") .. "/sgad"
sandbox_root_dir = "/tmp/" .. os.getenv("SGAD_NAME") .. "/sgad/sandbox"
driver = "sga.driver.posix"
resources = {
  "docker"
}
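Because this sgad.cfg concatenates the results of os.getenv, an unset variable such as SERVER_DATA_DIR or SGAD_NAME yields nil and Lua raises an error when the configuration is loaded. A pre-flight check in the launcher script can report the problem more clearly. The following is a minimal sketch under that assumption; the function name require_env is hypothetical and not part of SGA.

```shell
# Hypothetical helper (not part of SGA): fail if any named environment
# variable is unset or empty, naming the first one that is missing.
# usage: require_env VAR_NAME...
require_env() {
   for name in "$@"; do
      # Indirect lookup of the variable called $name (POSIX-compatible).
      eval "value=\${$name:-}"
      if [ -z "$value" ]; then
         echo "missing environment variable: $name" >&2
         return 1
      fi
   done
}
```

In sgad.sh, a line such as require_env CSBASE_SERVER SGAD_HOST SGAD_PORT SGAD_NAME SERVER_DATA_DIR || exit 1 placed just before the sgad call would stop the launch early.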

Install

Requirements:

  • Lua 5.2
  • Lua 5.2 dev (on Ubuntu: liblua5.2-dev)
  • gcc 4.8.5
  • g++
  • make
  • curl
  • unzip
  • openssl_dev 1.1
  • perl
  • ksh
  • LuaRocks 2.4.2 (or higher)

Note: For a Microsoft Windows installation, it is recommended to use Cygwin for the dependencies.

Clone the git repository

git clone https://git.tecgraf.puc-rio.br/csbase/sgarest-daemon
cd sgarest-daemon

Note: On Microsoft Windows, first install the following dependencies with these LuaRocks commands, since their default builds may fail on Windows:

luarocks install xml CC=g++ LD=g++
luarocks install luaposix LDFLAGS=-no-undefined

Run the following LuaRocks commands to install the SGA core:

luarocks install lua-schema-scm-1.rockspec
luarocks make sga-daemon-scm-1.rockspec

Additionally, run the following LuaRocks commands to install at least one of the following drivers:

POSIX

luarocks make sga-driver-posix-scm-1.rockspec

PBS (experimental)

luarocks make sga-exec-scm-1.rockspec
luarocks make sga-driver-pbs-scm-1.rockspec

Slurm (experimental)

luarocks make sga-exec-scm-1.rockspec
luarocks make sga-driver-slurm-scm-1.rockspec

To install locally for the current user, add the --local option to each of the commands above.
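The install order above (core first, then sga-exec for the PBS and Slurm drivers, then the driver itself) can be summarised in a small helper that prints the commands for one driver, with any extra LuaRocks options such as --local applied to every command. This is a sketch for illustration only; the function name sga_install_cmds is hypothetical, and it only prints the commands rather than running them.

```shell
# Hypothetical helper (not part of SGA): print, in order, the LuaRocks
# commands that install the SGA core plus one driver.
# usage: sga_install_cmds posix|pbs|slurm [EXTRA_LUAROCKS_OPTIONS...]
sga_install_cmds() {
   driver="$1"
   shift
   opts=""
   if [ "$#" -gt 0 ]; then
      opts="$* "
   fi
   case "$driver" in
      posix|pbs|slurm) ;;
      *) echo "unknown driver: $driver" >&2; return 1 ;;
   esac
   echo "luarocks install ${opts}lua-schema-scm-1.rockspec"
   echo "luarocks make ${opts}sga-daemon-scm-1.rockspec"
   # The PBS and Slurm drivers are written using sga-exec, so it goes first.
   if [ "$driver" != posix ]; then
      echo "luarocks make ${opts}sga-exec-scm-1.rockspec"
   fi
   echo "luarocks make ${opts}sga-driver-${driver}-scm-1.rockspec"
}
```

For example, sga_install_cmds slurm --local lists the four commands for a per-user Slurm setup; after reviewing the output, it could be piped to sh from the repository directory.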

Self-Contained Installation

Install LuaRocks on a particular path with the following options:

./configure --force-config --prefix=$SGA_INSTALL_PATH
make install

Installation without Internet

Unpack the LuaRocks dependencies and add the option --only-server=$REPO_UNPACKED_PATH to all LuaRocks install and make commands.

Installation behind a proxy

Follow the instructions at https://github.com/luarocks/luarocks/wiki/LuaRocks-through-a-proxy

Docker

Build

docker build . --network host -t csbase/sgarest-daemon

To use sga-driver-slurm in a specific runtime root directory, use the following build command:

docker build . -t csbase/sgarest-daemon --build-arg RUNTIME_DIR=/path/to/directory

Run

docker run --rm \
-p 40100:40100 \
-v ~/.ssh:/root/.ssh \
-v /home/sgad/logs/sga:/sgad/logs \
-v /home/sgad/projects:/sgad/projects \
-v /home/sgad/algorithms:/sgad/algorithms \
-e CSBASE_SERVER="http://csgrided:40509" \
-e SGAD_HOST="sgad40100" \
-e SGAD_PORT="40100" \
-e SGAD_NAME="40100" \
-e SLURM_HOST="hostname" \
-e SLURM_USER="username" \
-e SLURM_PWD="password" \
--network host \
--privileged \
csbase/sgarest-daemon

The -e SLURM_XXX=XXX arguments are required only for sga-driver-slurm.

Credits

This next-generation SGA was designed and implemented at LabLua, PUC-Rio by Hisham Muhammad hisham@gobolinux.org and Ana Lúcia de Moura amoura@inf.puc-rio.br.