SGA
SGA is a daemon for executing and monitoring CSBase jobs. This implementation is made of the following components:
sga-daemon
The core SGA daemon. To use it, you will also need a driver.
sga-driver-posix
An SGA daemon driver for running jobs locally on POSIX operating systems. Tested on Linux and Cygwin.
sga-driver-pbs
An SGA daemon driver for running jobs on Torque PBS clusters. This driver is written using sga-exec, so it can either run locally (on the cluster master machine) or manage the cluster remotely over SSH.
sga-driver-slurm
An SGA daemon driver for running jobs on Slurm clusters. This driver is written using sga-exec, so it can either run locally (on the cluster master machine) or manage the cluster remotely over SSH.
To enable SSH tunnelling to a Slurm server, see the sga-exec section below.
sga-exec
An extensible library for abstracting local and remote execution of commands, to be used by SGA drivers.
To enable SSH tunnelling in your sga.exec-powered SGA driver, add the following to your sgad.cfg:
driver_config = {
  exec_driver = "sga.exec.ssh",
  exec_config = {
    host = "username@hostname",
    port = 22,
  }
}
If the SSH connection requires key-based authentication, place your id_rsa private key in your ~/.ssh directory and append the contents of your id_rsa.pub file to the .ssh/authorized_keys file of username@hostname.
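The key installation step can be sketched in shell as follows. This is a minimal sketch and not part of SGA: `add_authorized_key` is a hypothetical helper, and it edits whatever `authorized_keys` path it is given, so it would have to run on the machine named in `host` (or against that user's home directory). It appends the key only when the same line is not already present, so it can be run repeatedly.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGA): append a public key to an
# authorized_keys file unless the exact same line is already present.
#   $1: path to the public key file (for example ~/.ssh/id_rsa.pub)
#   $2: path to the authorized_keys file to update
add_authorized_key() {
  pubkey_file=${1:-}
  auth_file=${2:-}
  [ -r "$pubkey_file" ] || { echo "cannot read $pubkey_file" >&2; return 1; }
  mkdir -p "$(dirname "$auth_file")"
  touch "$auth_file"
  # Keep the file private to its owner: sshd's default StrictModes setting
  # rejects key files that other users can write to.
  chmod 600 "$auth_file"
  # -x: whole-line match, -F: fixed string, --: end of options.
  if ! grep -qxF -- "$(cat "$pubkey_file")" "$auth_file"; then
    cat "$pubkey_file" >> "$auth_file"
  fi
}
```

Where OpenSSH is installed, `ssh-copy-id username@hostname` performs the equivalent append over SSH.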
To use sga-exec when writing your own driver, the rule of thumb is to avoid Lua's standard io.* and os.* routines.
ssh-datatransfer
An SGA daemon can use an SSH data transfer mechanism to copy input files and executables into the sandbox on the remote host where they will run. SSH data transfer can be enabled with the POSIX driver by adding the following to your sgad.cfg:
driver = "sga.driver.posix"
extra_config = {
  csbase_transfer_name = "ssh-datatransfer",
  csbase_csfs_root_dir = "/tmp/csfs_sandbox",
  ssh_host = "localhost",
  ssh_port = 22,
  ssh_user_name = "csgrid",
  ssh_private_key_path = "/home/csgrid/.ssh/csgrid_id_rsa"
}
Note:
Place the csgrid_id_rsa private key in the /home/csgrid/.ssh directory on the CSGrid server, and append the contents of csgrid_id_rsa.pub to the .ssh/authorized_keys file in the csgrid home directory on the SGA machine.
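Before starting the daemon, it can help to confirm that the key named in `ssh_private_key_path` exists and is readable only by its owner. The sketch below is an assumption-laden pre-flight check, not part of SGA: `check_private_key` is a hypothetical name, and the permission rule mirrors what OpenSSH clients enforce (they ignore private keys accessible by other users). Whether the CSBase transfer mechanism applies the same rule is not stated here, so treat the check as a conservative habit.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of SGA): succeed only if the key
# file exists and grants no permissions to group or others.
#   $1: path to the private key, e.g. the ssh_private_key_path value
check_private_key() {
  key_file=${1:-}
  if [ ! -f "$key_file" ]; then
    echo "private key not found: $key_file" >&2
    return 1
  fi
  # Characters 5-10 of the `ls -l` mode string are the group/other bits;
  # -L makes ls report on the target if the path is a symbolic link.
  group_other=$(ls -ldL "$key_file" | cut -c5-10)
  if [ "$group_other" != "------" ]; then
    echo "permissions too open on $key_file (try: chmod 600)" >&2
    return 1
  fi
}
```

For the configuration above, the call would be `check_private_key /home/csgrid/.ssh/csgrid_id_rsa`.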
Multiple SGAs with the same installation directory
Multiple SGAs can run from the same installation directory. In this setup, the installation directory is shared among all SGA machines via NFS, and all of them use the same sgad.sh script and sgad.cfg configuration file. To run multiple SGAs in this environment, make the following changes:
sgad.sh
#!/bin/bash
# Per-host settings: each SGA takes its name and address from $HOSTNAME.
export CSBASE_SERVER="http://localhost:40409"
export SGAD_HOST="$HOSTNAME"
export SGAD_PORT=40100
export SGAD_NAME="$HOSTNAME"
export SGAD_PLATFORM="Linux44_64"
export SERVER_DATA_DIR="/mnt/csgrid_data"

# One log file per host and start time, so SGAs sharing the directory do not clash.
timestamp=$(date +%Y%m%d%H%M%S)
logsdir="logs"
[[ ! -e "$logsdir" ]] && echo "mkdir $logsdir" && mkdir "$logsdir"
logfile="${logsdir}/${HOSTNAME}_sgad_${timestamp}.log"

# The configuration file is the first argument; fall back to sgad.cfg.
configfile="${1}"
if [ -z "${configfile}" ]; then
  echo "Using default file configuration" > "${logfile}"
  configfile=sgad.cfg
fi

# Host-specific runtime directories under /tmp, matching the paths in sgad.cfg.
hostruntimedir="/tmp/${SGAD_NAME}"
sgadruntimedir="${hostruntimedir}/sgad"
runtimesandboxdir="${sgadruntimedir}/sandbox"
[[ ! -e "$hostruntimedir" ]] && echo "mkdir $hostruntimedir..." && mkdir "$hostruntimedir"
[[ ! -e "$sgadruntimedir" ]] && echo "mkdir $sgadruntimedir..." && mkdir "$sgadruntimedir"
[[ ! -e "$runtimesandboxdir" ]] && echo "mkdir $runtimesandboxdir..." && mkdir "$runtimesandboxdir"

# Put LuaRocks-installed binaries and modules on the search paths, then start the daemon.
eval "$(luarocks path --bin)"
sgad "${configfile}" 2>&1 | tee -a "${logfile}"
sgad.cfg
csbase_server = os.getenv("CSBASE_SERVER")
platform = os.getenv("SGAD_PLATFORM") or "Linux44_64"
sgad_host = os.getenv("SGAD_HOST")
sgad_port = tonumber(os.getenv("SGAD_PORT"))
sga_name = os.getenv("SGAD_NAME")
status_interval_s = 10
exec_polling_interval_s = 5
register_retry_s = 3
project_root_dir = os.getenv("SERVER_DATA_DIR") .. "/projects"
algorithm_root_dir = os.getenv("SERVER_DATA_DIR") .. "/algorithms"
runtime_data_dir = "/tmp/" .. os.getenv("SGAD_NAME") .. "/sgad"
sandbox_root_dir = "/tmp/" .. os.getenv("SGAD_NAME") .. "/sgad/sandbox"
driver = "sga.driver.posix"
resources = {
  "docker"
}
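The configuration above builds several values from environment variables. In Lua, `os.getenv` returns nil for an unset variable, and concatenating nil with a string (as the `project_root_dir` and `runtime_data_dir` lines do) raises an error. A small guard run before the daemon starts reports this earlier and more readably. The following is a minimal sketch, not part of SGA; the function name `check_sgad_env` is hypothetical, and the list covers the variables this sgad.cfg reads without a fallback.

```shell
#!/bin/sh
# Hypothetical guard (not part of SGA): report every variable that the
# sgad.cfg example reads without a fallback and that is unset or empty.
# Returns 0 when all are set, 1 otherwise.
check_sgad_env() {
  missing=0
  for name in CSBASE_SERVER SGAD_HOST SGAD_PORT SGAD_NAME SERVER_DATA_DIR; do
    # Indirect lookup of the variable called $name (names are fixed above).
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing environment variable: $name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

In sgad.sh, a line such as `check_sgad_env || exit 1` placed after the `export` statements would stop the script before `sgad` is launched.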
Install
Requirements:
- Lua 5.2
- Lua 5.2 dev (on Ubuntu: liblua5.2-dev)
- gcc 4.8.5
- g++
- make
- curl
- unzip
- openssl_dev 1.1
- perl
- ksh
- LuaRocks 2.4.2 (or higher)
Note: For a Microsoft Windows installation, it is recommended to use Cygwin for the dependencies.
Clone the git repository
git clone https://git.tecgraf.puc-rio.br/csbase/sgarest-daemon
cd sgarest-daemon
Note: On Microsoft Windows, some dependencies may fail to build with the default settings. Install them first with the following LuaRocks commands:
luarocks install xml CC=g++ LD=g++
luarocks install luaposix LDFLAGS=-no-undefined
Run the following LuaRocks commands to install the SGA core:
luarocks install lua-schema-scm-1.rockspec
luarocks make sga-daemon-scm-1.rockspec
Additionally, run the following LuaRocks commands to install at least one of the following drivers:
POSIX
luarocks make sga-driver-posix-scm-1.rockspec
PBS (experimental)
luarocks make sga-exec-scm-1.rockspec
luarocks make sga-driver-pbs-scm-1.rockspec
Slurm (experimental)
luarocks make sga-exec-scm-1.rockspec
luarocks make sga-driver-slurm-scm-1.rockspec
To install locally for the current user, use the option --local in the commands above.
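The command sequence for one driver can be summarised in a small dry-run helper. This is a sketch under stated assumptions, not part of the repository: `sga_install_plan` is a hypothetical function that only prints the LuaRocks commands listed above (core first, then sga-exec where the driver needs it, then the driver), optionally with one extra option such as --local.

```shell
#!/bin/sh
# Hypothetical helper (not part of SGA): print the LuaRocks commands for the
# SGA core plus one driver, without running them.
#   $1: driver name: posix, pbs, or slurm
#   $2: optional extra LuaRocks option, e.g. --local
sga_install_plan() {
  driver=${1:-}
  opt=${2:-}
  case "$driver" in
    posix)     needs_exec=no ;;
    pbs|slurm) needs_exec=yes ;;   # these drivers are written using sga-exec
    *) echo "unknown driver: $driver" >&2; return 1 ;;
  esac
  # ${opt:+ $opt} expands to " <option>" only when an option was given.
  echo "luarocks install${opt:+ $opt} lua-schema-scm-1.rockspec"
  echo "luarocks make${opt:+ $opt} sga-daemon-scm-1.rockspec"
  if [ "$needs_exec" = yes ]; then
    echo "luarocks make${opt:+ $opt} sga-exec-scm-1.rockspec"
  fi
  echo "luarocks make${opt:+ $opt} sga-driver-${driver}-scm-1.rockspec"
}
```

For example, `sga_install_plan slurm --local` prints four commands. After reviewing them, piping the output to `sh` from the repository root would run them.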
Self-Contained Installation
Install LuaRocks on a particular path with the following options:
./configure --force-config --prefix=$SGA_INSTALL_PATH
make install
Installation without Internet
Unpack the LuaRocks dependencies and add the option --only-server=$REPO_UNPACKED_PATH to all LuaRocks install and make commands.
Installation behind a proxy
Follow the instructions at https://github.com/luarocks/luarocks/wiki/LuaRocks-through-a-proxy
Docker
Build
docker build . --network host -t csbase/sgarest-daemon
To use sga-driver-slurm in a specific runtime root directory, use the following build command:
docker build . -t csbase/sgarest-daemon --build-arg RUNTIME_DIR=/path/to/directory
Run
docker run --rm \
-p 40100:40100 \
-v ~/.ssh:/root/.ssh \
-v /home/sgad/logs/sga:/sgad/logs \
-v /home/sgad/projects:/sgad/projects \
-v /home/sgad/algorithms:/sgad/algorithms \
-e CSBASE_SERVER="http://csgrided:40509" \
-e SGAD_HOST="sgad40100" \
-e SGAD_PORT="40100" \
-e SGAD_NAME="40100" \
-e SLURM_HOST="hostname" \
-e SLURM_USER="username" \
-e SLURM_PWD="password" \
--network host \
--privileged \
csbase/sgarest-daemon
The -e SLURM_XXX=XXX arguments are required only for sga-driver-slurm.
Credits
This next-generation SGA was designed and implemented at LabLua, PUC-Rio by Hisham Muhammad hisham@gobolinux.org and Ana Lúcia de Moura amoura@inf.puc-rio.br.