diff --git a/README.md b/README.md index b5a7950a37bfa34ed95d1dd8d7bf75def7dbdd33..f25aa7bfc73632d53a1e3065927357a932619637 100644 --- a/README.md +++ b/README.md @@ -1,7 +1,6 @@ # SGA -SGA is a daemon for executing and monitoring CSBase jobs. This implementation -is made of the following components: +SGA is a daemon for executing and monitoring CSBase jobs. This implementation is made of the following components: ## sga-daemon @@ -9,105 +8,60 @@ The core SGA daemon. To use it, you will also need a driver. ## sga-driver-posix -A SGA daemon driver for running jobs locally on POSIX operating systems. -Tested on Linux and Cygwin. +A SGA daemon driver for running jobs locally on POSIX operating systems. Tested on Linux and Cygwin. ## sga-driver-pbs -A SGA daemon driver for running jobs on Torque PBS clusters. This driver -is written using `sga-exec`, so it can run either locally (at the cluster -master machine) or manage the cluster remotely over SSH. +A SGA daemon driver for running jobs on Torque PBS clusters. This driver is written using `sga-exec`, so it can run either locally (at the cluster master machine) or manage the cluster remotely over SSH. ## sga-driver-slurm -A SGA daemon driver for running jobs on Slurm clusters. This driver is -written using `sga-exec`, so it can run either locally (at the cluster -master machine) or manage the cluster remotely over SSH. +A SGA daemon driver for running jobs on Slurm clusters. This driver is written using `sga-exec`, so it can run either locally (at the cluster master machine) or manage the cluster remotely over SSH. To enable SSH tunnelling to a Slurm server, see [next section](#sga-exec). ## sga-exec -An extensible library for abstracting local and remote execution of commands, -to be used by SGA drivers. +An extensible library for abstracting local and remote execution of commands, to be used by SGA drivers. 
-To enable SSH tunnelling in your `sga.exec`-powered SGA driver, add the -following to your `sgad.cfg`: +To enable SSH tunnelling in your `sga.exec`-powered SGA driver, add the following to your `sgad.cfg`: - driver_config = { - exec_driver = "sga.exec.ssh", - exec_config = { - host = "username@hostname", - port = 22, - } - } +```lua +driver_config = { + exec_driver = "sga.exec.ssh", + exec_config = { + host = "username@hostname", + port = 22, + } +} +``` -> If the SSH tunnelling needs authentication, add your id_rsa in your **~/.ssh** directory and fill your id_rsa.pub file content in **username@hostname:.ssh/authorized_keys** file. +If the SSH tunnel requires authentication, place your `id_rsa` private key in the `~/.ssh` directory and append the contents of your `id_rsa.pub` file to the `username@hostname:.ssh/authorized_keys` file. -To use `sga-exec` when writing your own driver, the rule of thumb is to -avoid Lua's standard `io.*` and `os.*` routines. +To use `sga-exec` when writing your own driver, the rule of thumb is to avoid Lua's standard `io.*` and `os.*` routines. ## ssh-datatransfer -A SGA daemon can use a ssh data transfer mechanism to copy input and executable files to execute on remote -host sandbox. The SSH data transfer configuration can be enable with posix driver. Add the following to your `sgad.cfg`: - - driver = "sga.driver.posix" - extra_config = { - csbase_transfer_name = "ssh-datatransfer", - csbase_csfs_root_dir = "/tmp/csfs_sandbox", - ssh_host = "localhost", - ssh_port = 22, - ssh_user_name = "csgrid", - ssh_private_key_path = "/home/csgrid/.ssh/csgrid_id_rsa" - } - -`Note:` -Add **csgrid_id_rsa** private key in /home/csgrid/.ssh CSGrid server directory and fill **csgrid_id_rsa.pub** -file content in SGA csgrid home directory _.ssh/authorized_keys_ file. - -## Multiple SGAs with same instalation directory - -Multiple SGAs can be run from the same installation directory. 
In this case, we -have a installation directory shared among all SGA machines via NFS. -All SGAs machines share the same script sgad.sh and configuration file sgad.cfg. -To run multiple SGAs in this environment, the following changes can be done: - -sgad.sh -```bash -#!/bin/bash -export CSBASE_SERVER="http://localhost:40409" -export SGAD_HOST=$HOSTNAME -export SGAD_PORT=40100 -export SGAD_NAME=$HOSTNAME -export SGAD_PLATFORM="Linux44_64" -export SERVER_DATA_DIR="/mnt/csgrid_data" - -timestamp=$(date +%Y%m%d%H%M%S) -logsdir="logs" - -[[ ! -e $logsdir ]] && echo "mkdir $logsdir" && mkdir $logsdir - -logfile="logs/${HOSTNAME}_sgad_${timestamp}.log" - -configfile=${1} -if [ -z ${configfile} ]; then - echo "Using default file configuration" > ${logfile}; - configfile=sgad.cfg -fi -hostruntimedir="/tmp/${SGAD_NAME}" -sgadruntimedir="${hostruntimedir}/sgad" -runtimesandboxdir="${sgadruntimedir}/sandbox" - -[[ ! -e $hostruntimedir ]] && echo "mkdir $hostruntimedir..." && mkdir $hostruntimedir -[[ ! -e $sgadruntimedir ]] && echo "mkdir $sgadruntimedir..." && mkdir $sgadruntimedir -[[ ! -e $runtimesandboxdir ]] && echo "mkdir $runtimesandboxdir..." && mkdir $runtimesandboxdir - -eval $(luarocks path --bin) -sgad ${configfile} 2>&1 | tee -a "${logfile}" +An SGA daemon can use an SSH data transfer mechanism to copy input files and executables to the sandbox on the remote host. To enable it, add the following configuration to your `sgad.cfg` file: + +```lua +driver = "sga.driver.posix" +extra_config = { + csbase_transfer_name = "ssh-datatransfer", + csbase_csfs_root_dir = "/tmp/csfs_sandbox", + ssh_host = "localhost", + ssh_port = 22, + ssh_user_name = "csgrid", + ssh_private_key_path = "/home/csgrid/.ssh/csgrid_id_rsa" +} ``` - -sgad.cfg + +The `ssh_private_key_path` value is the path to the private key on the CSGrid server machine, and the `csgrid` user must be able to authenticate on the SGA machine using this key. 
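As a hedged illustration, the key pair referenced above can be created and authorized with standard OpenSSH tooling; the paths and the `csgrid`/`localhost` values mirror the example configuration and are assumptions to adjust for your environment:

```shell
# Generate a dedicated, passphrase-less key pair for the transfer user
# (the path matches the ssh_private_key_path in the example above).
ssh-keygen -t rsa -N "" -f /home/csgrid/.ssh/csgrid_id_rsa

# Authorize the public key on the SGA machine so the CSGrid server can
# authenticate without a password (user and host are placeholders).
ssh-copy-id -i /home/csgrid/.ssh/csgrid_id_rsa.pub csgrid@localhost
```
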
+ +## Multiple SGAs with same installation directory + +The SGA configuration file is a Lua script, so standard Lua functions -- like `os.getenv` to read environment variables -- can be used to adapt a single configuration file to multiple SGA instances, as in the `sgad.cfg` example below. + ```lua csbase_server = os.getenv("CSBASE_SERVER") platform = os.getenv("SGAD_PLATFORM") or "Linux44_64" @@ -123,16 +77,25 @@ runtime_data_dir = "/tmp/" .. os.getenv("SGAD_NAME") .. "/sgad" sandbox_root_dir = "/tmp/" .. os.getenv("SGAD_NAME") .. "/sgad/sandbox" driver = "sga.driver.posix" resources = { - "docker" + "docker" } -``` +``` + +With such a configuration file, we can run different SGA instances, adapting each configuration through environment variables, as in the example below: + +```shell +env CSBASE_SERVER="http://localhost:40409" \ + SGAD_HOST=$HOSTNAME \ + SGAD_PORT=40100 \ + SGAD_NAME=$HOSTNAME \ + SERVER_DATA_DIR="/mnt/csgrid_data" \ + sgad.sh +``` ## Install Requirements: -+ Lua 5.2 -+ Lua 5.2 dev (para Ubuntu liblua5.2-dev) + gcc 4.8.5 + g++ + make @@ -141,129 +104,76 @@ Requirements: + openssl_dev 1.1 + perl + ksh -+ LuaRocks 2.4.2 (or higher) -**Note:** For a Microsoft Windows installation, it is recommended to use [Cygwin](https://www.cygwin.com/) for the dependencies. +For a Microsoft Windows installation, it is recommended to use [Cygwin](https://www.cygwin.com/) for the dependency management. 
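Before starting the installation, a quick sanity check that the core build requirements listed above are available on the machine can save a failed build. A minimal sketch (output formats vary by distribution):

```shell
# Verify that the compilers and build tools listed above are on PATH.
for tool in gcc g++ make perl ksh; do
  command -v "$tool" >/dev/null || echo "missing: $tool"
done

# Print the detected compiler version for comparison with the list above.
gcc --version | head -n 1
```
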
Clone the git repository ```shell -git clone https://git.tecgraf.puc-rio.br/csbase/sgarest-daemon -cd sgarest-daemon +$ git clone https://git.tecgraf.puc-rio.br/csbase/sgarest-daemon +$ cd sgarest-daemon ``` -**Note:** On Microsoft Windows, run the following LuaRocks commands to build dependencies which may fail in Windows: +Run the installation script, providing an installation path: ```shell -luarocks install xml CC=g++ LD=g++ -luarocks install luaposix LDFLAGS=-no-undefined +$ ./install.sh --posix ~/csgrid/sga ``` -Run the following LuaRocks commands to install the SGA core: - -```shell -luarocks install lua-schema-scm-1.rockspec -luarocks make sga-daemon-scm-1.rockspec -``` - -Additionally, run the following LuaRocks commands to install at least one of the follwing the drivers: - -POSIX - -```shell -luarocks make sga-driver-posix-scm-1.rockspec -``` +Additionally, use the following options of the script to install at least one driver: -PBS (experimental) +- `--posix`: Install the POSIX driver +- `--pbs`: Install the PBS driver +- `--slurm`: Install the Slurm driver -```shell -luarocks make sga-exec-scm-1.rockspec -luarocks make sga-driver-pbs-scm-1.rockspec -``` +## List Version -Slurm (experimental) +The SGA version can be found in the `sga-daemon-*.rockspec` file. The same version is also used in each component: `sga-driver-posix`, `sga-driver-pbs`, `sga-driver-slurm` and `sga-exec`. The following command lists the installed SGA version: ```shell -luarocks make sga-exec-scm-1.rockspec -luarocks make sga-driver-slurm-scm-1.rockspec +$ luarocks list sga-daemon ``` -To install locally for the current user, use the option `--local` in the commands above. - -## Self-Contained Installation +## Release a new version -Install LuaRocks on a particular path with the following options: +To release a new SGA version, run the following script, passing the release version as a parameter (e.g. 
1.6.2): ```shell -./configure --force-config --prefix=$SGA_INSTALL_PATH -make install +$ ./release-prepare 1.6.2 ``` +The command will automatically change the version attribute in the *.rockspec files and create a new tag with the release version. After that, it will set the version attribute in the rockspec files back to the development version ("scm"). -## Installation without Internet - -Unpack the [LuaRock dependencies](http://www.tecgraf.puc-rio.br/ftp_pub/csbase/sga-rest/sga-rocks-2020-03-25.tar.gz) and add options `--only-server=$REPO_UNPACKED_PATH` to all LuaRocks `install` and `make` commands. - -## Installation behind a proxy - -Follow instructions on https://github.com/luarocks/luarocks/wiki/LuaRocks-through-a-proxy - -## List Version - -The version of SGA can be found in *sga-daemon-\*.rockspec* file. The same version is also found in each component: sga-driver-posix, sga-driver-pbs, sga-driver-slurm and sga-exec. One can execute the following command to list the SGA version: - -```bash - luarocks list sga-daemon -``` - -## Release New Version -To release a new sga version run the following script passing as parameter the release version (e.g. 1.6.2) and the next snapshot: -```bash - ./release-prepare 1.6.2 1.6.3-SNAPSHOT -``` -The command will automatic change the version attribute in *.rockspec files and it will create a new tag with the release version. After that, it will change rockspec files setting the version attribute to the next SNAPSHOT version. +After the tag is created, the CI/CD pipeline is launched automatically. The last stage of the pipeline is a manual step that builds the SGA Docker image and must be explicitly triggered at the end of the process. ## Docker -### Build +The Docker image only works with the POSIX driver. -```shell -docker build . 
--network host -t csbase/sgarest-daemon -``` - -To use [sga-driver-slurm](#sga-driver-slurm) in a specific runtime root directory, use the following build command: +### Build ```shell -docker build . -t csbase/sgarest-daemon --build-arg RUNTIME_DIR=/path/to/directory +docker build . --network host -t soma/sga ``` ### Run ```shell -docker run --rm \ --p 40100:40100 \ --v ~/.ssh:/root/.ssh \ --v /home/sgad/logs/sga:/sgad/logs \ --v /home/sgad/projects:/sgad/projects \ --v /home/sgad/algorithms:/sgad/algorithms \ --e CSBASE_SERVER="http://csgrided:40509" \ --e SGAD_HOST="sgad40100" \ --e SGAD_PORT="40100" \ --e SGAD_NAME="40100" \ --e SLURM_HOST="hostname" \ --e SLURM_USER="username" \ --e SLURM_PWD="password" \ ---network host \ ---privileged \ -csbase/sgarest-daemon +docker run -d \ + --rm \ + -p 40100:40100 \ + -v "${WORKING_DIR}/data/logs/sga":/sgad/logs \ + -v "${WORKING_DIR}/data/projects":/sgad/projects \ + -v "${WORKING_DIR}/data/algorithms":/sgad/algorithms \ + -e CSBASE_SERVER="http://csgrided:40500" \ + -e SGAD_HOST="${HOSTNAME}" \ + -e SGAD_PORT="40100" \ + -e SGAD_NAME="tempestade" \ + --network=host \ + --privileged \ + soma/sga ``` -> Arguments `-e SLURM_XXX=XXX` are required only for [sga-driver-slurm](#sga-driver-slurm). - - - ## Credits -This next-generation SGA was designed and implemented at LabLua, PUC-Rio by -Hisham Muhammad and Ana Lúcia de Moura -. +This next-generation SGA was designed and implemented at LabLua, PUC-Rio by Hisham Muhammad and Ana Lúcia de Moura . diff --git a/docs/index.rst b/docs/index.rst index 4f8d76df22ac233ce3929e2ffc679316b462344c..fd7804cb3c80ae82f44147b7760fc6bb955ea457 100644 --- a/docs/index.rst +++ b/docs/index.rst @@ -51,9 +51,13 @@ Requisitos + ksh + LuaRocks 2.4.2 (ou mais recente) + git ++ Docker (opcional) .. 
attention:: - **Somente para instalações no Microsoft Windows:** use o próprio Cygwin para gerenciar as dependências, com exceção do LuaRocks que deve ser instalado conforme instruções em https://luarocks.org/. + **Somente para instalações no Microsoft Windows:** use o próprio Cygwin para gerenciar as dependências, com exceção do LuaRocks que deve ser instalado conforme instruções em https://luarocks.org/. + +.. important:: + As instruções para instalação/configuração/execução via Docker estão na seção `Execução via Docker`_. Instalação ---------- @@ -62,18 +66,21 @@ Baixar do repositório o arquivo ``.tgz`` correspondente à versão a ser instal .. code-block:: console - $ wget -c https://git.tecgraf.puc-rio.br/csbase-dev/sgarest-daemon/-/archive/X.Y.X/sgarest-daemon-X.Y.X.tar.gz + $ wget -c https://git.tecgraf.puc-rio.br/csbase-dev/sgarest-daemon/-/archive/X.Y.Z/sgarest-daemon-X.Y.Z.tar.gz + +Substituindo ``X.Y.Z`` pelo número da versão desejada. -Substituindo ``X.Y.X`` pelo número da versão desejada. +.. attention:: + Pode ser necessário definir as variáveis de ambiente ``http_proxy`` e ``https_proxy`` para configurar o uso de proxies para o comando ``wget``. Mais informações em `GNU Wget Manual:8.1 Proxies `_ Extrair o conteúdo do arquivo usando o comando ``tar``: .. code-block:: console - $ tar -xzf sgarest-daemon-X.Y.X.tar.gz + $ tar -xzf sgarest-daemon-X.Y.Z.tar.gz .. attention:: - Para instalar somente para o usuário corrente use a opção ``--local`` nos comandos ``luarocks``. + Para instalar somente para o usuário corrente use a opção ``--local`` nos comandos ``luarocks``. .. attention:: **Somente para instalações no Microsoft Windows:** execute os seguintes comandos LuaRocks antes de iniciar a instalação: @@ -105,7 +112,7 @@ PBS $ luarocks make sga-exec-*.rockspec $ luarocks make sga-driver-pbs-*.rockspec -Slurm (experimental) +Slurm .. code-block:: console @@ -129,8 +136,8 @@ Para instalações realizadas em máquina usando proxy para conexão à Internet .. 
TODO Como desistalar o SGA e o LuaRocks -- e Lua se for o caso - Desinstalação - ^^^^^^^^^^^^^ + Desinstalação + ^^^^^^^^^^^^^ Configuração ^^^^^^^^^^^^ @@ -265,7 +272,7 @@ Exemplo de arquivo de configuração múltipla: platform = "Linux44_64" runtime_data_dir = "/tmp/sgad/runtime-temporal" ]], - [[ + [[ sga_name = "garoa" platform = "Linux64e5" runtime_data_dir = "/tmp/sgad/runtime-garoa" @@ -285,7 +292,7 @@ sga.driver.pbs Driver para gerência de comandos em clusters através do TORQUE PBS. Testado com a versåo `2.5.12 `_. As seguintes configurações opcionais podem ser definidas em ``pbs_config``: init_dir - Diretório de trabalhado para de execução dos comandos. Recomendado usar o valor ``os.getenv("PWD")`` (``qsub -d``) + Diretório de trabalho para a execução dos comandos. Recomendado usar o valor ``os.getenv("PWD")`` (``qsub -d``) queue Fila de execução do job. Um SGA apenas gerencia uma única fila. Para gerenciar múltiplas filas é necessário utilizar múltiplos SGAs (``qsub -q``) allow_proxy_user @@ -306,9 +313,9 @@ sga.driver.slurm Driver para gerência de comandos em clusters através do Slurm. Testado com a versão `20.02 `_. As seguintes configurações opcionais podem ser definidas em ``slurm_config``: init_dir - Diretório de trabalhado para de execução dos comandos. . Recomendado usar o valor ``os.getenv("PWD")`` (``sbatch -D``) + Diretório de trabalho para a execução dos comandos. Recomendado usar o valor ``os.getenv("PWD")`` (``sbatch -D``) queue - Fila de execução do job. b. Um SGA apenas gerencia uma única fila. Para gerenciar múltiplas filas é necessário utilizar múltiplos SGA (``sbatch -p``) + Fila de execução do job. Um SGA apenas gerencia uma única fila. Para gerenciar múltiplas filas é necessário utilizar múltiplos SGA (``sbatch -p``) allow_proxy_user Ativa o uso do ``proxy user``, fazendo que os comandos sejam executados no cluster usando o mesmo usuário que fez a submissão no servidor CSBase. O valor padrão é ``false``. 
(``sbatch --uid``) proxy_user_group @@ -326,11 +333,12 @@ sga.driver.slurm } } - Ao ativar a opção ``allow_proxy_user`` é preciso adicionar o usuário que executa o SGA ao arquivo ``/etc/sudoers``, pois para usar os argumentos ``--uid`` e ``--gid`` do comando ``sbatch`` esse usuário precisa de permissão ``root``. Além disso não deve-se solicitar sua senha ao executar o comando ``sbatch``. Segue exemplo: + Ao ativar a opção ``allow_proxy_user`` é preciso adicionar o usuário que executa o SGA ao arquivo ``/etc/sudoers``, pois para usar os argumentos ``--uid`` e ``--gid`` do comando ``sbatch`` esse usuário precisa de permissão ``root``. Além disso, não se deve solicitar sua senha ao executar os comandos ``sbatch`` e ``scancel``. Segue exemplo: + .. code-block:: sga_user ALL=(ALL) ALL - sga_user ALL=(ALL) NOPASSWD: /bin/sbatch + sga_user ALL=(ALL) NOPASSWD: /bin/sbatch,/bin/scancel .. caution:: Somente faça alterações no arquivo ``/etc/sudoers`` através do utilitário ``visudo``. @@ -375,7 +383,7 @@ ssh_user_name ssh_private_key_path Caminho, **no servidor CSBase**, para a chave privado do usuário ssh_user_pass - Senha do usuário que executa o SGA (é recomendado o uso de chave privada para fazer a autenticação) + Senha do usuário que executa o SGA (é recomendado o uso de chave privada para fazer a autenticação) Segue exemplo de configuração: @@ -398,8 +406,7 @@ Para iniciar o SGA execute o seguinte comando em um shell/terminal: .. code-block:: console - $ ./sgad sgad.cfg - + $ ./sgad.sh sgad.cfg Já para executar um SGAs que utiliza a configuração múltipla @@ -407,13 +414,120 @@ Já para executar um SGAs que utiliza a configuração múltipla $ ./sgad.sh sgad.cfg --sga_name garoa -Para visualizar as propriedades processadas pelo SGA existe a flag ``--debug``: +Para visualizar as propriedades processadas antes de executar o SGA existe a flag ``--debug``: .. 
code-block:: console $ ./sgad.sh sgad.cfg --sga_name garoa --debug +Para somente visualizar as propriedades processadas sem executar o SGA existe a flag ``--info``: + +.. code-block:: console + + $ ./sgad.sh sgad.cfg --sga_name garoa --info + +Execução via systemd +-------------------- + +O diretório ``extras`` contém scripts para cadastro do SGA como serviço no systemd. Mais detalhes no arquivo ``extras/README.md``. + +Execução via Docker +------------------- + +A imagem do SGA está no repositório ``repo.tecgraf.puc-rio.br:18089`` e no caminho ``soma/sga``, assim para pegar a versão ``X.Y.Z`` o seguinte comando deve ser usado: + +.. code-block:: console + + $ docker pull repo.tecgraf.puc-rio.br:18089/soma/sga:X.Y.Z + +.. attention:: + Essa imagem somente deve ser usada com o driver POSIX, pois ao utilizar os drivers PBS ou Slurm o SGA precisa ter acesso aos comandos destes escalonadores, o que não é possível quando executado via Docker. + +Para a execução via Docker deve-se definir e mapear os seguintes volumes: + +/sgad/projects + Área de projetos +/sgad/algorithms + Repositório de algoritmos +/sgad/logs + Diretório de logs + +As seguintes variáveis de ambiente devem ser definidas no container: + +CSBASE_SERVER + URL de acesso a API REST SGA do servidor CSBase +SGAD_HOST + Nome ou endereço IP da máquina hospedeira +SGAD_PORT + Porta de acesso ao SGA +SGAD_NAME + Nome do SGA +SGAD_PLATFORM + A plataforma do SGA. O valor padrão é Linux44_64 + +.. important:: + A porta configurada dentro do container via a variável de ambiente ``SGAD_PORT`` precisa ser igual à porta que será exportada, pois o SGA envia essa porta ao servidor CSBase, logo ela deve estar acessível na máquina hospedeira. 
+ +E por fim, utilizar as seguintes opções do comando ``docker run``: + +\\-\\-network=host + Usar a configuração de rede da máquina hospedeira +\\-\\-privileged + Executa o container em modo privilegiado -- necessário para criar novos processos +\\-\\-user user_id:group_id + Executar informando o UID e GID do usuário que irá executar o processo do servidor +\\-\\-restart=unless-stopped + Define a política de reinício do container Docker. O valor ``unless-stopped`` indica que o container deve ser reiniciado sempre, excluíndo se ele foi parado antes do daemon Docker ter sido parado. Mais detalhes em `Docker run reference:Restart policies `_ + +Exemplo de script para iniciar um container Docker usando a imagem do SGA: + +.. code-block:: bash + + #!/bin/bash + + REPO=repo.tecgraf.puc-rio.br:18089 + IMAGE=soma/sga + VERSION=X.Y.Z + + WORKING_DIR=$(dirname "$PWD") + CONTAINER_NAME=sga-${VERSION} + + function start() { + docker run -d \ + --name ${CONTAINER_NAME} \ + --rm \ + -p 40100:40100 \ + -v "${WORKING_DIR}/data/logs/sga":/sgad/logs \ + -v "${WORKING_DIR}/data/projects":/sgad/projects \ + -v "${WORKING_DIR}/data/algorithms":/sgad/algorithms \ + -e CSBASE_SERVER="http://${HOSTNAME}:40500" \ + -e SGAD_HOST="${HOSTNAME}" \ + -e SGAD_PORT="40100" \ + -e SGAD_NAME="tempestade" \ + --restart=unless-stopped \ + --network=host \ + --privileged \ + --user "$(id -u "${USER}")":"$(id -g "${USER}")" \ + ${REPO}/${IMAGE}:${VERSION} + } + + function stop() { + docker stop ${CONTAINER_NAME} + } + + case "$1" in + start) + start + ;; + stop) + stop + ;; + *) + echo "Usage: sga.sh {start|stop}" + ;; + esac + .. 
- TODO Adicionar sessão com instruções para verificação da instalação - Verificando a instalação - ------------------------ + TODO Adicionar sessão com instruções para verificação da instalação + Verificando a instalação + ------------------------ diff --git a/extras/README.md b/extras/README.md new file mode 100644 index 0000000000000000000000000000000000000000..27864665cef8b7ad4dcee1077073c10a8d1b6a4f --- /dev/null +++ b/extras/README.md @@ -0,0 +1,17 @@ +# Scripts extras + +Este diretório contém alguns scripts para ajudar na configuração do SGA para ser executado como serviço via systemd. + ++ services.sh: script com funções auxiliares (https://github.com/reduardo7/bash-service-manager) ++ sgad-service: script com as ações de start, stop, restart, get status, tail log e tail error ++ sgad.service.systemd.example: exemplo de configuração de serviço usando o systemd + +Para fazer a instalação deve-se alterar o ``sgad.service.systemd.example`` -- indicando o caminho de instalação do script ``sgad-service`` e o usuário e/ou grupo associado ao serviço --, copiá-lo e renomeá-lo para ``/etc/systemd/system/sgad.service``, por exemplo. + +O comando ``systemctl`` deve ser usado para iniciar, obter o estado ou parar o serviço, como nos exemplos abaixo: + +``` +$ systemctl start sgad.service +$ systemctl status sgad.service +$ systemctl stop sgad.service +``` \ No newline at end of file diff --git a/extras/services.sh b/extras/services.sh new file mode 100644 index 0000000000000000000000000000000000000000..530b4ca460f615d5c2e6ac71b6f2d23e196724f5 --- /dev/null +++ b/extras/services.sh @@ -0,0 +1,241 @@ +# Bash Service Manager +# Project: https://github.com/reduardo7/bash-service-manager + +# export PID_FILE_PATH="/tmp/my-service.pid" +# export LOG_FILE_PATH="/tmp/my-service.log" +# export LOG_ERROR_FILE_PATH="/tmp/my-service.error.log" + +@e() { + echo "# $*" +} + +@warn() { + @e "Warning: $*" >&2 +} + +@err() { + @e "Error! 
$*" >&2 + exit 1 +} + +@execService() { + local c="$1" # Command + local w="$2" # Workdir + local action="$3" # Action + local onStart="$4" # On start + local onFinish="$5" # On finish + + ( + [ ! -z "$w" ] && cd "$w" + + if [ ! -z "$onStart" ]; then + ( "$onStart" "$action" ) + exitCode=$? + + if [ $exitCode -gt 0 ] ; then + @warn "Service start failed" + exit $exitCode + fi + fi + + if [ ! -z "$onFinish" ]; then + onServiceFinish() { + local exitCode=$? + "$onFinish" "$action" $exitCode + return $exitCode + } + trap onServiceFinish EXIT + fi + + "$c" "$action" + ) + return $? +} + +@serviceStatus() { + local serviceName="$1" # Service Name + + if [ -f "$PID_FILE_PATH" ] && [ ! -z "$(cat "$PID_FILE_PATH")" ]; then + local p=$(cat "$PID_FILE_PATH") + + if kill -0 $p >/dev/null 2>&1 + then + @e "Service $serviceName is running with PID $p" + return 0 + else + @e "Service $serviceName is not running (process with PID $p does not exist)" + return 1 + fi + else + @e "Service $serviceName is not running" + return 2 + fi +} + +@serviceStart() { + local serviceName="$1" # Service Name + local c="$2" # Command + local w="$3" # Workdir + local action="$4" # Action + local onStart="$5" # On start + local onFinish="$6" # On finish + + if @serviceStatus "$serviceName" >/dev/null 2>&1 + then + @e "Service ${serviceName} already running with PID $(cat "$PID_FILE_PATH")" + return 0 + fi + + @e "Starting ${serviceName} service..." + touch "$LOG_FILE_PATH" >/dev/null 2>&1 || @err "Cannot create $LOG_FILE_PATH file" + touch "$LOG_ERROR_FILE_PATH" >/dev/null 2>&1 || @err "Cannot create $LOG_ERROR_FILE_PATH file" + touch "$PID_FILE_PATH" >/dev/null 2>&1 || @err "Cannot create $PID_FILE_PATH file" + + ( + ( + @execService "$c" "$w" "$action" "$onStart" "$onFinish" + ) >>"$LOG_FILE_PATH" 2>>"$LOG_ERROR_FILE_PATH" & echo $! >"$PID_FILE_PATH" + ) & + sleep 2 + + @serviceStatus "$serviceName" >/dev/null 2>&1 + return $? 
+} + +@serviceStop() { + local serviceName="$1" # Service Name + + if [ -f "$PID_FILE_PATH" ] && [ ! -z "$(cat "$PID_FILE_PATH")" ]; then + touch "$PID_FILE_PATH" >/dev/null 2>&1 || @err "Can not touch $PID_FILE_PATH file" + + @e "Stopping ${serviceName}..." + for p in $(cat "$PID_FILE_PATH"); do + pgid=$(ps -o pgid= $p | grep -o "[0-9]*") + if kill -0 -$pgid >/dev/null 2>&1 + then + kill -15 -$pgid + sleep 2 + if kill -0 -$pgid >/dev/null 2>&1 + then + kill -9 -$pgid + sleep 2 + if kill -0 -$pgid >/dev/null 2>&1 + then + @e "Exec: sudo kill -9 -$pgid" + sudo kill -9 -$pgid + sleep 2 + fi + fi + fi + done + + if @serviceStatus "$serviceName" >/dev/null 2>&1 + then + @err "Error stopping Service ${serviceName}! Service already running with PID $(cat "$PID_FILE_PATH")" + fi + + rm -f "$PID_FILE_PATH" || @err "Can not delete $PID_FILE_PATH file" + return 0 + else + @warn "Service $serviceName is not running" + fi +} + +@serviceRestart() { + local serviceName="$1" # Service Name + local c="$2" # Command + local w="$3" # Workdir + local action="$4" # Action + local onStart="$5" # On start + local onFinish="$6" # On finish + + @serviceStop "$serviceName" + @serviceStart "$serviceName" "$c" "$w" "$action" "$onStart" "$onFinish" +} + +@serviceTail() { + local serviceName="$1" # Service Name + local type="$2" + + case "$type" in + log) + tail -f "$LOG_FILE_PATH" + exit 0 + ;; + error) + tail -f "$LOG_ERROR_FILE_PATH" + exit 0 + ;; + all) + tail -f "$LOG_FILE_PATH" "$LOG_ERROR_FILE_PATH" + exit 0 + ;; + *) + @e "Actions: [log|error]" + exit 1 + ;; + esac +} + +@serviceDebug() { + local serviceName="$1" # Service Name + local c="$2" # Command + local w="$3" # Workdir + local action="$4" # Action + local onStart="$5" # On start + local onFinish="$6" # On finish + + @serviceStop "$serviceName" + @e "Debugging ${serviceName}..." + @execService "$c" "$w" "$action" "$onStart" "$onFinish" + exitCode=$? 
+ @e "Finish debugging ${serviceName}" + return $exitCode +} + +# Service menu + +serviceMenu() { + local action="$1" # Action to execute + local serviceName="$2" # Friendly service name + local c="$3" # Command to run + local w="$4" # Working Directory + local onStart="$5" # On start + local onFinish="$6" # On finish + + case "$action" in + start) + @serviceStart "$serviceName" "$c" "$w" "$action" "$onStart" "$onFinish" + ;; + stop) + @serviceStop "$serviceName" + ;; + restart) + @serviceRestart "$serviceName" "$c" "$w" "$action" "$onStart" "$onFinish" + ;; + status) + @serviceStatus "$serviceName" + ;; + run) + ( [ ! -z "$w" ] && cd "$w" + "$c" "$action" + ) + ;; + debug) + @serviceDebug "$serviceName" "$c" "$w" "$action" "$onStart" "$onFinish" + ;; + tail) + @serviceTail "$serviceName" "all" + ;; + tail-log) + @serviceTail "$serviceName" "log" + ;; + tail-error) + @serviceTail "$serviceName" "error" + ;; + *) + @e "Actions: [start|stop|restart|status|run|debug|tail(-[log|error])]" + exit 1 + ;; + esac +} diff --git a/extras/sgad-service b/extras/sgad-service new file mode 100755 index 0000000000000000000000000000000000000000..0c3a55953a64155d66940322450259be450e048c --- /dev/null +++ b/extras/sgad-service @@ -0,0 +1,21 @@ +#!/usr/bin/env bash + +script_dir=$(dirname $0) + +export PID_FILE_PATH="${script_dir}/sgad-service.pid" +export LOG_FILE_PATH="${script_dir}/sgad-service.log" +export LOG_ERROR_FILE_PATH="${script_dir}/sgad-service.error.log" + +# Import or paste "services.sh" +. 
$script_dir/services.sh
+
+run-sgad() {
+    cd "$script_dir"
+    ./sgad.sh
+}
+
+action="$1"
+serviceName="sgad"
+command="run-sgad"
+
+serviceMenu "$action" "$serviceName" "$command"
\ No newline at end of file
diff --git a/extras/sgad.service.systemd.example b/extras/sgad.service.systemd.example
new file mode 100644
index 0000000000000000000000000000000000000000..891b7c866e8167634819b3000f2495cc98d990b4
--- /dev/null
+++ b/extras/sgad.service.systemd.example
@@ -0,0 +1,13 @@
+[Unit]
+Description = SGA daemon
+
+[Service]
+Type=forking
+User=sgad
+Group=soma
+ExecStart=/sgad/sgad-service.sh start
+ExecStop=/sgad/sgad-service.sh stop
+ExecReload=/sgad/sgad-service.sh restart
+
+[Install]
+WantedBy=multi-user.target
\ No newline at end of file
diff --git a/install.sh b/install.sh
new file mode 100755
index 0000000000000000000000000000000000000000..c7ecbb9762e51c519123a21ac0c644873f2eaf2d
--- /dev/null
+++ b/install.sh
@@ -0,0 +1,133 @@
+#!/bin/bash
+
+usage() {
+    echo "usage: $0 [options] path"
+    echo "Available options are:"
+    echo "  --posix   Install the POSIX driver."
+    echo "  --pbs     Install the PBS driver (experimental)."
+    echo "  --slurm   Install the Slurm driver (experimental)."
+}
+
+WARNING=no
+
+warn() {
+    echo "$0: $1"
+    if [ "${WARNING}" == "no" ]
+    then
+        exit 1
+    fi
+}
+
+download() {
+    tarball="$2.tar.gz"
+    if [ ! -e "$tarball" ]
+    then
+        wget -c "$1/$tarball"
+    fi
+    tar -xzvf "$tarball"
+}
+
+ROCK_REPO="http://www.tecgraf.puc-rio.br/ftp_pub/csbase/sga-rest/luarocks"
+SGA_HOME=""
+SGA_DRIVERS=""
+while [ $# -gt 0 ]
+do
+    case "$1" in
+        --force)
+            WARNING=yes
+            shift
+            ;;
+        --rocks)
+            shift
+            ROCK_REPO="$1"
+            shift
+            ;;
+        --posix|--pbs|--slurm)
+            SGA_DRIVERS="${SGA_DRIVERS} ${1##--}"
+            shift
+            ;;
+        --*)
+            warn "unrecognized option '$1'"
+            usage
+            exit 1
+            ;;
+        *)
+            if [ -n "$SGA_HOME" ]; then
+                warn "invalid argument '$1'"
+                usage
+                exit 1
+            fi
+            SGA_HOME="$1"
+            shift
+            ;;
+    esac
+done
+
+if [ -z "${SGA_HOME}" ]
+then
+    SGA_HOME=$(pwd)
+elif [ -d ${SGA_HOME} ]
+then
+    warn "directory already exists: '${SGA_HOME}'"
+fi
+
+if [ -z "${SGA_DRIVERS}" ]
+then
+    warn "please select a driver to be installed."
+fi
+
+if [[ ! -x ${SGA_HOME}/bin/lua ]]
+then
+    LUA_VERSION="lua-5.2.4"
+    download http://www.lua.org/ftp ${LUA_VERSION}
+    cd ${LUA_VERSION}
+    make INSTALL_TOP=${SGA_HOME} linux install
+    cd ..
+elif ! grep -q "Lua 5.2" < <(${SGA_HOME}/bin/lua -v)
+then
+    warn "wrong version of Lua"
+fi
+
+ROCK_REPO_CFG="rocks_servers = { '${ROCK_REPO}' }"
+if [[ ! -x ${SGA_HOME}/bin/luarocks ]]
+then
+    LUAROCKS_VERSION="luarocks-2.4.2"
+    download https://luarocks.github.io/luarocks/releases ${LUAROCKS_VERSION}
+    cd ${LUAROCKS_VERSION}
+    ./configure --force-config --with-lua=${SGA_HOME} --prefix=${SGA_HOME}
+    make build
+    make install
+    cd ..
+    echo ${ROCK_REPO_CFG} >> ${SGA_HOME}/etc/luarocks/config-5.2.lua
+elif ! grep -q "${ROCK_REPO_CFG}" ${SGA_HOME}/etc/luarocks/config-5.2.lua
+then
+    warn "LuaRocks is not using the rocks repository."
+fi
+
+case "$(uname -s)" in
+    CYGWIN*)
+        ${SGA_HOME}/bin/luarocks install lub
+        ${SGA_HOME}/bin/luarocks install xml CC=g++ LD=g++
+        ${SGA_HOME}/bin/luarocks install luaposix LDFLAGS=-no-undefined
+        ;;
+esac
+
+${SGA_HOME}/bin/luarocks make sga-daemon-scm-1.rockspec
+
+for driver in ${SGA_DRIVERS}
+do
+    case ${driver} in
+        posix)
+            ${SGA_HOME}/bin/luarocks make sga-driver-posix-scm-1.rockspec
+            ;;
+        pbs)
+            ${SGA_HOME}/bin/luarocks make sga-exec-scm-1.rockspec
+            ${SGA_HOME}/bin/luarocks make sga-driver-pbs-scm-1.rockspec
+            ;;
+        slurm)
+            ${SGA_HOME}/bin/luarocks make sga-exec-scm-1.rockspec
+            ${SGA_HOME}/bin/luarocks make sga-driver-slurm-scm-1.rockspec
+            ;;
+    esac
+done
+
diff --git a/sga-daemon-scm-1.rockspec b/sga-daemon-scm-1.rockspec
index aa19176eb539e3bedf1f56881aa232c1eaf526b3..12d5b42c1ff982ba78106cf4811b1b8e2d055b7a 100644
--- a/sga-daemon-scm-1.rockspec
+++ b/sga-daemon-scm-1.rockspec
@@ -46,7 +46,7 @@ build = {
     },
     install = {
         bin = {
-            ["sgad"] = "sgad",
+            ["sgad"] = "sgad.lua",
         }
     }
 }
diff --git a/sga/driver/pbs.lua b/sga/driver/pbs.lua
index f79a6dbb5cde30bb6d762d83b655dbb6969d7029..bb592a755a958649c925752f681a538b93606dca 100644
--- a/sga/driver/pbs.lua
+++ b/sga/driver/pbs.lua
@@ -20,7 +20,7 @@ pbs.type = "cluster"
 -- @param job The job object: job.data is a writable table for driver data.
 -- @param cmd_string The command string
 -- @return True if succeded or nil and an error message
-function pbs.execute_command(self, job, cmd_string)
+function pbs.execute_command(self, job, cmd_string, user_token)
     local script_filename = self.config.runtime_data_dir.."/qsub_"..job.jid..".script"
     local out_filename = self.config.runtime_data_dir.."/qsub_"..job.jid..".out"
     local err_filename = self.config.runtime_data_dir.."/qsub_"..job.jid..".err"
@@ -46,7 +46,12 @@ function pbs.execute_command(self, job, cmd_string)
         self.exec:chmod(sandbox_path, "rwxrwxrwx")
     end
 
-    self.exec:write_file(script_filename, "#!/bin/sh\numask 002\n"..cmd_string.."\n")
+    local token_env = ""
+    if user_token then
+        token_env = "CSBASE_USER_TOKEN=" .. user_token .. " "
+    end
+
+    self.exec:write_file(script_filename, "#!/bin/sh\numask 002\n"..token_env..cmd_string.."\n")
 
     local full_cmd = ("%s -N %s -V -o %s -e %s %s %s"):format(cmds.qsub, job.cmd_id, out_filename, err_filename, optional_params, script_filename)
     self.logger:debug("[COMMAND] "..full_cmd)
diff --git a/sga/driver/slurm.lua b/sga/driver/slurm.lua
index 0a42e92944b285d6842ef2c097c03fbebb372180..0f3cf10590687d6252fbf095ef818c7404046483 100644
--- a/sga/driver/slurm.lua
+++ b/sga/driver/slurm.lua
@@ -19,6 +19,7 @@ local cmds = {
                 sockets=%X cores=%Y threads=%Z' -h",
     scancel = "scancel "
 }
+local sudo_cmd = "sudo "
 
 --- Type of the SGA, returned to the server during registration.
 slurm.type = "cluster"
@@ -28,61 +29,67 @@ slurm.type = "cluster"
 -- @param job The job object: job.data is a writable table for driver data.
 -- @param cmd_string The command string
 -- @return True if succeded or nil and an error message
-function slurm.execute_command(self, job, cmd_string)
-    local script_filename = self.config.runtime_data_dir.."/sbatch_"..job.jid..".script"
-    local out_filename = self.config.runtime_data_dir.."/sbatch_"..job.jid..".out"
-    local err_filename = self.config.runtime_data_dir.."/sbatch_"..job.jid..".err"
-    local sudo_cmd = "sudo "
+function slurm.execute_command(self, job, cmd_string, user_token)
+    local script_filename = self.config.runtime_data_dir .. "/sbatch_" .. job.jid .. ".script"
+    local out_filename = self.config.runtime_data_dir .. "/sbatch_" .. job.jid .. ".out"
+    local err_filename = self.config.runtime_data_dir .. "/sbatch_" .. job.jid .. ".err"
 
     local as_root = false
     for _, sandbox_path in ipairs(job.sandboxes) do
         local ok, err = self.exec:create_dir(sandbox_path)
         if not ok then
-           if not self.exec:is_dir(sandbox_path) then
-               return nil, "Failed creating job's sandbox "..sandbox_path
-           end
+            if not self.exec:is_dir(sandbox_path) then
+                return nil, "Failed creating job's sandbox " .. sandbox_path
+            end
         end
         self.exec:chmod(sandbox_path, "rwxrwxrwx")
     end
 
     local optional_params = ""
     if self.slurm_init_dir then
-        optional_params = optional_params.." -D "..self.slurm_init_dir
+        optional_params = optional_params .. " -D " .. self.slurm_init_dir
    end
    -- O equivalente de queue no slurm é partição!
    if self.slurm_queue then
-        optional_params = optional_params.." -p "..self.slurm_queue
+        optional_params = optional_params .. " -p " .. self.slurm_queue
    end
    if self.allow_proxy_user and (job.parameters.slurm_user or job.parameters.csbase_command_user_id) then
-       local gid_param = " --gid="
+        local gid_param = " --gid="
        if self.proxy_user_group then
-            gid_param = gid_param..self.proxy_user_group
+            gid_param = gid_param .. self.proxy_user_group
        else
-            gid_param = gid_param.."$(id -g)"
+            gid_param = gid_param .. "$(id -g)"
        end
-        optional_params = optional_params.." --uid="..(job.parameters.slurm_user or job.parameters.csbase_command_user_id)..gid_param
+        optional_params = optional_params .. " --uid=" ..
+            (job.parameters.slurm_user or job.parameters.csbase_command_user_id) .. gid_param
        as_root = true
    end
 
+    local token_env = ""
+    if user_token then
+        token_env = "CSBASE_USER_TOKEN=" .. user_token .. " "
+    end
+
    -- Add umask at top of the script to ensure that all files created during the
    -- job's execution could be modified by the unix group members
-    self.exec:write_file(script_filename, "#!/bin/sh\numask 002\n"..cmd_string.."\n")
+    self.exec:write_file(script_filename, "#!/bin/sh\numask 002\n" .. token_env .. cmd_string .. "\n")
 
    local cmd_prefix = as_root and sudo_cmd or ""
-    local full_cmd = ("%s%s --parsable -o %s -e %s %s %s"):format(cmd_prefix, cmds.sbatch, out_filename, err_filename, optional_params, script_filename)
-    self.logger:debug("[COMMAND] "..full_cmd)
+    local full_cmd = ("%s%s --parsable -o %s -e %s %s %s"):format(cmd_prefix, cmds.sbatch, out_filename, err_filename,
+        optional_params, script_filename)
+    self.logger:debug("[COMMAND] " .. full_cmd)
    local slurmjid, stderr = self.exec:run(full_cmd)
-    
+
    if slurmjid ~= "" then
        slurmjid = slurmjid:gsub("\n", "")
-        self.logger:debug("Submitted Slurm job: "..slurmjid)
+        self.logger:debug("Submitted Slurm job: " .. slurmjid)
        job.data.script_filename = script_filename
        job.data.out_filename = out_filename
        job.data.err_filename = err_filename
        job.data.slurmjid = slurmjid
        return true
    else
-        local err = "Failed submitting job: "..stderr
+        local err = "Failed submitting job: " .. stderr
        return nil, err
    end
 end
@@ -94,29 +101,31 @@ function slurm.cleanup_job(self, job)
 
    for _, sandbox_path in ipairs(job.sandboxes) do
        if self.exec:is_dir(sandbox_path) then
-           local ok, err = self.exec:remove_dir(sandbox_path)
-           if not ok then
-               self.logger:error("Failed removing job's sandbox "..sandbox_path)
-           end
+            local ok, err = self.exec:remove_dir(sandbox_path)
+            if not ok then
+                self.logger:error("Failed removing job's sandbox " .. sandbox_path)
+            end
        end
    end
 end
 
 local function parse(input)
-   local out = {}
-   input = input:gsub("\n"," ")
-   for attr, val in input:gmatch("(%S+)=(%S+)") do
-       out[attr]=out[attr] or val
-   end
-   return out
+    local out = {}
+    input = input:gsub("\n", " ")
+    for attr, val in input:gmatch("(%S+)=(%S+)") do
+        out[attr] = out[attr] or val
+    end
+    return out
 end
 
 local function squeue(self, job)
-    local jdata, stderr = self.exec:run(cmds.squeue..job.data.slurmjid)
+    local full_cmd = cmds.squeue .. job.data.slurmjid
+    self.logger:debug("[COMMAND] " .. full_cmd)
+    local jdata, stderr = self.exec:run(full_cmd)
    if not jdata or jdata == "" then
-        return nil, "Failed running job status command"..(stderr and " - "..stderr)
+        return nil, "Failed running job status command" .. (stderr and " - " .. stderr or "")
    end
-    local data = parse(jdata) --// Função parse pega o output do squeue e faz o parse do state?
+    local data = parse(jdata) -- parse() extracts attribute=value pairs (including the job state) from the squeue output
    if not (data) then
        return nil, "Failed parsing job status data"
    end
@@ -130,12 +139,15 @@
     RUNNING = "RUNNING",
     PENDING = "WAITING",
     CONFIGURING = "WAITING",
-    FAILED = "FINISHED"
+    FAILED = "FINISHED",
+    CANCELLED = "FINISHED"
 }
 
 local function get_seconds(time)
     local h, m, s = time:match("(%d+):(%d+):(%d+)")
-    if not h then return 0 end
+    if not h then
+        return 0
+    end
     return s + m * 60 + h * 3600
 end
@@ -150,18 +162,29 @@ function slurm.is_command_done(self, job)
         for k, v in util.collect_exec_data(job) do
         end
         return true, donetime, donetime, donetime -- FIXME detailed times
+
     else
         return false
     end
 end
-local mem_fact = {b = 1, k = 1024, m = 1024^2, g = 1024^3, t = 1024^4}
+local mem_fact = {
+    b = 1,
+    k = 1024,
+    m = 1024 ^ 2,
+    g = 1024 ^ 3,
+    t = 1024 ^ 4
+}
 
 local function get_mem(minfo)
-   local m, um = minfo:match("(%d+)(%a+)")
-   m = m and tonumber(m) or 0
-   um = um and um:sub(1, 1):lower() or "b"
-   local f = mem_fact[um]
-   if f then return m * f else return m end
+    local m, um = minfo:match("(%d+)(%a+)")
+    m = m and tonumber(m) or 0
+    um = um and um:sub(1, 1):lower() or "b"
+    local f = mem_fact[um]
+    if f then
+        return m * f
+    else
+        return m
+    end
 end
 
 slurm.actions = {
@@ -177,14 +200,20 @@
         end
         local state = slurm_to_sga_state[data.JobState]
         if state ~= "FINISHED" then
-            self.exec:run(cmds.scancel..job.data.slurmjid)
-            job.data.killed = true
-            return true
+            local full_cmd = cmds.scancel
+            if self.allow_proxy_user then
+                full_cmd = sudo_cmd .. full_cmd
+            end
+            full_cmd = full_cmd .. job.data.slurmjid
+            self.logger:debug("[COMMAND] " .. full_cmd)
+            self.exec:run(full_cmd)
+            job.data.killed = true
+            return true
         else
-           return false, "Job is not running"
+            return false, "Job is not running"
         end
-    end,
-    
+    end,
+
     -- Gets a command current status
     -- @param job The job object
     -- @return A table with information for each command component (process)
@@ -202,9 +231,9 @@
         local walltime, mem, vmem, cput
         if state == "FINISHED" then
             walltime = get_seconds(data.EndTime)
-            --mem = get_mem(data.Job.resources_used.mem)
-            --vmem = get_mem(data.Job.resources_used.vmem)
-            --cput = get_seconds(data.Job.resources_used.cput)
+            -- mem = get_mem(data.Job.resources_used.mem)
+            -- vmem = get_mem(data.Job.resources_used.vmem)
+            -- cput = get_seconds(data.Job.resources_used.cput)
         else
             walltime = get_seconds(data.RunTime)
             mem, vmem, cput = 0, 0, 0
@@ -227,22 +256,24 @@
                 bytes_in_kb = 0,
                 bytes_out_kb = 0,
                 disk_bytes_read_kb = 0,
-                disk_bytes_write_kb = 0,
+                disk_bytes_write_kb = 0
             }
 
             for k, v in util.collect_exec_data(job) do
                 processes[1][k] = v
                 -- copas.step(1)
             end
 
-            return true, { processes = processes }
-        end,
+            return true, {
+                processes = processes
+            }
+        end
 }
 
 local function get_node_attributes(ndata)
     local attribs = {}
     for attr, val in ndata:gmatch("(%S+)=(%S+)") do
-       attribs[attr] = attribs[attr] or val
+        attribs[attr] = attribs[attr] or val
     end
 
     return attribs
@@ -250,7 +281,7 @@ end
 
 local function update_node(self, nname, ndata)
     local data = self.nodes_data[nname] or {}
-    
+
     -- static information
     data.clock_mhz = 0 -- FIXME do not return fake information!
     local attribs = get_node_attributes(ndata)
@@ -264,47 +295,49 @@ local function update_node(self, nname, ndata)
     local totused = availmem - freemem
     local uram
     if totused <= availmem then
-       uram = totused
+        uram = totused
     else
-       uram = availmem
+        uram = availmem
     end
     data.ram_used_perc = uram * 100 / availmem
-    
+
     local cpuload = attribs.cpuload and tonumber(attribs.cpuload) or 0
     data.load_avg_1min_perc = cpuload / np
-    data.load_avg_5min_perc = data.load_avg_1min_perc -- FIXME fake!
+    data.load_avg_5min_perc = data.load_avg_1min_perc -- FIXME fake!
+    data.load_avg_15min_perc = data.load_avg_1min_perc -- FIXME fake!
 
     -- FIXME: number of jobs and jobs attribute
 
     self.nodes_data[nname] = data
 end
-
+
 function slurm.get_nodes(self)
     local ninfo, err = self.exec:run(cmds.sinfo)
     if not ninfo or ninfo == "" then
-       return nil, ("Failed reading data after %s (%s)"):format(cmds.sinfo, err)
+        return nil, ("Failed reading data after %s (%s)"):format(cmds.sinfo, err)
     end
-    ninfo = "\n"..ninfo -- for uniform matching of node names
-    ninfo = ninfo:gsub("\n(%S+) ", "\n<%1> ") --
+    ninfo = "\n" .. ninfo -- for uniform matching of node names
+
+    ninfo = ninfo:gsub("\n(%S+) ", "\n<%1> ") --
     for nname, data in ninfo:gmatch("<(%S+)>([^<]+)") do
-       nname = nname:gsub("%..*", "") -- remove domain, if exists
-       update_node(self, nname, data)
+        nname = nname:gsub("%..*", "") -- remove domain, if exists
+        update_node(self, nname, data)
     end
 
     -- for simplicity, also returns current status data
     return self.nodes_data
-    end
+end
 
 ---
 -- Get nodes current status.
 -- @return A table indexed by node name, each entry contains the node´s available resources ("monitoring" data)
 function slurm.get_nodes_status(self)
     for nname, _ in pairs(self.nodes_data) do
-       local cmd = cmds.sinfo.."-n "..nname
-       local ninfo = self.exec:run(cmd)
-       if not ninfo then
-           self.logger:error("Failed reading data after "..cmd)
-       else
-           update_node(self, nname, ninfo)
-       end
+        local cmd = cmds.sinfo .. "-n " .. nname
+        local ninfo = self.exec:run(cmd)
+        if not ninfo then
+            self.logger:error("Failed reading data after " .. cmd)
+        else
+            update_node(self, nname, ninfo)
+        end
     end
 
     -- for simplicity, also returns nodes resources
     return self.nodes_data
@@ -333,33 +366,33 @@ end
 function slurm.new(config, logger)
     local exec_driver, err = exec.init(config)
     if err then
-       return nil, err
+        return nil, err
     end
-    
+
     -- Read optional slurm configuration
     local slurm_config = (config.driver_config and config.driver_config.slurm_config) or {}
     if slurm_config then
-       local slurm_config_schema = schema.Record {
-           init_dir = schema.Optional(schema.String),
-           queue = schema.Optional(schema.String),
-           allow_proxy_user = schema.Optional(schema.Boolean),
-           proxy_user_group = schema.Optional(schema.String),
-       }
-       local err = schema.CheckSchema(slurm_config, slurm_config_schema)
-       if err then
-           return nil, "configuration error in slurm_config: "..tostring(err)
-       end
+        local slurm_config_schema = schema.Record {
+            init_dir = schema.Optional(schema.String),
+            queue = schema.Optional(schema.String),
+            allow_proxy_user = schema.Optional(schema.Boolean),
+            proxy_user_group = schema.Optional(schema.String)
+        }
+        local err = schema.CheckSchema(slurm_config, slurm_config_schema)
+        if err then
+            return nil, "configuration error in slurm_config: " .. tostring(err)
+        end
     end
-    
+
     local self = safer.table {
-       config = config,
-       logger = logger,
-       nodes_data = {},
-       exec = exec_driver,
-       slurm_init_dir = slurm_config.init_dir or nil,
-       slurm_queue = slurm_config.queue or nil,
-       allow_proxy_user = slurm_config.allow_proxy_user or false,
-       proxy_user_group = slurm_config.proxy_user_group or nil,
+        config = config,
+        logger = logger,
+        nodes_data = {},
+        exec = exec_driver,
+        slurm_init_dir = slurm_config.init_dir or nil,
+        slurm_queue = slurm_config.queue or nil,
+        allow_proxy_user = slurm_config.allow_proxy_user or false,
+        proxy_user_group = slurm_config.proxy_user_group or nil
     }
     return self
 end
diff --git a/sgad b/sgad
deleted file mode 100755
index cdb2aa2d83027b48682678c059d0d16a61fed63d..0000000000000000000000000000000000000000
--- a/sgad
+++ /dev/null
@@ -1,68 +0,0 @@
-#!/usr/bin/env lua
-
-local args = { ... }
-local safer = require("safer")
-local util = require("sga.util")
-
-module = nil -- needed for luasocket setting a global variable if Lua 5.2
-
-safer.globals({}, { module = true, loadstring = true, setfenv = true, getfenv = true })
-
-
-local configuration = require("sga.configuration")
-local application = require("sga.application")
-
-local function script_path()
-    local str = debug.getinfo(2, "S").source:sub(2)
-    return str:match("(.*)/") or "."
-end
-
-local config, message = configuration.read(args[1] or os.getenv("SGAD_CONFIG_FILE") or script_path().."/sgad.cfg", args[4])
-if message ~= '' and message ~= nil then
-    io.stderr:write(message.."\n")
-end
-if not config then
-    os.exit(1)
-end
-
-local debug = type(args[2]) == "boolean" and args[2] or ({ [0]=false, [1]=true })[tonumber(args[2])]
-local info = type(args[3]) == "boolean" and args[3] or ({ [0]=false, [1]=true })[tonumber(args[3])]
-
--- Show config debug/info
-if debug or info then
-    local is_readonly = not next(config) and getmetatable(config)
-    local t = is_readonly and getmetatable(config).__index or config
-    if debug then
-        print("============= DEBUG CONFIG =============")
-        print(util.debug_table(t))
-        print("========================================")
-    else --info
-        print(util.debug_table(t))
-        os.exit(0)
-    end
-end
-
-if os.getenv("HOME") then
-    io.stderr:write("\27]0;sgad\7") -- set terminal title
-    io.stderr:write("\27[1;32m".."sgad initializing...".."\27[0m\n")
-end
-
-if config.sga_name then
-    io.stderr:write("\27[1;32m".."sgad starting '"..config.sga_name.."'...".."\27[0m\n")
-end
-configuration.check(config)
-
-local app, err = application.new(config)
-if not app then
-    io.stderr:write(err.."\n")
-    os.exit(1)
-end
-
-local ok, err = app:run()
-if ok then
-    io.stderr:write("\27[1;32m".."sgad terminated gracefully.".."\27[0m\n")
-    os.exit(0)
-else
-    io.stderr:write(tostring(err).."\n")
-    os.exit(1)
-end
diff --git a/sgad.lua b/sgad.lua
new file mode 100755
index 0000000000000000000000000000000000000000..dca476dac5b1646be32f92d9ae22395e0ca675a7
--- /dev/null
+++ b/sgad.lua
@@ -0,0 +1,126 @@
+#!/usr/bin/env lua
+
+local safer = require("safer")
+local util = require("sga.util")
+
+module = nil -- needed for luasocket setting a global variable if Lua 5.2
+
+safer.globals({}, { arg = true, module = true, loadstring = true, setfenv = true, getfenv = true })
+
+local function fail(...)
+    io.stderr:write(arg[0], ": ", ...)
+    io.stderr:write("\n", [[
+usage: ]], arg[0], [[ [configfile] [options]
+Available options are:
+  --sga_name NAME  SGA name to get config from a multiple config file.
+  --debug          Enable debug information: show interpreted configuration.
+  --info           Show debug information and exit.
+]])
+    os.exit(1, true)
+end
+local config
+local options = {
+    sga_name = "",
+    debug = false,
+    info = false,
+}
+local toboolean = {
+    [""] = true,
+    ["true"] = true,
+    ["false"] = false,
+    ["1"] = true,
+    ["0"] = false,
+}
+local argc = select("#", ...)
+local i = 1
+while i <= argc do
+    local value = select(i, ...)
+    local option, optval = value:match("^%-%-([%w_]+)=?(.*)$")
+    if option == nil then
+        if config ~= nil then
+            fail("illegal argument '", value, "'")
+        end
+        config = value
+    else
+        local ltype = type(options[option])
+        if ltype == "nil" then
+            fail("unrecognized option '", value, "'")
+        elseif ltype == "boolean" then
+            optval = toboolean[string.lower(optval)]
+            if optval == nil then
+                fail("invalid value for option '--", option, "'")
+            end
+            options[option] = optval
+        elseif ltype == "string" then
+            if #optval == 0 then
+                if i == argc then
+                    fail("missing value of option '--", option, "'")
+                end
+                i = i+1
+                optval = select(i, ...)
+            end
+            options[option] = optval
+        end
+    end
+    i = i+1
+end
+
+if options.sga_name == "" then
+    options.sga_name = nil
+end
+
+local configuration = require("sga.configuration")
+local application = require("sga.application")
+
+local function script_path()
+    local str = debug.getinfo(2, "S").source:sub(2)
+    return str:match("(.*)/") or "."
+end
+
+local config, message = configuration.read(config or os.getenv("SGAD_CONFIG_FILE") or script_path().."/sgad.cfg", options.sga_name)
+if message ~= '' and message ~= nil then
+    io.stderr:write(message.."\n")
+end
+if not config then
+    os.exit(1)
+end
+
+-- Show config debug/info
+if options.debug or options.info then
+    local is_readonly = not next(config) and getmetatable(config)
+    local t = is_readonly and getmetatable(config).__index or config
+    if options.debug then
+        print("============= DEBUG CONFIG =============")
+        print(util.debug_table(t))
+        print("========================================")
+    else --info
+        print(util.debug_table(t))
+        os.exit(0)
+    end
+end
+
+if os.getenv("HOME") then
+    io.stderr:write("\27]0;sgad\7") -- set terminal title
+    io.stderr:write("\27[1;32m".."sgad initializing...".."\27[0m\n")
+end
+
+if config.sga_name then
+    io.stderr:write("\27[1;32m".."sgad starting '"..config.sga_name.."'...".."\27[0m\n")
+end
+configuration.check(config)
+
+local app, err = application.new(config)
+if not app then
+    io.stderr:write(err.."\n")
+    os.exit(1)
+end
+
+local ok, err = app:run()
+if ok then
+    io.stderr:write("\27[1;32m".."sgad terminated gracefully.".."\27[0m\n")
+    os.exit(0)
+else
+    io.stderr:write(tostring(err).."\n")
+    os.exit(1)
+end
+
diff --git a/sgad.sh b/sgad.sh
index 889b6fcfd6433de6d67e3485e6331f96f54a22cf..bcf5d52f1ff24bcb2aab8d9d7bee453e50035410 100755
--- a/sgad.sh
+++ b/sgad.sh
@@ -5,61 +5,8 @@
 logsdir="logs"
 
 [ ! -e $logsdir ] && echo "mkdir $logsdir" && mkdir $logsdir
-logfile="logs/sgad_${timestamp}.log"
-
-usage() {
-    echo "usage: $0 [configfile] [options]"
-    echo "Available options are:"
-    echo "  --sga_name NAME  SGA name to get config from a multiple config file."
-    echo "  --debug          Enable debug information: show interpreted configuration."
-    echo "  --info           Show debug information and exit."
-}
-
-POSITIONAL="" # String para salvar argumentos passados ao comando
-debug=0
-info=0
-while [ $# -gt 0 ] # Percorre todos argumentos
-do
-    case "$1" in
-        --sga_name)
-            sga_name="$2"
-            shift
-            shift
-            ;;
-        --sga_name=*)
-            sga_name="${1#*=}"
-            shift
-            ;;
-        --debug)
-            debug=1
-            shift
-            ;;
-        --info)
-            info=1
-            shift
-            ;;
-        --*)
-            echo "$0: unrecognized option '$1'"
-            usage
-            exit 1
-            ;;
-        *)
-            if [ -n "$POSITIONAL" ]; then
-                echo "$0: invalid argument '$1'"
-                usage
-                exit 1
-            fi
-            POSITIONAL="$1"
-            shift
-            ;;
-    esac
-done
-set -- "${POSITIONAL}" # Reseta a posição do $N que foi shiftada
-configfile=$1 # Como shiftamos todos os argumentos para a direita, nosso $1 sempre vai ser o argumento config_file, se ele existir
-if [ -z ${configfile} ]; then
-    echo "Using default file configuration" > ${logfile};
-    configfile=sgad.cfg
-fi
+logfile="${logsdir}/sgad_${timestamp}.log"
 
 eval $(luarocks path --bin)
-sgad ${configfile} ${debug} ${info} ${sga_name} 2>&1 | tee -a "${logfile}"
+sgad "$@" 2>&1 | tee -a "${logfile}"
+