How To & Examples

Application    Košice    Žilina
Horovod        0.21.0
Singularity    3.7.1

Horovod

If you need to train a machine learning model across multiple GPU nodes, one of the options is Horovod.

Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make distributed deep learning fast and easy to use.

Usage

First you need to modify your Python code according to the instructions at https://github.com/horovod/horovod#usage. Horovod is installed on the Košice cluster with OpenMPI support, so you can run the code in the same way as other MPI applications. Horovod automatically tries to run one multithreaded process per allocated compute node.
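Before submitting a job, it can help to confirm which frameworks and controllers the installed Horovod was built with. The sketch below assumes the horovodrun launcher that ships with Horovod is on your PATH on the cluster, which may not hold in every environment; it skips the check otherwise. The HOROVOD_CHECK variable is only an illustrative status flag.

```shell
#!/bin/bash
# Minimal sketch: print the Horovod build summary (frameworks, MPI, etc.).
# Assumption: horovodrun is on PATH; the check is skipped if it is not.

check_horovod_build() {
    horovodrun --check-build
}

if command -v horovodrun >/dev/null 2>&1; then
    check_horovod_build
    HOROVOD_CHECK="ran"
else
    echo "horovodrun not found on PATH; skipping build check" >&2
    HOROVOD_CHECK="skipped"
fi
```

If MPI is not listed among the available controllers in the output, running the code with mpirun is unlikely to work as described here.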

A TensorFlow example is located in /lustre/home/freeware/EXAMPLE_JOBS/horovod. The Slurm run script is run.sh. It requests two GPU nodes with k20m graphics cards and runs one process per node:

#!/bin/bash
# run.sh: Slurm job script for the Horovod example
#SBATCH --partition=short
#SBATCH --job-name=test
#SBATCH --output=out.txt
#SBATCH --error=err.txt
# Two nodes, one MPI process per node
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --time=30:00
# Request nodes with k20m GPUs
#SBATCH --constraint=k20m
mpirun python3 test_keras.py

Horovod can also be used with mpi4py or with Singularity containers. The complete documentation is at https://github.com/horovod/horovod.

Singularity

Singularity is an open-source container platform created to run complex applications on HPC clusters. A container is image based and stored as a single file. Singularity gives users full control of their environment: you can create and run containers that package software in a portable and reproducible way. You can build a container with Singularity on your laptop and then run it on an HPC cluster. Singularity also lets you use the resources of whatever host you are on, including HPC interconnects, resource managers, file systems, GPUs and other accelerators.

A non-privileged user can “swap out” the operating system on the host for one they control. So if the host system is running RHEL6 but your application runs in Ubuntu/RHEL7, you can create an Ubuntu/RHEL7 image, install your applications into that image, copy the image to another host, and run your application on that host in its native Ubuntu/RHEL7 environment.

Singularity containers can be in three different formats:

  • read-only squashfs (default) - best for production

  • writable ext3 (--writable option)

  • writable (ch)root directory (--sandbox option) - best for development

Squashfs and (ch)root directory images can be built from a Docker source directly on the cluster; no root privileges are needed. It is strongly recommended to create a native Singularity image to speed up the launch of the container. Singularity is available on the Košice cluster.
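As a minimal sketch of building from a Docker source, the script below creates a read-only image and a sandbox directory. The source docker://ubuntu:20.04 and the output names ubuntu.sif and ubuntu_sandbox are illustrative assumptions, not files provided on the cluster. The builds run only when the singularity command is found.

```shell
#!/bin/bash
# Hypothetical source and output names, chosen for illustration only.
SOURCE="docker://ubuntu:20.04"
IMAGE="ubuntu.sif"
SANDBOX="ubuntu_sandbox"

build_images() {
    # Read-only squashfs image (the default format), then a writable
    # (ch)root directory that is convenient during development.
    singularity build "$IMAGE" "$SOURCE" &&
        singularity build --sandbox "$SANDBOX" "$SOURCE"
}

if command -v singularity >/dev/null 2>&1; then
    if build_images; then
        BUILD_STATUS="built"
    else
        BUILD_STATUS="failed"
    fi
else
    echo "singularity not found; skipping build" >&2
    BUILD_STATUS="skipped"
fi
```

The read-only image is the one to keep for production runs; the sandbox directory is easier to modify while you are still installing software.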

GPU example

TensorFlow is commonly used for machine learning projects. The official tensorflow repository on Docker Hub contains containers with NVIDIA GPU support, which use CUDA for processing.

singularity pull docker://tensorflow/tensorflow:latest-gpu

An example is located in /lustre/home/freeware/EXAMPLE_JOBS/gpu.

Commands that run or otherwise execute containers (shell, exec) accept the --nv option, which sets up the container’s environment to use an NVIDIA GPU and the basic CUDA libraries needed to run a CUDA-enabled application. CUDA applications can be run on the compute nodes that have GPUs: comp[47-56]. The run script test.sh is a simple TensorFlow example; submit it with the command sbatch test.sh.
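The --nv option can be sketched in a small job script. This is not the cluster's test.sh, only an illustration under stated assumptions: the image name tensorflow_latest-gpu.sif is the file that singularity pull is expected to produce for the latest-gpu tag, and the partition and constraint values are borrowed from the Horovod run.sh and may need adjusting for the GPU nodes you target.

```shell
#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name=tf-gpu-check
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --time=10:00
#SBATCH --constraint=k20m

# Assumed image name (result of pulling docker://tensorflow/tensorflow:latest-gpu).
IMAGE="tensorflow_latest-gpu.sif"
# One-line Python check that lists the GPUs TensorFlow can see.
PY_CHECK='import tensorflow as tf; print(tf.config.list_physical_devices("GPU"))'

run_gpu_check() {
    # --nv exposes the host NVIDIA driver and CUDA libraries to the container.
    singularity exec --nv "$IMAGE" python3 -c "$PY_CHECK"
}

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    run_gpu_check
    GPU_CHECK="ran"
else
    echo "singularity or $IMAGE not available; skipping GPU check" >&2
    GPU_CHECK="skipped"
fi
```

An empty list in the output would suggest that the job did not land on a GPU node or that --nv was not applied.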

OpenMPI example

An OpenMPI example is located in /lustre/home/freeware/EXAMPLE_JOBS/singularity/openmpi/. In this example, the mpiexec command is executed on a Singularity container in which the same version of OpenMPI is installed. The image was built on a local Linux machine, with root privileges, from the recipe file openmpi-test.recipe:

sudo singularity build openmpi.simg openmpi-test.recipe

While the image is being created, OpenMPI with InfiniBand support is built from source and a simple test program is compiled. You can run the example with the command sbatch test.sh.
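A job script for this kind of setup might look like the sketch below, where mpiexec on the host starts one singularity exec per MPI rank. It is not the test.sh shipped with the example. The path of the test program inside the image (TEST_BINARY) is hypothetical, and the Slurm settings are assumptions modelled on the Horovod run.sh.

```shell
#!/bin/bash
#SBATCH --partition=short
#SBATCH --job-name=mpi-container
#SBATCH --nodes=2
#SBATCH --ntasks=2
#SBATCH --ntasks-per-node=1
#SBATCH --time=10:00

IMAGE="openmpi.simg"
# Hypothetical location of the compiled test program inside the image.
TEST_BINARY="/usr/local/bin/mpi_test"

run_mpi_test() {
    # The host mpiexec launches the ranks; each rank runs inside the container.
    mpiexec singularity exec "$IMAGE" "$TEST_BINARY"
}

if command -v mpiexec >/dev/null 2>&1 &&
   command -v singularity >/dev/null 2>&1 &&
   [ -f "$IMAGE" ]; then
    run_mpi_test
    MPI_TEST="ran"
else
    echo "mpiexec, singularity or $IMAGE not available; skipping" >&2
    MPI_TEST="skipped"
fi
```

This pattern depends on the OpenMPI version inside the image matching the one on the host, which is why the image is built with the same version.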

The interactive shell can be invoked by the singularity shell command. This is useful for development purposes. Use the -w | --writable option to make changes inside the container permanent.

The user's home directory is mounted inside the container automatically. If your computation needs access to the /scratch local disk storage, mount it with the -B | --bind option.
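The bind option can be sketched as follows. The image name ubuntu.sif is a hypothetical placeholder for any image you have built or pulled; the script lists /scratch from inside the container to confirm that the mount is visible.

```shell
#!/bin/bash
# Hypothetical image name; replace with your own image.
IMAGE="ubuntu.sif"
BIND_PATH="/scratch"

check_scratch() {
    # --bind (short form -B) makes the host path visible inside the container.
    singularity exec --bind "$BIND_PATH" "$IMAGE" ls "$BIND_PATH"
}

if command -v singularity >/dev/null 2>&1 && [ -f "$IMAGE" ]; then
    check_scratch
    BIND_CHECK="ran"
else
    echo "singularity or $IMAGE not available; skipping bind check" >&2
    BIND_CHECK="skipped"
fi

# For interactive work the same option applies to the shell command:
#   singularity shell --bind /scratch ubuntu.sif
```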

Complete documentation can be found at https://sylabs.io/docs/.