Installation¶
The framework installation goes as follows.
Preliminaries: Conda installation¶
wget https://repo.anaconda.com/archive/Anaconda3-2024.02-1-Linux-x86_64.sh
Then execute the installer with bash filename.sh
and finally set .condarc
as follows nano .condarc
(home directory)
channel_priority: strict
channels:
- conda-forge
- defaults
Pre-installed CUDA paths (OBSOLETE)¶
source /vols/software/cuda/setup.sh 11.8.0
This can be used with IC machines in principle, however, is not tested.
Automated setup¶
Remark: To avoid No space left on device
problem with conda or pip, set the temporary path first
mkdir <PATH_WITH_SPACE>/tmp
export TMPDIR=<PATH_WITH_SPACE>/tmp
Also after activating the icenet conda environment.
Execute
git clone git@github.com:mieskolainen/icenet.git && cd icenet
# Create the environment
conda env create -f environment.yml
conda activate icenet
# Install dependencies with pip
python -m pip install -r requirements.txt
(OR -r requirements-cpu-only.txt e.g. for Github Actions)
Note: The command python -m pip
should use the pip installed under the conda environment.
Initialize the environment¶
Always start with
conda activate icenet
source setenv.sh
Possible problems¶
Note: One may need to steer where pip
installs the packages, for example
python -m pip install --target $CONDA_PREFIX <package>
Note: If you experience OSError: libcusparse.so.11
(or similar) with torch-geometric, set the system path
export LD_LIBRARY_PATH="$CONDA_PREFIX/lib:$LD_LIBRARY_PATH"
Note: If you experience Could not load dynamic library libcusolver.so.10
with tensorflow, make a symbolic link
ln -s $CONDA_PREFIX/lib/libcusolver.so.11 $CONDA_PREFIX/lib/libcusolver.so.10
Note: $CONDA_PREFIX
will be found after conda activate icenet
Note: If you experience “Requirement already satisfied” infinite loop with pip, try
removing e.g. tensorflow
from requirements.txt, and install it separately with
pip install tensorflow
Then if something else fails, google with the error message.
Show installation paths of binaries¶
which -a pip
which -a python
GPU-support commands¶
Show the graphics card status
nvidia-smi
Show CUDA-compiler tools status
nvcc --version
Show Tensorflow and Pytorch GPU support in Python
import tensorflow
from tensorflow.python.client import device_lib
print(device_lib.list_local_devices())
import torch
torch.cuda.is_available()
print(torch.cuda.get_device_name(0))
HTCondor GPU job submission¶
Use the following command with IC machines
condor_submit <job_description_file>
condor_rm <job_id>
condor_ssh_to_job <job_id> (DEBUG)
condor_q
condor_status --gpus
With an example job description file as
executable = gpu_task.sh
error = gpu.$(CLUSTER).error
output = gpu.$(CLUSTER).output
log = gpu.$(CLUSTER).log
request_gpus = 1
request_memory = 10G
+MaxRuntime = 3600
queue
Where gpu_task.sh
is the actual steering shell script to be run
source setconda.sh
conda activate icenet
python icecool.py
where source setconda.sh
has the Conda (system specific) init commands.
Conda virtual environment commands¶
conda create -y --name icenet python==3.X.Y
conda activate icenet
...[install dependencies with pip, do your work]...
conda deactivate
conda info --envs
conda list --name icenet
# Remove environment completely
conda env remove --name icenet
C-library versions¶
ldd --version