Introduction

The library structure is as follows.

Basic design principles

Core deep learning and I/O functions and classes are designed to be problem generic: they can be used without any strictly prescribed workflow and can handle near-arbitrary inputs (parquet files, ROOT files, …).

Many high energy physics applications, such as the signal-from-background discrimination problem, fit a certain quasi-templated YAML-Python workflow, as demonstrated by the implemented applications.

YAML-configuration files

End-to-end deep learning applications are configured with YAML files. See the source files for the different applications under /configs.
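As an illustration only, such a configuration file might be structured along the following lines; all field names below are hypothetical, so consult the actual files under /configs for the real schema of each application.

```yaml
# Hypothetical sketch of an application configuration (field names are
# illustrative, NOT the actual icenet schema -- see /configs for real examples)
rootname: 'myanalysis'

data:
  input_files: ['data/sample_A.parquet', 'data/sample_B.parquet']
  frac_train: 0.7            # train / validation split fraction

models:
  iceboost_example:
    train: 'xgb'             # training backend
    params:
      max_depth: 8
      learning_rate: 0.05
```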

Folder structure

Folders whose names start with ice denote modules: either core modules, such as icenet or icefit, or physics applications, such as icedqcd, which contain their problem-specific I/O functions.

-analysis     Main steering macros and scripts
-checkpoint   Trained and saved AI-models
-configs      YAML-input configuration
-docs         Documentation
-figs         Output figures
-icebrem      Electron ID application
-icebrk       B/R(K) analysis (proto) application [combinatorial classification]
-icedqcd      DQCD analysis application [large scale new physics analysis, domain adaptation]
-icefit       Core fitting and statistics [tag & probe ++]
-icehgcal     HGCAL detector application [graph neural networks]
-icehnl       HNL analysis application [neural mutual information with BDT and MLP]
-iceid        Electron ID application
-icenet       Core deep learning & I/O functions
-iceplot      Core plotting tools
-iceqsub      SGE submission steering functions
-icetrg       HLT trigger application
-icezee       High-dimensional reweighting application [advanced MLP models and regularization]
-tests        Tests, continuous integration (CI) and bash-launch scripts
-output       HDF5, pickle outputs
-dev          Development code

The quasi-templated workflow mechanics are implemented in icenet/tools/process.py.

AI-algorithms and models

Various ML and AI models are implemented and supported, ranging from fixed-dimensional input models, such as boosted decision trees (BDTs) via XGBoost enhanced with a custom torch autograd-driven loss function (ICEBOOST), to more complex "geometric deep learning" with graph neural networks using torch-geometric as a low-level backend.

The library is ultimately agnostic regarding the underlying models: new torch models or loss functions can easily be added, and other computational libraries such as JAX can be used.

For adding new torch models, see the source files under /icenet/deep, especially train.py, iceboost.py, optimize.py and predict.py.
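As a rough sketch of the custom-objective mechanism such a setup builds on, the snippet below shows a generic XGBoost-style objective (per-sample gradient and Hessian of binary cross-entropy with respect to the raw score) in plain Python. This is illustrative only and is not the actual ICEBOOST loss, which obtains its derivatives via torch autograd.

```python
import math

def logistic_obj(preds, labels):
    """Generic binary cross-entropy objective in XGBoost custom-objective
    style: returns per-sample gradient and Hessian w.r.t. the raw margin f.
    (Illustrative sketch only; ICEBOOST derives these via torch autograd.)"""
    grad, hess = [], []
    for f, y in zip(preds, labels):
        p = 1.0 / (1.0 + math.exp(-f))   # sigmoid of the raw margin
        grad.append(p - y)               # d(loss)/df
        hess.append(p * (1.0 - p))       # d^2(loss)/df^2
    return grad, hess
```

An objective with this (grad, hess) signature is the kind of object XGBoost accepts as a custom training objective; replacing the hand-written derivatives with ones pulled from a torch computational graph is what allows arbitrary differentiable loss terms.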

Readily available models include:

1.  ICEBOOST: Gradient boosted decision trees with a custom autograd loss [xgboost+pytorch]
2.  Kolmogorov-Arnold representation theorem networks [pytorch]
3.  Lipschitz continuous MLPs [pytorch]
4.  Graph Neural Nets (graph-, node-, edge-level inference) [pytorch-geometric]
5.  Deep Normalizing Flow (BNAF) based pdfs & likelihood ratios [pytorch]
6.  Neural mutual information estimator (MINE) and non-linear distance correlation (DCORR) [pytorch]
7.  MaxOUT multilayer feedforward network [pytorch]
8.  Permutation Equivariant Networks (DeepSets) [pytorch]
9.  CNN-Tensor networks [pytorch]
10. Variational autoencoders [pytorch]
11. Deep MLPs, logistic regression [pytorch]
12. Simple estimators such as factorized (dim-by-dim) pdfs & likelihood ratios using histograms [numpy]
13. ...
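For intuition on the simplest estimator in the list (item 12), a factorized likelihood ratio treats each input dimension as independent and multiplies per-dimension histogram pdf ratios. A minimal pure-Python sketch, not the library API, with one shared uniform binning across dimensions for brevity:

```python
import math

def hist_density(values, edges):
    """1D histogram density with uniform bins.
    Assumes all values lie inside [edges[0], edges[-1])."""
    nbins = len(edges) - 1
    width = (edges[-1] - edges[0]) / nbins
    counts = [0] * nbins
    for v in values:
        b = min(int((v - edges[0]) / width), nbins - 1)
        counts[b] += 1
    return [c / (len(values) * width) for c in counts]

def factorized_log_lr(x, sig, bkg, edges, eps=1e-12):
    """Factorized log-likelihood ratio: sum_d log p_sig(x_d) / p_bkg(x_d),
    where sig[d] and bkg[d] are per-dimension histogram densities."""
    nbins = len(edges) - 1
    width = (edges[-1] - edges[0]) / nbins
    total = 0.0
    for d, v in enumerate(x):
        b = min(int((v - edges[0]) / width), nbins - 1)
        total += math.log((sig[d][b] + eps) / (bkg[d][b] + eps))
    return total
```

A positive value favors the signal hypothesis, a negative one the background; the eps regulator avoids log-of-zero in empty bins.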

Advanced ML-training technology

See source files under /icenet/deep

1. Model distillation
2. Conditional (theory) parametric classifiers
3. Inverse-CDF based dequantization of lattice-sampled conditional variables
4. Simple and deep domain adaptation (via gradient reversal)
5. Automated hyperparameter tuning (via raytune)
6. Algorithmically [de]correlated (regulated) BDTs and networks with MINE
7. Logit temperature scaling diagnostics and optimization (model output calibration)
8. ...
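As an illustration of item 7, temperature scaling calibrates a classifier's output probabilities by rescaling its logits z → z/T, with the temperature T fit by minimizing the negative log-likelihood on held-out data. A minimal 1D binary-classification sketch, using a grid search instead of a proper optimizer; this is not the library implementation:

```python
import math

def nll(logits, labels, T):
    """Mean binary NLL of sigmoid(logit / T) against 0/1 labels."""
    total = 0.0
    for z, y in zip(logits, labels):
        p = 1.0 / (1.0 + math.exp(-z / T))
        p = min(max(p, 1e-12), 1.0 - 1e-12)   # clamp for numerical safety
        total -= y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return total / len(logits)

def fit_temperature(logits, labels, grid=None):
    """Pick the temperature T > 0 minimizing the validation NLL."""
    grid = grid or [0.05 * k for k in range(1, 101)]   # T in (0, 5]
    return min(grid, key=lambda T: nll(logits, labels, T))
```

T > 1 softens overconfident outputs and T < 1 sharpens underconfident ones; since only a single scalar is fit, the ranking of events (and hence ROC curves) is unchanged while the probabilities become better calibrated.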

Automated selectors and combinatorics for distributions

The plotting machinery allows sophisticated filtering/cuts and "combinatorial" binning of various metrics, such as ROC curves and other figures. See the steering-file examples under /configs/*/plots.yml.

Sun Grid Engine (SGE) / HTCondor execution

DQCD analysis deployment example:

source tests/runme_dqcd_vector_init_yaml.sh
python iceqsub/iceqsub.py --job dqcd_vector_data-D

After inspecting the launch command, launch by adding --run. Make sure you have execute rights (chmod +x) for the steering script under /tests.