Introduction¶
The library structure is described below.
Basic design principles¶
Core deep learning and I/O functions and classes are designed to be problem generic. That is, they can be used without any strict workflow and can handle nearly arbitrary inputs as suitable (Parquet files, ROOT files …).
Many high energy physics applications, such as the signal-from-background discrimination problem, fit a common “quasi-templated YAML-Python workflow”, as demonstrated by the implemented applications.
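For illustration only (this is not the library's internal I/O layer, and the file and tree names are placeholders), tabular Parquet and ROOT inputs can be read with standard Python tools:
import pandas as pd
import uproot

# Placeholder file and tree names, shown only to illustrate the input formats
df_parquet = pd.read_parquet("events.parquet")                          # columnar Parquet input
df_root    = uproot.open("events.root")["Events"].arrays(library="pd")  # ROOT TTree -> DataFrame
print(df_parquet.shape, df_root.shape)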
YAML configuration files¶
End-to-end deep learning applications are configured with YAML files. See the configuration files for the different applications under /configs.
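As a minimal sketch (the path and keys below are hypothetical; each application defines its own schema), a steering file can be inspected with PyYAML:
import yaml  # PyYAML

# Hypothetical steering file path; the keys depend on the application
with open("configs/myapp/tune0.yml") as f:
    cfg = yaml.safe_load(f)

print(list(cfg.keys()))  # e.g. input variables, cuts, model and training blocks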
Folder structure¶
Folders whose names start with ice denote modules: either core modules such as icenet or icefit, or physics applications such as icedqcd, which contain their problem-specific I/O functions.
-analysis Main steering macros and scripts
-checkpoint Trained and saved AI-models
-configs YAML-input configuration
-docs Documentation
-figs Output figures
-icebrem Electron ID application
-icebrk B/R(K) analysis (proto) application [combinatorial classification]
-icedqcd DQCD analysis application [large scale new physics analysis, domain adaptation]
-icefit Core fitting and statistics [tag & probe ++]
-icehgcal HGCAL detector application [graph neural networks]
-icehnl HNL analysis application [neural mutual information with BDT and MLP]
-iceid Electron ID application
-icenet Core deep learning & I/O functions
-iceplot Core plotting tools
-iceqsub SGE submission steering functions
-icetrg HLT trigger application
-icezee High-dimensional reweighting application [advanced MLP models and regularization]
-tests Tests, continuous integration (CI) and bash-launch scripts
-output HDF5, pickle outputs
-dev Development code
The quasi-templated workflow mechanics are implemented in icenet/tools/process.py.
AI algorithms and models¶
Various ML and AI models are implemented and supported: from fixed-dimensional input models, such as boosted decision trees (BDT) via XGBoost enhanced with a custom torch autograd driven loss function (aka ICEBOOST), to more complex “Geometric Deep Learning” with graph neural networks using torch-geometric as a low-level backend.
The library is ultimately agnostic regarding the underlying models, i.e. new torch models or loss functions can be easily added, and other computational libraries such as JAX can be used.
For adding new torch models, see the source files under /icenet/deep, especially train.py, iceboost.py, optimize.py and predict.py.
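As a minimal sketch of the custom-loss idea (illustrative only, not the library's actual ICEBOOST implementation), the per-event gradient and diagonal Hessian of a torch-defined loss can be fed to XGBoost as a custom objective:
import numpy as np
import torch
import xgboost as xgb

def torch_autograd_objective(preds: np.ndarray, dtrain: xgb.DMatrix):
    # Any differentiable torch loss can be used here; plain BCE-with-logits as an example
    y = torch.tensor(dtrain.get_label(), dtype=torch.float64)
    f = torch.tensor(preds, dtype=torch.float64, requires_grad=True)
    loss = torch.nn.functional.binary_cross_entropy_with_logits(f, y, reduction='sum')

    grad, = torch.autograd.grad(loss, f, create_graph=True)
    hess, = torch.autograd.grad(grad.sum(), f)  # diagonal Hessian of an element-wise loss
    return grad.detach().numpy(), hess.detach().numpy()

# Usage: bst = xgb.train(params, dtrain, num_boost_round=100, obj=torch_autograd_objective)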
Readily available models include:
1. ICEBOOST: Gradient boosted decision trees with a custom autograd loss [xgboost+pytorch]
2. Kolmogorov-Arnold representation theorem networks [pytorch]
3. Lipschitz continuous MLPs [pytorch]
4. Graph Neural Nets (graph-, node-, edge-level inference) [pytorch-geometric]
5. Deep Normalizing Flow (BNAF) based pdfs & likelihood ratios [pytorch]
6. Neural mutual information estimator (MINE) and non-linear distance correlation (DCORR) [pytorch] (a minimal MINE sketch follows this list)
7. MaxOUT multilayer feedforward network [pytorch]
8. Permutation Equivariant Networks (DeepSets) [pytorch]
9. CNN-Tensor networks [pytorch]
10. Variational autoencoders [pytorch]
11. Deep MLPs, logistic regression [pytorch]
12. Simple estimators such as factorized (dim-by-dim) pdfs & likelihood ratios using histograms [numpy]
13. ...
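As an example of the listed techniques, a minimal MINE-style statistics network (an illustrative sketch, not the library's implementation) estimates a lower bound on the mutual information I(X;Y):
import torch
import torch.nn as nn

class MINE(nn.Module):
    # Statistics network T(x, y) for the Donsker-Varadhan lower bound on I(X;Y)
    def __init__(self, dim_x: int, dim_y: int, hidden: int = 64):
        super().__init__()
        self.T = nn.Sequential(
            nn.Linear(dim_x + dim_y, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 1))

    def forward(self, x, y):
        joint = self.T(torch.cat([x, y], dim=-1)).mean()
        y_perm = y[torch.randperm(y.shape[0])]  # shuffle to approximate the product of marginals
        marginal = torch.exp(self.T(torch.cat([x, y_perm], dim=-1))).mean()
        return joint - torch.log(marginal)  # maximize this bound during training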
Advanced ML training technology¶
See the source files under /icenet/deep.
1. Model distillation
2. Conditional (theory) parametric classifiers
3. Inverse CDF based dequantization of lattice-sampled conditional variables
4. Simple and deep domain adaptation (via gradient reversal; a minimal sketch follows this list)
5. Automated hyperparameter tuning (via raytune)
6. Algorithmically [de]correlated (regulated) BDTs and networks with MINE
7. Logit temperature scaling diagnostics and optimization (model output calibration)
8. ...
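As a minimal sketch of gradient reversal based domain adaptation (illustrative only, not the library's implementation), the reversal layer acts as the identity in the forward pass and flips the gradient sign in the backward pass:
import torch

class GradReverse(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, alpha: float):
        ctx.alpha = alpha
        return x.view_as(x)  # identity in the forward pass

    @staticmethod
    def backward(ctx, grad_output):
        return -ctx.alpha * grad_output, None  # reversed gradient; no gradient for alpha

def grad_reverse(x, alpha: float = 1.0):
    return GradReverse.apply(x, alpha)

# Usage: pass shared features through grad_reverse() before the domain classifier head,
# so the feature extractor is trained adversarially against domain discrimination.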
Automated selectors and combinatorics for distributions¶
The plotting machinery allows sophisticated filtering/cuts and “combinatorial” binning of various metrics and figures, such as ROC curves. See the steering-file examples under /configs/*/plots.yml.
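As a toy illustration of combinatorial binning (not the library's plotting machinery; the variables and cut values are arbitrary), a classifier metric such as the ROC AUC can be evaluated per cut category:
import numpy as np
from itertools import product
from sklearn.metrics import roc_auc_score

rng   = np.random.default_rng(0)
y     = rng.integers(0, 2, 10000)            # toy truth labels
score = y + 0.8 * rng.normal(size=y.size)    # toy classifier output
pt    = rng.uniform(0, 100, y.size)
eta   = rng.uniform(-2.5, 2.5, y.size)

cuts = {'pt': [(0, 20), (20, 100)], 'eta': [(-2.5, 0), (0, 2.5)]}

# The Cartesian product of the cut intervals defines the "combinatorial" categories
for (ptlo, pthi), (etalo, etahi) in product(cuts['pt'], cuts['eta']):
    mask = (pt > ptlo) & (pt <= pthi) & (eta > etalo) & (eta <= etahi)
    print(f"pt ({ptlo},{pthi}] x eta ({etalo},{etahi}]: AUC = {roc_auc_score(y[mask], score[mask]):.3f}")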
Sun Grid Engine (SGE) / HTCondor execution¶
DQCD analysis deployment example:
source tests/runme_dqcd_vector_init_yaml.sh
python iceqsub/iceqsub.py --job dqcd_vector_data-D
After inspecting the launch command, launch by adding --run. Make sure you have execute rights (chmod +x) for the steering script under /tests.