icenet

Core classes including deep learning, data structures, manipulations and visualizations.

icenet.algo

Various (classic) algorithms.

icenet.algo.analytic

count_simple_edges(num_nodes, directed, self_loops)[source]

Count the number of edges in a (semi-)fully connected adjacency matrix

deltaR(x, eta1: str, eta2: str, phi1: str, phi2: str)[source]

dR distance (invariant under longitudinal boosts in the massless limit y -> eta)

With awkward arrays

fox_wolfram_boost_inv(p, L=10)[source]

arxiv.org/pdf/1508.03144 (Formula 5.6)

Parameters:
  • p – list of 4-momentum vectors

  • L – maximum angular moment order

Returns:

list of moments of order 0,1,…,L

Return type:

S

[untested function]

get_Lorentz_edge_features(p4vec, num_nodes, num_edges, num_edge_features, directed, self_loops, EPS=1e-12)[source]
get_simple_edge_index(num_nodes, num_edges, directed, self_loops)[source]
gram_matrix(X, type='dot')[source]

Gram matrix for 4-vectors.

Parameters:
  • X – Array (list of N) of 4-vectors

  • type – Type of Lorentz scalars computed (‘dot’, ‘s’, ‘t’)

Returns:

Gram matrix (NxN)

Return type:

G

invmass(x, pt1: str, pt2: str, eta1: str, eta2: str, phi1: str, phi2: str, m1_const=0.1396, m2_const=0.1396)[source]

invariant mass (exact)

With awkward arrays

invmass_massless(x, pt1: str, pt2: str, eta1: str, eta2: str, phi1: str, phi2: str)[source]

invariant mass (massless limit)

With awkward arrays
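
In the massless limit the pair mass reduces to m^2 = 2 pt1 pt2 (cosh(deta) - cos(dphi)); a minimal numpy sketch of that standard expression (per-event field extraction from the awkward array is omitted here):

import numpy as np

def invmass_massless_sketch(pt1, pt2, eta1, eta2, phi1, phi2):
    # Massless limit: m^2 = 2 * pt1 * pt2 * (cosh(deta) - cos(dphi))
    return np.sqrt(2.0 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2)))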

ktmetric(kt2_i, kt2_j, dR2_ij, p=-1, R=1.0)[source]

kt-algorithm type distance measure.

Parameters:
  • kt2_i – Particle 1 pt squared

  • kt2_j – Particle 2 pt squared

  • dR2_ij – Angular separation squared between the particles (deta**2 + dphi**2)

  • R – Radius parameter

  • p – exponent: (p=1) kt-like, (p=0) Cambridge/Aachen, (p=-1) anti-kt like

Returns:

distance measure
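
A minimal usage sketch (the kinematic values below are illustrative only):

from icenet.algo.analytic import ktmetric

kt2_i, kt2_j = 25.0**2, 40.0**2       # particle pt squared
dR2_ij       = 0.3**2 + 0.1**2        # deta**2 + dphi**2

d_kt     = ktmetric(kt2_i, kt2_j, dR2_ij, p=1)    # kt-like
d_CA     = ktmetric(kt2_i, kt2_j, dR2_ij, p=0)    # Cambridge/Aachen
d_antikt = ktmetric(kt2_i, kt2_j, dR2_ij, p=-1)   # anti-kt like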

phi_phasewrap(phi)[source]

Used for example when phi is deltaphi = phi1 - phi2
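
A minimal sketch of the standard wrap into [-pi, pi), which is presumably what this function implements (an assumption, not confirmed by the docstring):

import numpy as np

def phi_phasewrap_sketch(phi):
    # Wrap an angle (e.g. deltaphi = phi1 - phi2) into [-pi, pi)
    return (phi + np.pi) % (2.0 * np.pi) - np.pi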

icenet.algo.flr

predict(X, b_pdfs, s_pdfs, bin_edges, return_prob=True, EPS=1e-12)[source]

Evaluate the likelihood ratio.

Parameters:
  • X – input data [# vectors x # dimensions]

  • b_pdfs – background pdfs

  • s_pdfs – signal pdfs

  • bin_edges – bin edges

  • return_prob – return probability if True, or likelihood ratio

Returns:

likelihood ratio, or probability

Return type:

LR
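
Under the factorized (naive) assumption, the likelihood ratio is a product of one-dimensional pdf ratios over the input dimensions. A minimal numpy sketch of that idea; the histogram lookup details here are an assumption, not the exact implementation:

import numpy as np

def factorized_LR_sketch(X, b_pdfs, s_pdfs, bin_edges, EPS=1e-12):
    # X: (N events x D dimensions); b_pdfs, s_pdfs, bin_edges: per-dimension histograms
    N, D  = X.shape
    logLR = np.zeros(N)
    for d in range(D):
        idx    = np.clip(np.digitize(X[:, d], bin_edges[d]) - 1, 0, len(s_pdfs[d]) - 1)
        logLR += np.log(s_pdfs[d][idx] + EPS) - np.log(b_pdfs[d][idx] + EPS)
    return np.exp(logLR)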

train(X, y, weights, param)[source]

Factorized likelihood classifier training.

Parameters:
  • X – input data [# vectors x # dimensions]

  • y – target data

  • weights – weighted events

  • param – dictionary for the parameters

Returns:

background pdfs, s_pdfs: signal pdfs, bin_edges: histogram bin edges

Return type:

b_pdfs

icenet.algo.nmf

ML_nmf(V, k, threshold=1e-08, maxiter=500)[source]

Non-negative matrix factorization main function.

Parameters:
  • V – (d x n) array (dimension x samples)

  • k – number of components

  • threshold – relative error threshold (Frob norm)

  • maxiter – maximum number of iterations

Returns:

(d x k) array of basis elements, H: (k x n) array of weights for each observation

Return type:

W

ML_update_H(V, W, H)[source]

Multiplicative (EM-type) non-negative matrix factorization update for the expansion weights.

Parameters:
  • V – (d x n) (dimension x samples)

  • W – (d x k) (dimension x dictionary size)

  • H – (k x n) (expansion weights for each sample)

Returns:

(k x n) array of updated weights for each sample

Return type:

H

ML_update_W(V, W, H)[source]

Multiplicative (EM-type) non-negative matrix factorization update for basis components.

Parameters:
  • V – (d x n) (dimension x samples)

  • W – (d x k) (dimension x dictionary size)

  • H – (k x n) (expansion weights for each sample)

Returns:

(d x k) updated non-negative basis components

Return type:

W
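
These multiplicative updates follow the standard Lee-Seung EM-type rules; a minimal numpy sketch of one full iteration under a Frobenius-norm objective (an assumption about the exact objective used here):

import numpy as np

def nmf_multiplicative_step(V, W, H, EPS=1e-12):
    # Standard Lee-Seung multiplicative updates for || V - W H ||_F
    H = H * (W.T @ V) / (W.T @ W @ H + EPS)   # expansion weights (k x n)
    W = W * (V @ H.T) / (W @ H @ H.T + EPS)   # basis components  (d x k)
    return W, H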

icenet.deep

Deep learning model classes.

icenet.deep.autogradxgb

class XgboostObjective(loss_func: Callable[[Tensor, Tensor], Tensor], mode: str = 'train', flatten_grad: bool = False, hessian_mode: str = 'constant', hessian_const: float = 1.0, hessian_gamma: float = 0.9, hessian_slices: int = 10, device: device = 'cpu')[source]
Parameters:
  • loss_func – Loss function handle

  • mode – ‘train’ or ‘eval’

  • flatten_grad – For vector valued model output [experimental]

  • hessian_mode – ‘constant’, ‘squared_approx’, ‘iterative’, ‘hutchinson’, ‘exact’

  • hessian_const – Scalar parameter for the ‘constant’ hessian_mode

  • hessian_gamma – Hessian momentum smoothing parameter for ‘iterative’ mode

  • hessian_slices – Hutchinson Hessian diagonal estimator MC slice sample size

  • device – Torch device

derivatives(loss: Tensor, preds: Tensor) Tuple[Tensor, Tensor][source]

Gradient and Hessian diagonal

Parameters:
  • loss – loss function values

  • preds – model predictions

Returns:

gradient vector, hessian diagonal vector

iterative_hessian_update(grad: Tensor, preds: Tensor, absMax: float = 10, EPS: float = 1e-08)[source]

Iterative Hessian (diagonal) approximation update using finite differences

[experimental]

Parameters:
  • grad – Current gradient vector

  • preds – Current prediction vector

torch_conversion(preds: ndarray, targets: DMatrix)[source]

Conversion from an xgboost.DMatrix object
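
A hedged sketch of how an autograd-based objective of this kind is typically wired into xgboost training; the loss function below and the exact call pattern are assumptions, not the package's documented recipe:

import torch
import xgboost
from icenet.deep.autogradxgb import XgboostObjective

def my_loss(preds: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # Hypothetical torch loss over raw scores (logits) and targets
    return torch.nn.functional.binary_cross_entropy_with_logits(preds, targets, reduction='sum')

obj = XgboostObjective(loss_func=my_loss, mode='train', hessian_mode='constant')

# dtrain  = xgboost.DMatrix(X, label=y)
# booster = xgboost.train(params={'disable_default_eval_metric': 1},
#                         dtrain=dtrain, num_boost_round=100, obj=obj)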

icenet.deep.bnaf

class BNAF(*args, res: str | None = None)[source]

Class that extends torch.nn.Sequential for constructing a Block Neural Normalizing Flow.

forward(inputs: Tensor)[source]
Parameters:
  • inputs – torch.Tensor, required. The input tensor.

Returns:

The output tensor and the log-det-Jacobian of this transformation.

class MaskedWeight(in_features: int, out_features: int, dim: int, bias: bool = True)[source]

Module that implements a linear layer with block matrices with positive diagonal blocks. Moreover, it uses Weight Normalization (https://arxiv.org/abs/1602.07868) for stability.

forward(inputs, grad: Tensor | None = None)[source]
Parameters:
  • inputs – torch.Tensor, required. The input tensor.

  • grad – torch.Tensor, optional. The log diagonal block of the partial Jacobian of previous transformations.

Returns:

The output tensor and the log diagonal blocks of the partial log-Jacobian of previous transformations combined with this transformation.

get_weights()[source]

Computes the weight matrix using masks and weight normalization. It also computes its log diagonal blocks.

class Permutation(in_features: int, p: list | None = None)[source]

Module that outputs a permutation of its input.

forward(inputs: Tensor)[source]
Parameters:

inputs – torch.Tensor, required. The input tensor.

Returns:

The permuted tensor and the log-det-Jacobian of this permutation.

class Sequential(*args: Module)[source]
class Sequential(arg: OrderedDict[str, Module])

Class that extends torch.nn.Sequential for computing the output of the function together with the log-det-Jacobian of such transformation.

forward(inputs: Tensor)[source]
Parameters:
  • inputs – torch.Tensor, required. The input tensor.

Returns:

The output tensor and the log-det-Jacobian of this transformation.

class Tanh(*args, **kwargs)[source]

Class that extends torch.nn.Tanh additionally computing the log diagonal blocks of the Jacobian.

forward(inputs, grad: Tensor | None = None)[source]
Parameters:
  • inputs – torch.Tensor, required. The input tensor.

  • grad – torch.Tensor, optional. The log diagonal blocks of the partial Jacobian of previous transformations.

Returns:

The output tensor and the log diagonal blocks of the partial log-Jacobian of previous transformations combined with this transformation.

icenet.deep.cnn

class CNN(C, out_dim=None, nchannels=1, nrows=32, ncols=32, dropout_cnn=0.0, dropout_mlp=0.5, mlp_dim=128)[source]
binarypredict(x)[source]

Return maximum probability class

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

softpredict(x)[source]

Softmax probability

class CNN_MAXO(D, C, out_dim=None, nchannels=1, nrows=32, ncols=32, dropout_cnn=0.0, mlp_dim=50, num_units=6, dropout_mlp=0.1)[source]

Dual (simultaneous) input network [image tensors x global vectors]

forward(data)[source]
Input data dictionary with members

‘x’ : image tensor, ‘u’ : global feature tensor

or a class with data.x and data.u

maxout(x, layer_list)[source]

MAXOUT layer

softpredict(x)[source]

icenet.deep.da

class GradientReversal(alpha=1.0)[source]
forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class GradientReversalFunction(*args, **kwargs)[source]

Unsupervised Domain Adaptation by Backpropagation https://arxiv.org/abs/1409.7495

Notes: The forward pass is an identity map. In the backpropagation, the gradients are reversed by grad -> -alpha * grad.

Example

net = nn.Sequential(nn.Linear(10, 10), GradientReversal(alpha=1.0))

static backward(ctx, grads)[source]

Define a formula for differentiating the operation with backward mode automatic differentiation.

This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)

It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non tensor outputs of the forward function), and it should return as many tensors, as there were inputs to forward(). Each argument is the gradient w.r.t the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.

The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.

static forward(ctx, x, alpha)[source]

Define the forward of the custom autograd Function.

This function is to be overridden by all subclasses. There are two ways to define forward:

Usage 1 (Combined forward and ctx):

@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any:
    pass
  • It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).

  • See combining-forward-context for more details

Usage 2 (Separate forward and ctx):

@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any:
    pass

@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None:
    pass
  • The forward no longer accepts a ctx argument.

  • Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.

  • See extending-autograd for more details

The context can be used to store arbitrary data that can be then retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used for in jvp.

icenet.deep.dbnf

class Dataset(X, W)[source]
compute_log_p_x(model, x)[source]

Model log-density value log[pdf(x)], where x is the data vector

log p(x) = log p(z) + sum_{k=1}^K log|det J_{f_k}|

Parameters:
  • model – model object

  • x – N minibatch vectors

Returns:

log-likelihood value

create_model(param, verbose=False, rngseed=0)[source]

Construct the network object.

Parameters:

param – parameters

Returns:

model object

Return type:

model

get_pdf(model, x)[source]

Evaluate learned density (pdf) at point x

Parameters:
  • model – model object

  • x – input vector(s)

Returns:

pdf value

Examples

> x = torch.tensor([[1.0, 2.0]])
> l = get_pdf(model, x)

load_models(param, modelnames, modeldir, device)[source]

Load models from files

predict(X, models, return_prob=True, EPS=1e-09)[source]

2-class density ratio pdf(x,S) / pdf(x,B) for each vector x.

Parameters:
  • param – input parameters

  • X – pytorch tensor of vectors

  • models – list of model objects

  • return_prob – return pdf(S) / (pdf(S) + pdf(B)), else pdf(S) / pdf(B)

Returns:

likelihood ratio (or alternatively probability)

train(model, optimizer, scheduler, trn_x, val_x, trn_weights, val_weights, param, modeldir, save_name)[source]

Train the model density.

Parameters:
  • model – initialized model object

  • optimizer – optimizer object

  • scheduler – optimization scheduler

  • trn_x – training vectors

  • val_x – validation vectors

  • trn_weights – training weights

  • val_weights – validation weights

  • param – parameters

  • modeldir – directory to save the model

icenet.deep.deeptools

class Multiply(alpha)[source]

Multiplication with a non-learnable constant alpha

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

adaptive_gradient_clipping_(main_module: Module, MI_module: Module, EPS=1e-09)[source]

Adaptively clip the gradient from the mutual information module, so that its Frobenius norm is at most that of the gradient from the main network.

See: https://arxiv.org/abs/1801.04062 (Appendix)

Parameters:
  • main_module – Generator/classifier/regressor… main task network (nn.Module)

  • MI_module – MI regulator network (nn.Module)

grad_norm(module: Module)[source]

Compute the total (Frobenius) norm for the gradients of a torch network

Parameters:

module – torch network

Returns:

total gradient norm
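
A minimal sketch of an equivalent computation (not necessarily the exact implementation):

import torch

def grad_norm_sketch(module: torch.nn.Module) -> float:
    # Total (Frobenius) norm over all parameter gradients of the network
    total = 0.0
    for p in module.parameters():
        if p.grad is not None:
            total += p.grad.detach().pow(2).sum().item()
    return total ** 0.5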

set_scheduler(optimizer: dict, param: dict)[source]
Parameters:
  • optimizer – optimizers for different models

  • param – setup parameters

Returns:

torch scheduler

sigmoid_schedule(t, N_max=1, start=0, end=3, tau=0.7, clip_min=1e-09)[source]

https://arxiv.org/abs/2305.18900

weights_init_all(model, init_funcs)[source]

Examples

model = MyNet()
weights_init_all(model, init_funcs)

weights_init_normal(m)[source]

Initializes module weights from a normal distribution with the rule sigma ~ 1/sqrt(n)
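
A minimal sketch of the sigma ~ 1/sqrt(n) rule for a linear layer, assuming n is the fan-in (an assumption; the package may also handle other layer types):

import torch.nn as nn

def weights_init_normal_sketch(m):
    # Initialize Linear layers from N(0, 1/n), where n is the fan-in
    if isinstance(m, nn.Linear):
        n = m.in_features
        m.weight.data.normal_(0.0, 1.0 / (n ** 0.5))
        if m.bias is not None:
            m.bias.data.zero_()

# Usage: model.apply(weights_init_normal_sketch)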

weights_init_uniform_rule(m)[source]

Initializes module weights from uniform [-a,a]

icenet.deep.deps

class DEPS(D, C, z_dim, out_dim=None, phi_layers=3, rho_layers=3, pool='max', dropout=0.1, **kwargs)[source]

Permutation equivariant networks.

binarypredict(x)[source]

Return maximum probability class

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

softpredict(x)[source]

Softmax probability

class PEN1_max(in_dim, out_dim)[source]

Permutation Equivariant Network (PEN) max-type layers.

Single dimensional model.

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class PEN1_mean(in_dim, out_dim)[source]

Permutation Equivariant Network (PEN) mean-type layers.

Single dimensional model.

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class PEN_max(in_dim, out_dim)[source]

Permutation Equivariant Network (PEN) max-type layers.

Multidimensional model.

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class PEN_mean(in_dim, out_dim)[source]

Permutation Equivariant Network (PEN) mean-type layers.

Multidimensional model.

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

icenet.deep.dmlp

class DMLP(D, C, out_dim=None, mlp_dim=[128, 64], activation='relu', layer_norm=False, batch_norm=False, dropout=0.0, skip_connections=False, act_after_norm=True, last_tanh=False, last_tanh_scale=10.0, **kwargs)[source]
binarypredict(x)[source]

Return maximum probability class

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

softpredict(x)[source]

Softmax probability

class LinearLayer(dim_in, dim_out, skip_connections=False, activation: str = 'relu', layer_norm: bool = False, batch_norm: bool = False, dropout: float = 0.0, act_after_norm=True)[source]
forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

MLP(layers: List[int], activation: str = 'relu', layer_norm: bool = False, batch_norm: bool = False, dropout: float = 0.0, last_act: bool = False, skip_connections=False, act_after_norm=True)[source]

Return a Multi Layer Perceptron with an arbitrary number of layers.

Parameters:
  • layers – input structure, such as [128, 64, 64] for a 3-layer network.

  • activation – activation function

  • layer_norm – layer normalization

  • batch_norm – batch normalization

  • dropout – dropout regularization

  • skip_connections – skip connections active

  • last_act – apply activation function after the last layer

  • act_after_norm – activation function application order

Returns:

nn.Sequential object
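
A minimal usage sketch (layer sizes and options below are illustrative):

from icenet.deep.dmlp import MLP

# 3-layer network 128 -> 64 -> 64 with ReLU activations and 10% dropout
net = MLP(layers=[128, 64, 64], activation='relu', dropout=0.1, batch_norm=False)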

MLP_ALL_ACT(layers: List[int], activation: str = 'relu', layer_norm: bool = False, batch_norm: bool = False, dropout: float = 0.0, skip_connections=False, act_after_norm: bool = True)[source]

Return a Multi Layer Perceptron with an arbitrary number of layers.

All layers with the activation + other operations applied.

get_act(act: str = 'relu')[source]

Returns torch activation function

Parameters:

act – activation function type

icenet.deep.fastkan

class AttentionWithFastKANTransform(q_dim: int, k_dim: int, v_dim: int, head_dim: int, num_heads: int, gating: bool = True)[source]
forward(q: Tensor, k: Tensor, v: Tensor, bias: Tensor | None = None) Tensor[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class FastKAN(D, C, mlp_dim: ~typing.List[int], grid_min: float = -2.0, grid_max: float = 2.0, num_grids: int = 8, use_base_update: bool = False, base_activation=<function silu>, spline_weight_init_scale: float = 0.1, out_dim=None, last_tanh=False, last_tanh_scale=10.0, **kwargs)[source]
forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

softpredict(x)[source]

Softmax probability

class FastKANLayer(input_dim: int, output_dim: int, grid_min: float = -2.0, grid_max: float = 2.0, num_grids: int = 8, use_base_update: bool = False, base_activation=<function silu>, spline_weight_init_scale: float = 0.1)[source]
forward(x, time_benchmark=False)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class RadialBasisFunction(grid_min: float = -2.0, grid_max: float = 2.0, num_grids: int = 8, denominator: float | None = None)[source]
forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class SplineLinear(in_features: int, out_features: int, init_scale: float = 0.1, **kw)[source]
reset_parameters() None[source]

icenet.deep.gcnn

class GCN(D, Z, C, out_dim=None, dropout=0.5)[source]

Graph Convolution Network

binarypredict(x)[source]

Return maximum probability class

forward(x, adj_matrix)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

softpredict(x)[source]

Softmax probability

class GCN_layer(D_in, D_out, bias=True)[source]

Graph Convolution Network Layer

forward(x, adj_matrix)[source]

Forward operator

reset_param()[source]

1 / sqrt(N) normalization

icenet.deep.graph

class GNNGeneric(d_dim, out_dim, u_dim=0, e_dim=None, z_dim=96, C=None, conv_type='EdgeConv', task='node', global_pool='mean', conv_MLP_act='relu', conv_MLP_bn=True, conv_MLP_dropout=0.0, conv_aggr='max', conv_knn=8, fusion_MLP_act='relu', fusion_MLP_bn=False, fusion_MLP_dropout=0.0, final_MLP_act='relu', final_MLP_bn=False, final_MLP_dropout=0.0, DA_active=False, DA_alpha=1.0, DA_MLP=[128, 64], DA_MLP_act='relu', DA_MLP_bn=False, DA_MLP_dropout=0.0, **kwargs)[source]

Technical Remarks:

Always use MLP_ALL_ACT in the intermediate blocks, i.e. MLPs with an activation function also after the last layer; otherwise very poor performance may occur for certain message passing / convolution operators.

DynamicEdgeConv_(data)[source]
EdgeConv_(data)[source]
GATConv_(data)[source]
GINEConv_(data)[source]
GINE_helper()[source]

GINEConv requires node features and edge features with the same dimension. Increase dimensionality here.

NNConv_(data)[source]
PANConv_(data)[source]
SAGEConv_(data)[source]
SGConv_(data)[source]
SplineConv_(data)[source]
SuperEdgeConv_(data)[source]
binarypredict(x)[source]

Return maximum probability class

forward(data, conv_only=False)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

forward_2pt(z, edge_index)[source]

MLP decoder of two-point correlations (edges)

Because this function is not (necessarily) permutation symmetric between edge_index[0] and [1], we can learn (in principle) a directed or undirected edge (adjacency) behavior.

forward_with_DA(data)[source]

Forward call with Domain Adaptation

inference(x, data)[source]

Final inference network forward call

softpredict(x)[source]

Softmax probability

class SuperEdgeConv(mlp_edge: Callable, mlp_latent: Callable, aggr: str = 'mean', mp_attn_dim: int = 0, use_residual=True, **kwargs)[source]

Custom GNN convolution operator aka ‘generalized EdgeConv’ (original EdgeConv: arxiv.org/abs/1801.07829)

forward(x: Tensor | Tuple[Tensor, Tensor], edge_index: Tensor | SparseTensor, edge_attr: Tensor | None = None, edge_weight: Tensor | None = None, size: Tuple[int, int] | None = None) Tensor[source]

Runs the forward pass of the module.

init_(module)[source]
message(x_i: Tensor, x_j: Tensor, edge_attr: Tensor | None, edge_weight: Tensor | None) Tensor[source]

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g. x_i and x_j.

reset_parameters()[source]

Resets all learnable parameters of the module.

icenet.deep.iceboost

BCE_loss_with_logits(input: Tensor, target: Tensor, weights: Tensor | None = None, epsilon=None)[source]

Numerically stable BCE loss with logits. See https://medium.com/@sahilcarterr/why-nn-bcewithlogitsloss-numerically-stable-6a04f3052967
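
The usual numerically stable formulation works with |x| to avoid overflow in the exponential; a minimal sketch of that standard identity (the package implementation may differ, e.g. in event weighting):

import torch

def stable_bce_with_logits_sketch(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    # Elementwise BCE(x, y) = max(x, 0) - x*y + log(1 + exp(-|x|))
    return torch.clamp(x, min=0) - x * y + torch.log1p(torch.exp(-torch.abs(x)))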

create_filters(param, data_trn, data_val)[source]
train_xgb(config={'params': {}}, data_trn=None, data_val=None, y_soft=None, args=None, param=None, plot_importance=True, data_trn_MI=None, data_val_MI=None)[source]

Train XGBoost model

Parameters:

See other train_* functions under train.py

Returns:

trained model

icenet.deep.losstools

BCE_loss(logits, y, weights=None)[source]

Binary Cross Entropy loss

class FocalWithLogitsLoss(weight=None, gamma=2, reduction='mean')[source]

Focal Loss with logits as input

https://arxiv.org/abs/1708.02002

forward(predicted, target)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

LOGIT_L1_loss(logits, logit_beta=1.0, weights=None)[source]

Logit magnitude L1-regularization sum |z|

LOGIT_L2_loss(logits, logit_beta=1.0, weights=None)[source]

Logit magnitude L2-regularization sum |z|^2

class LqBernoulliWithLogitsLoss(weight=None, q=1.0, reduction='mean')[source]

L_q likelihood for the Bernoulli case

https://arxiv.org/pdf/1002.4533

L_q(x)[source]

when q -> 1, then L_q -> log(x)

forward(predicted, target)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

Lq_binary_loss(logits, y, q, weights=None)[source]

L_q Bernoulli loss

MAE_loss(y_hat, y, weights=None)[source]

Mean absolute error loss

MI_loss(X, Z, weights, MI, y)[source]

Neural Mutual Information regularization

MSE_loss(y_hat, y, weights=None)[source]

Mean squared error loss

SWD_reweight_loss(logits, x, y, weights=None, p=1, num_slices=1000, norm_weights=True, mode='SWD')[source]

Sliced Wasserstein reweight U (y==0) -> V (y==1) transport

binary_cross_entropy_logprob(log_phat_0, log_phat_1, y, weights=None)[source]

Per instance weighted binary cross entropy loss (y can be a scalar between [0,1]) (negative log-likelihood)

binary_focal_loss(logits, y, gamma=1.0, weights=None)[source]

Focal Cross Entropy loss

log_softmax(x, dim=-1)[source]

Log of Softmax

Parameters:

x – network output without softmax

Returns:

logsoftmax values

logsumexp(x, dim=-1)[source]

https://en.wikipedia.org/wiki/LogSumExp
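
The standard trick subtracts the maximum before exponentiating; a minimal sketch (an equivalent computation, not necessarily the exact implementation):

import torch

def logsumexp_sketch(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # log sum_i exp(x_i) = m + log sum_i exp(x_i - m), with m = max_i x_i
    m, _ = torch.max(x, dim=dim, keepdim=True)
    return (m + torch.log(torch.sum(torch.exp(x - m), dim=dim, keepdim=True))).squeeze(dim)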

loss_wrapper(model, x, y, num_classes, weights, param, y_DA=None, w_DA=None, MI=None, EPS=1e-12)[source]

A wrapper function to loss functions

Note

log-likelihood functions can be weighted linearly, due to prod_i p_i(x_i; theta)**w_i ==log==> sum_i w_i log p_i(x_i; theta)

multiclass_cross_entropy_logprob(log_phat, y, num_classes, weights=None)[source]

Per instance weighted cross entropy loss (negative log-likelihood)

multiclass_focal_entropy_logprob(log_phat, y, num_classes, gamma, weights=None)[source]

Per instance weighted ‘focal entropy loss’ https://arxiv.org/pdf/1708.02002.pdf

multiclass_logit_norm_loss(logit, y, num_classes, weights=None, t=1.0, EPS=1e-07)[source]

https://arxiv.org/abs/2205.09310

icenet.deep.lzmlp

class LZMLP(D, C, out_dim=None, mlp_dim=[128, 64], activation='relu', layer_norm=False, batch_norm=False, dropout=0.0, last_tanh=False, last_tanh_scale=10.0, act_after_norm=True, **kwargs)[source]
binarypredict(x)[source]

Return maximum probability class

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_lipschitz_loss()[source]

Lipschitz regularization loss

softpredict(x)[source]

Softmax probability

class LipschitzLinear(in_features, out_features)[source]

Lipschitz linear layer

forward(input)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

get_lipschitz_constant()[source]
initialize_parameters()[source]

icenet.deep.maxo

class MAXOUT(D, C, num_units, neurons, dropout, out_dim=None, **kwargs)[source]

MAXOUT network

binarypredict(x)[source]

Return maximum probability class

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

maxout(x, layer_list)[source]

MAXOUT layer

softpredict(x)[source]

Softmax probability

icenet.deep.mlgr

class MLGR(D, C, out_dim=None)[source]

Multinomial Logistic Regression model

binarypredict(x)[source]

Return maximum probability class

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

softpredict(x)[source]

Softmax probability

icenet.deep.optimize

class Dataset(X, Y, W, Y_DA=None, W_DA=None, X_MI=None)[source]
class DualDataset(X, U, Y, W, Y_DA=None, W_DA=None, X_MI=None)[source]
batch2tensor(batch, device)[source]

Move batch objects to the correct device

dict_batch_to_cuda(batch, device)[source]

Transfer to (GPU) device memory

model_to_cuda(model, device_type: str = 'auto')[source]

Wrapper function to handle CPU/GPU setup

printloss(loss, precision=5)[source]

Loss torch string printer

process_batch(batch, x, y, w, y_DA=None, w_DA=None, MI=None, DA_active=False)[source]
test(model, loader, device, opt_param: dict, MI: dict | None = None, compute_loss: bool = False)[source]

Pytorch based testing routine.

Parameters:
  • model – pytorch model

  • loader – pytorch dataloader

  • device – ‘cpu’ or ‘device’

  • opt_param – optimization parameters

  • MI – MI parameters

  • compute_loss – compute the loss

Returns:

loss dictionary, accuracy, AUC

trackloss(loss, loss_history)[source]

Track individual loss terms

train(model, loader, optimizer, device, opt_param: dict, MI: dict | None = None)[source]

Pytorch based training routine.

Parameters:
  • model – pytorch model

  • loader – pytorch dataloader

  • optimizer – pytorch optimizer

  • device – ‘cpu’ or ‘device’

  • opt_param – optimization parameters

  • MI – MI parameters

Returns:

trained model (returned implicitly via the input arguments)

icenet.deep.pgraph

class PANConv(in_channels, out_channels, filter_size=4, panconv_filter_weight=None)[source]
forward(x, edge_index, num_nodes=None, edge_mask_list=None)[source]

Runs the forward pass of the module.

message(x_j, norm)[source]

Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g. x_i and x_j.

panentropy(edge_index, num_nodes)[source]
panentropy_sparse(edge_index, num_nodes, AFTERDROP, edge_mask_list)[source]
update(aggr_out)[source]

Updates node embeddings in analogy to \(\gamma_{\mathbf{\Theta}}\) for each node \(i \in \mathcal{V}\). Takes in the output of aggregation as first argument and any argument which was initially passed to propagate().

class PANDropout(filter_size=4)[source]
forward(edge_index, p=0.5)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class PANUMPooling(in_channels, ratio=0.5, min_score=None, multiplier=1, nonlinearity=<built-in method tanh of type object>)[source]

Specific Graph pooling layer based on unnormalized M from PAN, which can only work after PANConv.

filter_adj(edge_index, edge_weight, perm, num_nodes=None)[source]
forward(x, edge_index, edge_weight=None, M=None, UM=None, batch=None, num_nodes=None)[source]
topk(x, ratio, batch, min_score=None, tol=1e-07)[source]
class PANXHMPooling(in_channels, ratio=0.5, pan_pool_weight=None, min_score=None, multiplier=1, nonlinearity=<built-in method tanh of type object>, filter_size=3, panpool_filter_weight=None)[source]

General Graph pooling layer based on PAN, which can work with all layers.

filter_adj(edge_index, edge_weight, perm, num_nodes=None)[source]
forward(x, edge_index, edge_weight=None, M=None, UM=None, batch=None, num_nodes=None)[source]
panentropy_sparse(edge_index, num_nodes)[source]
topk(x, ratio, batch, min_score=None, tol=1e-07)[source]
class PANXUMPooling(in_channels, ratio=0.5, pan_pool_weight=None, min_score=None, multiplier=1, nonlinearity=<built-in method tanh of type object>, filter_size=3, panpool_filter_weight=None)[source]

General Graph pooling layer based on PAN, which can work with all layers.

filter_adj(edge_index, edge_weight, perm, num_nodes=None)[source]
forward(x, edge_index, edge_weight=None, M=None, UM=None, batch=None, num_nodes=None)[source]
panentropy_sparse(edge_index, num_nodes)[source]
topk(x, ratio, batch, min_score=None, tol=1e-07)[source]

icenet.deep.predict

pred_cut(ids, param)[source]
pred_cutset(ids, param)[source]
pred_flow(args, param, n_dims, return_model=False)[source]
pred_flr(args, param)[source]
pred_graph_xgb(args, param)[source]
pred_torch_generic(args, param, return_model=False)[source]
pred_torch_graph(args, param, batch_size=5000, return_model=False)[source]
pred_torch_scalar(args, param, return_model=False)[source]
pred_xgb(args, param, feature_names=None, return_model=False)[source]
pred_xgb_logistic(args, param, feature_names=None, return_model=False)[source]

Same as pred_xgb_scalar but a sigmoid function applied

pred_xgb_scalar(args, param, feature_names=None, return_model=False)[source]

icenet.deep.tempscale

class LogitsWithTemperature(mode='softmax', device='cpu')[source]

“Temperature calibration” wrapper class.

Use with original raw logits and class labels as an input.

calibrate(logits: Tensor, labels: Tensor, weights: Tensor | None = None, lr: float = 0.01, max_iter: int = 50)[source]

Tune the temperature of the model with NLL loss (using the validation set)

Parameters:
  • logits – model output logits per event (single or softmax type)

  • labels – class label per event (torch.float32)

  • weights – weights per event

temperature_scale(logits)[source]

Temperature scaling on logits
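
Temperature scaling divides the raw logits by a single learned scalar T, which calibrate() fits with the NLL loss; a minimal sketch of the scaling step, with the softmax shown only for illustration:

import torch

def temperature_scale_sketch(logits: torch.Tensor, T: float) -> torch.Tensor:
    # T > 1 softens the predicted distribution, T < 1 sharpens it
    return torch.softmax(logits / T, dim=-1)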

class ModelWithTemperature(model, mode='softmax', device='cpu')[source]

“Temperature calibration” wrapper class.

Output of the original network needs to be in logits, not softmax or log softmax.

Expects model(input) to return logits.

calibrate(valid_loader, lr: float = 0.01, max_iter: int = 50)[source]

Tune the temperature of the model with NLL loss (using the validation set)

Parameters:

valid_loader – validation set loader (DataLoader)

forward(input: Tensor)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

temperature_scale(logits: Tensor)[source]

Temperature scaling on logits

icenet.deep.train

getgenericmodel(conv_type, netparam)[source]

Wrapper to return different torch models

getgenericparam(param, D, num_classes, config={})[source]

Construct generic torch network parameters

getgraphmodel(conv_type, netparam)[source]

Wrapper to return different graph networks

getgraphparam(data_trn, num_classes, param, config={})[source]

Construct graph network parameters

raytune_main(inputs, train_func=None)[source]

Raytune mainloop

torch_construct(X_trn, Y_trn, X_val, Y_val, X_trn_2D, X_val_2D, trn_weights, val_weights, param, args, Y_trn_DA=None, trn_weights_DA=None, Y_val_DA=None, val_weights_DA=None, y_soft=None, data_trn_MI=None, data_val_MI=None, config={'params': {}})[source]

Torch model and data loader constructor

Parameters:

See other train_* functions

Returns:

model, train_loader, test_loader

torch_loop(model, train_loader, test_loader, args, param, config={'params': {}}, ids=None)[source]

Main training loop for all torch based models

train_cutset(config={'params': {}}, data_trn=None, data_val=None, args=None, param=None)[source]

Train cutset model

Parameters:

See other train_* functions

Returns:

Trained model

train_flow(config={'params': {}}, data_trn=None, data_val=None, args=None, param=None)[source]

Train normalizing flow (BNAF) neural model

Parameters:

See other train_* functions

Returns:

trained model

train_flr(config={'params': {}}, data_trn=None, args=None, param=None)[source]

Train factorized likelihood model

Parameters:

See other train_* functions

Returns:

trained model

train_graph_xgb(config={'params': {}}, data_trn=None, data_val=None, trn_weights=None, val_weights=None, args=None, y_soft=None, param=None, feature_names=None)[source]

Train graph model + xgb hybrid model

Parameters:

See other train_* functions

Returns:

trained model

train_torch_generic(X_trn=None, Y_trn=None, X_val=None, Y_val=None, trn_weights=None, val_weights=None, X_trn_2D=None, X_val_2D=None, args=None, param=None, Y_trn_DA=None, trn_weights_DA=None, Y_val_DA=None, val_weights_DA=None, y_soft=None, data_trn_MI=None, data_val_MI=None, ids=None, config={'params': {}})[source]

Train generic neural model [R^d x (2D) -> output]

Parameters:

See other train_* functions

Returns:

trained model

train_torch_graph(config={'params': {}}, data_trn=None, data_val=None, args=None, param=None, y_soft=None)[source]

Train graph neural networks

Parameters:
  • config – raytune parameter dict

  • data_trn – training data

  • data_val – validation data

  • args – arg parameters dict

  • param – model parameters dict

Returns:

trained model

icenet.deep.vae

class Decoder(D, latent_dim=32, hidden_dim=[128, 128], activation='tanh', batch_norm=False, dropout=0.0)[source]
forward(z)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class Encoder(D, hidden_dim=[128, 128], latent_dim=32, activation='tanh', batch_norm=False, dropout=0.0)[source]

Non-variational encoder

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

class VAE(D, latent_dim, encoder_hidden_dim=[128, 128], var_hidden_dim=[128], decoder_hidden_dim=[128, 128], encoder_bn=True, encoder_act='relu', encoder_dropout=0.0, decoder_bn=False, decoder_act='relu', decoder_dropout=0.0, reco_prob='Gaussian', kl_prob='Gaussian', anomaly_score='KL_RECO', C=None, **kwargs)[source]
forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

kl_div(z, mu, std)[source]

KL divergence (always positive), taken against a diagonal multivariate normal here

log q(z|x) - log p(z)

log_pxz(x, xhat)[source]

Reconstruction loss

log p(x|z)

loss_kl_reco(x, xhat, z, mu, std, beta=1.0)[source]

min ( E_q[log q(z|x) - log p(z)] - E_q log p(x|z) )

softpredict(x)[source]
to_device(device)[source]
class VariationalEncoder(D, hidden_dim=[128, 128], var_hidden_dim=[128, 64], latent_dim=32, activation='relu', batch_norm=False, dropout=0.0)[source]

Variational encoder

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

to_device(device)[source]

Needed for cuda

icenet.optim

Custom optimization functions.

icenet.optim.adam

class Adam(params, lr=0.001, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, amsgrad=False, polyak=0.0)[source]
step(closure=None)[source]

Performs a single optimization step.

Parameters:

closure (callable, optional) – A closure that reevaluates the model and returns the loss.

substitute()[source]
swap()[source]

Swap the running average of the parameters and the current parameters, for saving parameters with Polyak averaging

icenet.optim.adamax

class Adamax(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, polyak=0)[source]
step(closure=None)[source]

Performs a single optimization step (parameter update).

Parameters:

closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.

Note

Unless otherwise specified, this function should not modify the .grad field of the parameters.

substitute()[source]
swap()[source]

Swap the running average of the parameters and the current parameters, for saving parameters with Polyak averaging

icenet.optim.scheduler

class ReduceLROnPlateau(*args, early_stopping=None, **kwargs)[source]
step(metrics, epoch=None, callback_best=None, callback_reduce=None)[source]

icenet.tools

Tool and auxiliary functions.

icenet.tools.aux_torch

count_parameters_torch(model)[source]

Count the number of trainable pytorch model parameters

load_torch_checkpoint(path='/', label='mynet', epoch=-1)[source]

Load pytorch checkpoint

Parameters:
  • path – folder path

  • label – model label name

  • epoch – epoch index. Use -1 for the last epoch

Returns:

pytorch model

load_torch_model(model, optimizer, filename, device='cpu', param=None, load_start_epoch=False)[source]

PyTorch model loader

save_torch_model(model, optimizer, epoch, losses, filename)[source]

PyTorch model saver

weight2onehot(weights, y, num_classes)[source]

Weights into one-hot encoding

Parameters:
  • weights – array of weights (torch type)

  • y – targets (torch type)

  • num_classes – number of classes

icenet.tools.aux

class Metric(y_true, y_pred, weights=None, class_ids=[0, 1], hist=True, valrange='prob', N_mva_bins=30, verbose=True, num_bootstrap=0, exclude_neg_class=True)[source]

Classifier performance evaluation metrics.

ak2numpy(x: Array, fields: list, null_value: float = -999.0, dtype='float32')[source]

Unzip awkward array to numpy array per column (awkward Record)

Parameters:
  • x – awkward array

  • fields – record field names to extract

  • null_value – missing element value

  • dtype – final numpy array dtype

Returns:

numpy array with columns ordered as ‘fields’ parameter

arrays2matrix(x_arr, y_arr, z_arr, x_binedges, y_binedges, dtype='float32')[source]

Array representation summed to matrix.

Parameters:
  • x_arr – array of [x values]

  • y_arr – array of [y values]

  • z_arr – array of [z values]

  • x_binedges – array of binedges

  • y_binedges – array of binedges

Returns:

Matrix output

auc_score(fpr, tpr)[source]

AUC-ROC via numerical integration

Parameters:
  • fpr – false positive rate array

  • tpr – true positive rate array

Call sort_fpr_tpr before this function for numerical stability.

Returns:

AUC score
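
Numerical integration of the ROC curve is typically a trapezoidal sum over the (fpr, tpr) points; a minimal sketch under that assumption:

import numpy as np

def auc_trapezoid_sketch(fpr, tpr):
    # Area under the ROC curve via the trapezoidal rule (cf. sort_fpr_tpr)
    fpr, tpr = np.asarray(fpr), np.asarray(tpr)
    order    = np.argsort(fpr)
    return float(np.trapz(y=tpr[order], x=fpr[order]))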

bin2int(b)[source]

Binary vector to integer.

bin_array(num, N)[source]

Convert a positive integer num into an N-bit bit vector.

binaryvec2int(X)[source]

Turn a matrix of binary vectors row-by-row into integer reps.

binom_coeff_all(N, MAX=None)[source]

Sum all binomial coefficients up to MAX.

binomial(n, k)[source]

Binomial coefficient C(n,k).

binvec2powersetindex(X, B)[source]

Binary vector to powerset index.

Parameters:
  • X – matrix of binary vectors [# number of vectors x dimension]

  • B – the powerset matrix

Returns:

array of powerset indices

Return type:

y

binvec_are_equal(a, b)[source]

Compare equality of two binary vectors a and b.

Parameters:
  • a – binary vectors

  • b – binary vectors

Returns:

true or false

cartesian_product(*arrays)[source]

N-dimensional generalized cartesian product between arrays

Parameters:

*arrays – a list of numpy arrays

Example

cartesian_product(*[np.array([1,2,3]), np.array([100,200,500])])

compute_metrics(class_ids, y_true, y_pred, weights)[source]
concatenate_and_clean(array_list: list, axis: int = 0)[source]

Concatenate a list of arrays and clean memory

Parameters:

array_list – a list of Awkward or numpy arrays

Returns:

concatenated array

count_targets(events, ids, entry_start=0, entry_stop=None, new=False, library='np')[source]

Targets statistics printout

Parameters:
  • events – uproot object

  • ids – list of branch identifiers

  • entry_start – uproot starting point

  • entry_stop – uproot ending point

Returns:

Printout on stdout

create_model_filename(path: str, label: str, filetype='.dat', epoch: int = -1)[source]

Create model filename based on a set of epoch files in a path.

This function automatically takes the minimum validation loss epoch / iteration

if epoch == -1, we try to find the best validation loss model

epoch == -2, we take the latest epoch

epoch == N, we take the specific epoch

create_model_filename_xgb(path: str, label: str, filetype='.dat', epoch: int = -1)[source]

Create model filename with xgboost where we have saved only the last epoch boost which contains all the epochs.

This function automatically takes the minimum validation loss epoch / iteration

if epoch == -1, we try to find the best validation loss model

epoch == -2, we take the latest epoch

epoch == N, we take the specific epoch

deltaphi(phi1, phi2)[source]

Deltaphi measure.

deltar(eta1, eta2, phi1, phi2)[source]

DeltaR measure.

explicit_range(entry_start, entry_stop, num_entries)[source]

Clean None from entry_start and entry_stop

generatebinary(N, M=None, verbose=False)[source]

Function to generate all 2**N binary vectors (as boolean matrix rows) with 1 <= M <= N number of ones (hot bits) (default N)

generatebinary_fixed(n, k)[source]

Generate all combinations of n bits with fixed k ones.

get_datetime()[source]

Return datetime string of style ‘2024-06-04–14-45-07’

getmtime(filename)[source]

Return the last modification time of a file, reported by os.stat()

int2onehot(Y, num_classes)[source]

Integer class vector to class “one-hot encoding”

Parameters:
  • Y – Class indices (# samples)

  • num_classes – Number of classes

Returns:

Onehot representation

Return type:

onehot

inverse_sigmoid(p: ndarray, EPS=1e-09)[source]

Stable inverse sigmoid function

jagged2matrix(arr, scalar_vars, jagged_vars, jagged_dim, entry_start=None, entry_stop=None, null_value: float = -999.0, mode: str = 'columnar', dtype='float32')[source]

Transform a “jagged” event container to a matrix (rows ~ event, columns ~ variables)

Parameters:
  • arr – Awkward array type input for N events

  • scalar_vars – Scalar variables to pick (list of strings)

  • jagged_vars – Jagged variables to pick (list of strings)

  • jagged_dim – Maximum dimension per jagged variable (integer array)

  • null_value – Default value for empty ([]) jagged entries

Returns:

Fixed dimensional 2D-numpy matrix (N x [# scalar var x {#jagged var x maxdim}_i])

jagged2tensor(X, ids, xyz, x_binedges, y_binedges, dtype='float32')[source]
Parameters:
  • X – input data (samples x dimensions) with jagged structure

  • ids – all variable names

  • xyz – array of (x,y,z) channel triplet strings such as [[‘image_clu_eta’, ‘image_clu_phi’, ‘image_clu_e’]]

  • x_binedges

  • y_binedges – arrays of bin edges

Returns:

tensor of size (samples x channels x rows x columns)

Return type:

T

jagged_ak_to_numpy(arr, scalar_vars, jagged_vars, jagged_maxdim, entry_start=None, entry_stop=None, null_value: float = -999.0, dtype='float32')[source]

Transform jagged awkward array to fixed dimensional numpy data

Parameters:
  • arr – jagged awkward array

  • scalar_vars – Scalar variable names

  • jagged_vars – Jagged variable names

  • jagged_maxdim – Maximum dimension per jagged category

  • null_value – Fill null value

Returns:

numpy array, ids

longvec2matrix(X, M, D, order='F')[source]

A matrix representation / dimension converter function useful e.g. for DeepSets and similar neural architectures.

Parameters:
  • X – Numpy input matrix (2-dim) (N x [MD])

  • M – Number of set elements

  • D – Feature dimension

  • order – Reshape direction

Returns:

Output matrix (3-dim) (N x M x D)
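
A minimal numpy sketch of the conversion, assuming each row packs the M set elements and D features and is unpacked row by row with the given memory order (an assumption about the exact packing convention):

import numpy as np

def longvec2matrix_sketch(X, M, D, order='F'):
    # (N x M*D) flat representation -> (N x M x D) set representation
    N   = X.shape[0]
    out = np.zeros((N, M, D))
    for i in range(N):
        out[i] = X[i].reshape(M, D, order=order)
    return out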

los2lol(listOsets)[source]

Convert a list of sets [{},{},..,{}] to a list of lists [[], [], …, []].

makedir(targetdir, exist_ok=True)[source]

Make directory

merge_connected(lists)[source]

Merge sets with common elements (find connected graphs problem).

Examples

Input: [{0, 1}, {0, 1}, {2, 3}, {2, 3}, {4, 5}, {4, 5}, {6, 7}, {6, 7}, {8, 9}, {8, 9}, {10}, {11}]

Output: [{0, 1}, {2, 3}, {4, 5}, {6, 7}, {8, 9}, {10}, {11}]

multiclass_roc_auc_score(y_true, y_pred, weights=None, average='macro')[source]

Multiclass AUC (area under the curve).

Parameters:
  • y_true – True classifications

  • y_pred – Soft probabilities per class

  • weights – Sample weights

  • average – Averaging strategy

Returns:

Area under the curve via averaging

Return type:

auc

number_of_set_bits(i)[source]

Return how many bits of an integer are active (set) in the standard binary representation.

pick_ind(x, minmax)[source]

Return indices between minmax[0] <= x < minmax[1], i.e. [a,b)

Parameters:
  • x – Input vector

  • minmax – Minimum and maximum values

Returns:

indices

pick_index(all_ids: list, vars: list)[source]

Return indices in all_ids corresponding to vars

(vars can contain regexp)

Parameters:
  • all_ids – list of strings, e.g. [‘a’,’b’,’c’]

  • vars – list of string to pick, e.g. [‘a’, ‘c’] or [‘.*’]

Returns:

index array, variable names list

process_regexp_ids(all_ids, ids=None)[source]

Process regular expressions for variable names

Parameters:
  • all_ids – all keys in a tree

  • ids – keys to pick, if None, use all keys

Returns:

ids matching regular expressions

q_exp(x, q: float = 1.0)[source]

q-exponent

q_log(x, q: float = 1.0)[source]

q-logarithm

recursive_concatenate(array_list, max_batch_size: int = 32, axis: int = 0)[source]

Concatenate a list of arrays in a recursive way (to avoid possible problems with one big concatenation e.g. with Awkward)

Parameters:
  • array_list – a list of Awkward or Numpy arrays

  • max_batch_size – maximum number of list elements per concatenation

  • axis – axis to concatenate over

Returns:

concatenated array

red(X, ids, param, mode=None, exclude_tag='exclude_MVA_vars', include_tag='include_MVA_vars', verbose=True)[source]

Reduce the input set variables of X (start with all include, then evaluate exclude, then evaluate include)

Remember that using Python sets is not necessarily stable over runs (do not rely on the order of sets).

Parameters:
  • X – data matrix

  • ids – names of columns

  • param – parameter dictionary (from yaml)

  • mode – return mode ‘X’ or ‘ids’

  • exclude_tag – key in param

  • include_tag – key in param

replace_param(default, raytune)[source]

Parameter replacement

set_random_seed(seed)[source]

Set random seeds

sigmoid(x: ndarray)[source]

Stable sigmoid function

slice_range(start, stop, N)[source]

Python slice type processor function

Parameters:
  • start – first index

  • stop – end index + 1

  • N – total number of indices

Returns:

processed indices and total length

Return type:

a,b,b-a

sort_fpr_tpr(fpr, tpr)[source]

For numerical stability with negative weighted events

split(a, n)[source]

Generator which returns approximately equally sized chunks.

Parameters:
  • a – Total number

  • n – Number of chunks

Example

list(split(10, 3))

split_size(a, n)[source]

As split_start_end() but returns only size per chunk

split_start_end(a, n, end_plus=1)[source]

Returns approx equally sized chunks.

Parameters:
  • a – Range, define with range()

  • n – Number of chunks

  • end_plus – Python/numpy index style (i.e. + 1 for the end)

Examples

split_start_end(range(100), 3) returns [[0, 34], [34, 67], [67, 100]]

split_start_end(range(5,25), 3) returns [[5, 12], [12, 19], [19, 25]]

to_edges(l)[source]

Treat l as a graph and return its edges.

Examples

to_edges([‘a’,’b’,’c’,’d’]) -> [(a,b), (b,c),(c,d)]

to_graph(l)[source]

Turn the list into a graph.

unmask(x, mask, default_value=-1)[source]

Unmasking function

unroll_ak_fields(x, order='first')[source]

Unroll field names in a (nested) awkward array

Parameters:
  • x – awkward array

  • order – return first order and second order field names

Returns:

field names as a list

weighted_avg_and_std(values, weights)[source]

Return the weighted average and standard deviation
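
A minimal numpy sketch of the weighted mean and (biased) weighted standard deviation; the exact bias convention used by the package is an assumption:

import numpy as np

def weighted_avg_and_std_sketch(values, weights):
    # Weighted mean and standard deviation
    avg = np.average(values, weights=weights)
    var = np.average((np.asarray(values) - avg) ** 2, weights=weights)
    return avg, np.sqrt(var)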

x2ind(x, binedges)[source]

Return histogram bin indices for data in x, which needs to be an array.

Parameters:
  • x – data to be classified between bin edges

  • binedges – histogram bin edges

Returns:

histogram bin indices

Return type:

inds
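
Sketch of the bin-index mapping with numpy.digitize; clipping under/overflow into the first/last bin is an assumed convention:

import numpy as np

def x2ind_sketch(x, binedges):
    inds = np.digitize(x, binedges) - 1
    return np.clip(inds, 0, len(binedges) - 2)

edges = np.array([0.0, 1.0, 2.0, 3.0])
print(x2ind_sketch(np.array([-0.5, 0.2, 1.7, 5.0]), edges))  # [0 0 1 2]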

yaml_dump(data: dict, filename: str)[source]

Dump dictionary to YAML with custom style

Parameters:
  • data – dictionary

  • filename – full path

icenet.tools.icemap

class icemap(x, ids=None)[source]
Parameters:
  • x – data [N vectors x … x D dimensions]

  • ids – variable names [D strings]

  • OR

  • x – data dictionary

test_icecube_concat()[source]

Unit tests

test_icecube_indexing()[source]

Unit tests

icenet.tools.iceroot

events_to_jagged_numpy(events, ids, entry_start=0, entry_stop=None, maxevents=None, label=None)[source]

Process uproot tree to a jagged numpy (object) array

Parameters:
  • events – uproot tree

  • ids – names of the variables to pick

  • entry_start – first event to consider

  • entry_stop – last event to consider

Returns:

X

get_num_events(rootfile, key_index=0)[source]

Get the number of entries in a rootfile by reading a key

Parameters:
  • rootfile – rootfile string (with possible Tree name appended with :)

  • key_index – which variable to use as a dummy

Returns:

number of entries

load_tree(rootfile, tree, entry_start=0, entry_stop=None, maxevents=None, ids=None, library='np', dtype=None, num_cpus=0, verbose=False)[source]

Load ROOT files

Parameters:
  • rootfile – ROOT file path(s) (a string or a list of strings)

  • tree – Tree to read out

  • entry_start – First event to read per file

  • entry_stop – Last event to read per file

  • maxevents – Maximum number of events in total (over all files)

  • ids – Names of the variables to read out from the root tree

  • library – Return type ‘np’ (jagged numpy) or ‘ak’ (awkward) of the array

  • num_cpus – Number of processes used (set 0 for automatic)

  • verbose – Verbose output

Returns:

array of type ‘library’

load_tree_stats(rootfile, tree, key=None, verbose=False)[source]

Load the number of events in a list of rootfiles

Parameters:
  • rootfile – a list of rootfiles

  • tree – tree name to open

  • key – key (variable name) to use to get the number of events, if None then use the first one

  • verbose – verbose output print

Returns:

number of events

read_multiple(process_func, processes, root_path, param, class_id, dtype=None, num_cpus=0, verbose=False)[source]

Loop over different MC / data processes as defined in the yaml files

Parameters:
  • process_func – data processing function

  • processes – MC processes dictionary (from yaml)

  • root_path – main path of files

  • param – parameters of ‘process_func’

  • class_id – class identifier (integer), e.g. 0, 1, 2 …

  • num_cpus – number of CPUs used (set 0 for automatic)

  • verbose – verbose output print

Returns:

X, Y, W, ids, info (awkward array format)

read_single(process_func, process, root_path, param, class_id, dtype=None, num_cpus=0, verbose=False)[source]

Loop over different MC / data processes as defined in the yaml files

[awkward compatible only]

Parameters:
  • process_func – data processing function

  • process – MC / data process dictionary (from yaml)

  • root_path – main path of files

  • param – parameters of ‘process_func’

  • class_id – class identifier (integer), e.g. 0, 1, 2 …

  • num_cpus – number of CPUs used (set 0 for automatic)

  • verbose – verbose output print

Returns:

X, Y, W, ids, info (awkward array format)

icenet.tools.icevec

hepmc2vec4(p)[source]

HepMC3 python binding FourVector to vec4

class vec4(x=None, y=None, z=None, t=None)[source]

Lorentz vectors

abs_delta_phi(v)[source]
property abseta
property beta
boost(b, sign=-1)[source]

Lorentz boost

Parameters:
  • b – Boost 4-momentum (e.g. of the system)

  • sign – 1 or -1 (direction of the boost: into the rest frame (-1) or out of it (1))

property costheta
deltaR(v)[source]
deltaphi(v)[source]
dot3(other)[source]
dot4(other)[source]
property e
property eta
property gamma
property m
property m2
property mt
property p3
property p3mod
property p3mod2
property phi
phi_PIPI(x)[source]
property pt
property pt2
property px
property py
property pz
property rapidity
rotateSO3(R)[source]
rotateX(angle)[source]
rotateY(angle)[source]
rotateZ(angle)[source]
scale(a)[source]
setE(e)[source]
setMagThetaPhi(mag, theta, phi)[source]
setP3(p3)[source]
setPt2RapPhiM2(pt2, rap, phi, m2)[source]
setPtEtaPhi(pt, eta, phi)[source]
setPtEtaPhiM(pt, eta, phi, m)[source]
setPxPyPzE(px, py, pz, e)[source]
setX(x)[source]
setXYZ(x, y, z)[source]
setXYZM(x, y, z, m)[source]
setXYZT(x, y, z, t)[source]
setY(y)[source]
setZ(z)[source]
property t
property theta
property x
property y
property z
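
Illustrative usage of the vec4 interface listed above (method and property names are taken from this listing; numerical behaviour is not verified here):

from icenet.tools.icevec import vec4

pi1 = vec4(); pi1.setPtEtaPhiM(20.0, 0.5, 0.1, 0.1396)
pi2 = vec4(); pi2.setPtEtaPhiM(15.0, -0.2, -1.0, 0.1396)

print(pi1.m, pi1.pt, pi1.eta)   # mass, transverse momentum, pseudorapidity
print(pi1.deltaR(pi2))          # angular separation dR
print(pi1.dot4(pi2))            # Lorentz 4-product

pi1.boost(b=pi2, sign=-1)       # boost pi1 into the rest frame of pi2
print(pi1.e)                    # energy after the boost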

icenet.tools.io

class IceXYW(x=array([], dtype=float64), y=array([], dtype=float64), w=None, ids=None)[source]
Parameters:
  • x – data object

  • y – target output data

  • w – weights

classfilter(classid)[source]
find_ind(key)[source]

Return column index corresponding to key

permute(permutation)[source]
apply_madscore(X: array, X_m, X_mad, EPS=1e-12)[source]

MAD-score normalization

apply_zscore(X: array, X_mu, X_std, EPS=1e-12)[source]

Z-score normalization

apply_zscore_tensor(T, mu, std, EPS=1e-12)[source]

Apply z-score normalization for tensors.

calc_madscore(X: array)[source]

Calculate robust normalization.

Parameters:

X – Input with [# vectors x # dimensions]

Returns:

Median vector X_mad : Median deviation vector

Return type:

X_m

calc_zscore(X: array, weights: array | None = None)[source]

Calculate 0-mean & unit-variance normalization.

Parameters:
  • X – Input with [N x dim]

  • weights – Event weights

Returns:

Mean vector X_std : Standard deviation vector

Return type:

X_mu
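
Typical calc/apply pattern for z-score normalization as a sketch (the icenet functions additionally support event weights):

import numpy as np

EPS   = 1e-12
X_trn = np.random.randn(1000, 4) * 3.0 + 1.0   # training sample
X_tst = np.random.randn(200, 4)  * 3.0 + 1.0   # test sample

X_mu, X_std = X_trn.mean(axis=0), X_trn.std(axis=0)   # corresponds to calc_zscore
Z_trn = (X_trn - X_mu) / (X_std + EPS)                # corresponds to apply_zscore
Z_tst = (X_tst - X_mu) / (X_std + EPS)                # same statistics reused on test data

print(Z_trn.mean(axis=0).round(2), Z_trn.std(axis=0).round(2))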

calc_zscore_tensor(T)[source]

Compute z-score normalization for tensors.

Parameters:

T – input tensor data (events x channels x rows x cols, …)

Returns:

mu, std tensors

checkinfnan(x, value=0)[source]

Check Inf and NaN values and replace them with a default value.

count_files_in_dir(path)[source]

Count the number of files in a path

class fastarray1(capacity=32)[source]

1D pre-allocated buffer array for histogramming etc.

add(x)[source]
reset()[source]
update(row)[source]
values()[source]
get_file_timestamp(file_path: str)[source]

Return file timestamp as a string

get_gpu_memory_map()[source]

Get the GPU VRAM use in GB.

Returns:

dictionary with keys as device ids [integers] and values the memory used by the GPU.

glob_expand_files(datasets, datapath, recursive_glob=False)[source]

Do glob / brace expansion of files

Parameters:
  • datasets – dataset filename with glob syntax (can be a list of files)

  • datapath – root path to files

Returns:

full filenames including the path

Return type:

files

impute_data(X, imputer=None, dim=None, values=[-999], labels=None, algorithm='iterative', fill_value=0, knn_k=6)[source]

Data imputation (treatment of missing values, Nan and Inf).

NOTE: This function can impute only fixed dimensional input currently (not Jagged numpy arrays)

Parameters:
  • X – Input data matrix [N vectors x D dimensions]

  • imputer – Pre-trained imputer, default None

  • dim – Array of active dimensions to impute

  • values – List of special integer values indicating the need for imputation

  • labels – List containing textual description of input variables

  • algorithm – ‘constant’, ‘mean’, ‘median’, ‘iterative’, ‘knn’

  • knn_k – knn k-nearest neighbour parameter

Returns:

Imputed output data

Return type:

X
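
A sketch of the same imputation idea using scikit-learn imputers: mark the special sentinel values (and Inf/NaN) as missing, then fill them. The parameter names mirror the docstring, but the internals below are an assumption, not the icenet code:

import numpy as np
from sklearn.experimental import enable_iterative_imputer  # noqa: F401
from sklearn.impute import SimpleImputer, KNNImputer, IterativeImputer

def impute_sketch(X, values=(-999,), algorithm='iterative', knn_k=6):
    X = np.array(X, dtype=float)
    for v in values:
        X[X == v] = np.nan          # sentinel values become missing
    X[~np.isfinite(X)] = np.nan     # Inf / NaN become missing
    if algorithm == 'constant':
        imp = SimpleImputer(strategy='constant', fill_value=0.0)
    elif algorithm in ('mean', 'median'):
        imp = SimpleImputer(strategy=algorithm)
    elif algorithm == 'knn':
        imp = KNNImputer(n_neighbors=knn_k)
    else:
        imp = IterativeImputer()
    return imp.fit_transform(X)

X = [[1.0, -999.0], [2.0, 4.0], [np.inf, 5.0]]
print(impute_sketch(X, algorithm='mean'))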

index_list(target_list, keys)[source]

Use e.g. x_subset = x[:, io.index_list(ids, variables)]

make_hash_sha256_file(filename)[source]

Create SHA256 hash from a file

make_hash_sha256_object(o)[source]

Create SHA256 hash from an object

Parameters:

o – python object (e.g. dictionary)

Returns:

hash

make_hashable(o)[source]

Turn a python object into hashable type (recursively)

pick_vars(data, set_of_vars)[source]

Choose the active set of input variables.

Parameters:
  • data – IceXYW type object

  • set_of_vars – Variables to pick

Returns:

Chosen indices newvars: Chosen variables

Return type:

newind

process_memory_use()[source]

Return system memory (RAM) used by the process in GB.

rootsafe(txt)[source]

Change character due to ROOT

safetxt(txt)[source]

Protection for ‘/’

showmem(color='red')[source]
showmem_cuda(device='cuda:0', color='red')[source]
split_data(X, Y, W, ids, frac=[0.5, 0.1, 0.4], permute=True)[source]

Split machine learning data into train, validation, test sets

Parameters:
  • X – data matrix

  • Y – target matrix

  • W – weight matrix

  • ids – variable names of columns

  • frac – fraction [train, validate, evaluate] (sum to 1)

  • permute – permute the events before splitting
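
A fraction-based split sketch with an optional permutation (the real function also carries Y, W and ids through the split):

import numpy as np

def split_data_sketch(X, frac=(0.5, 0.1, 0.4), permute=True, seed=42):
    N   = len(X)
    ind = np.random.default_rng(seed).permutation(N) if permute else np.arange(N)
    n_trn = int(round(frac[0] * N))
    n_val = int(round(frac[1] * N))
    trn, val, tst = np.split(ind, [n_trn, n_trn + n_val])
    return X[trn], X[val], X[tst]

X = np.arange(100).reshape(-1, 1)
trn, val, tst = split_data_sketch(X)
print(len(trn), len(val), len(tst))   # 50 10 40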

split_data_simple(X, frac, permute=True)[source]

Split machine learning data into train, validation, test sets

Parameters:
  • X – data as a list of event objects (such as torch geometric Data)

  • frac – split fraction

torch_cuda_total_memory(device='cuda:0')[source]

Return CUDA device VRAM available in GB.

icenet.tools.plots

MVA_plot(metrics, labels, title='', filename='MVA', density=True, legend_fontsize=7)[source]

MVA output plots

ROC_plot(metrics, labels, title='', plot_thresholds=True, thr_points_signal=[0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95], filename='ROC', legend_fontsize=7, xmin=0.0001, alpha=0.32)[source]

Receiver Operating Characteristics, i.e. false positive rate (x) vs true positive rate (y)

annotate_heatmap(X, ax, xlabels, ylabels, x_rot=90, y_rot=0, decimals=1, color='w')[source]

Add text annotations to a matrix heatmap plot

binengine(bindef, x)[source]

Binning processor function

Parameters:
  • bindef – binning definition

  • x – data input array

Examples

  • 50 (number of bins, integer)

  • [1.0, 40.0, 50.0] (list of explicit edges)

  • {‘nbin’: 30, ‘q’: [0.0, 0.95], ‘space’: ‘linear’} (automatic with quantiles)

  • {‘nbin’: 30, ‘minmax’: [2.0, 50.0], ‘space’: ‘log10’} (automatic with boundaries)

Returns:

binning edges

Return type:

edges
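
Sketch of how the documented binning definitions could be processed (integer count, explicit edges, and the two automatic dictionary forms); the real processor may differ in details:

import numpy as np

def binengine_sketch(bindef, x):
    if isinstance(bindef, int):                        # number of bins
        return np.linspace(np.min(x), np.max(x), bindef + 1)
    if isinstance(bindef, (list, np.ndarray)):         # explicit edges
        return np.asarray(bindef, dtype=float)
    if 'q' in bindef:                                  # quantile-based range
        lo, hi = np.quantile(x, bindef['q'])
    else:                                              # fixed min/max range
        lo, hi = bindef['minmax']
    if bindef.get('space', 'linear') == 'log10':
        return np.logspace(np.log10(lo), np.log10(hi), bindef['nbin'] + 1)
    return np.linspace(lo, hi, bindef['nbin'] + 1)

x = np.random.exponential(10.0, size=10000) + 2.0
print(binengine_sketch({'nbin': 30, 'minmax': [2.0, 50.0], 'space': 'log10'}, x)[:3])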

binned_1D_AUC(y_pred, y, X_kin, ids_kin, X, ids, edges, weights=None, VAR: str = 'trk_pt', num_bootstrap=0)[source]

Evaluate AUC & ROC per 1D-bin.

Parameters:
  • y_pred – MVA algorithm output

  • y – Output (truth level target) data

  • X_kin – Data

  • ids_kin – Variables (strings)

  • X – Data

  • ids – Variables (strings)

  • edges – Edges of the space cells

  • weights – Sample weights

  • VAR – Variable identifier to pick (one)

Returns:

Figure handle and axis met : Metrics object

Return type:

fig,ax

binned_2D_AUC(y_pred, y, X_kin, ids_kin, X, ids, edges, label, weights=None, VAR: list = ['trk_pt', 'trk_eta'])[source]

Evaluate AUC per 2D-bin.

Parameters:
  • y_pred – MVA algorithm output

  • y – Output (truth level target) data

  • X_kin – Data

  • ids_kin – Variables

  • X – Data

  • ids – Variables

  • edges – Edges of the A,B-space cells (2D array)

  • label – Label of the classifier (string)

  • weights – Sample weights

  • VAR – Variable identifiers (two)

Returns:

Figure handle and axis met : Metrics object

Return type:

fig,ax

density_COR(y_pred, X, ids, label, weights=None, hist_edges=[[50], [50]], path='', cmap='Oranges')[source]

Evaluate the 2D-density of the MVA algorithm output vs other variables.

Parameters:
  • y_pred – MVA algorithm output

  • X – Variables to be plotted

  • ids – Identifiers of the variables in X

  • label – Label of the MVA model (string)

  • weights – Sample weights

  • hist_edges – Histogram edges list (or number of bins, as an alternative) (2D)

  • path – Save path

  • cmap – Color map

Returns:

Plot pdf saved directly

density_COR_wclass(y_pred, y, X, ids, label, weights=None, class_ids=None, edges=[[50], [50]], density=True, path='', cmap='Oranges', **kwargs)[source]

Evaluate the 2D-density of the MVA algorithm output vs other variables per class.

Parameters:
  • y_pred – MVA algorithm output

  • y – Output (truth level target) data

  • X – Variables to be plotted

  • ids – Identifiers of the variables in X

  • label – Label of the MVA model (string)

  • weights – Sample weights

  • class_ids – Class ids to plot

  • edges – Histogram edges list (or number of bins, as an alternative) (2D)

  • density – Normalize to density

  • path – Save path

  • cmap – Color map

Returns:

correlation values in a dictionary (per variable, per class) plots are saved directly

density_MVA_wclass(y_pred, y, label, weights=None, class_ids=None, edges=80, path='', **kwargs)[source]

Evaluate MVA output (1D) density per class.

Parameters:
  • y_pred – MVA algorithm output

  • y – Output (truth level target) data

  • label – Label of the MVA model (string)

  • weights – Sample weights

  • class_ids – Class IDs to plot

  • edges – Histogram edges list (or number of bins, as an alternative)

Returns:

Plot pdf saved directly

draw_error_band(ax, x, y, x_err, y_err, **kwargs)[source]

Calculate normals via centered finite differences (except the first point which uses a forward difference and the last point which uses a backward difference).

https://matplotlib.org/stable/gallery/lines_bars_and_markers/curve_error_band.html

multiprocess_AIRW_wrapper(p)[source]

Multiprocessing plots

multiprocess_plot_wrapper(p)[source]

Multiprocessing wrapper

plot_AIRW(X, y, ids, weights, y_pred, pick_ind, label, sublabel, param, tau=1.0, targetdir=None, num_cpus=0)[source]

Plot AI based reweighting results

plot_AUC_matrix(AUC, edges_A, edges_B)[source]

Plot AUC matrix.

Parameters:
  • AUC – AUC-ROC matrix

  • edges_A – Histogram edges of variable A

  • edges_B – Histogram edges of variable B

Returns:

figure handle ax: figure axis

Return type:

fig

plot_contour_grid(pred_func, X, y, ids, targetdir='.', transform='numpy', reso=50, npoints=400)[source]

Classifier decision contour evaluated pairwise for each dimension pair, with the other dimensions fixed at zero = (0,0,…,0) (thus z-score normalized input with zero mean is suitable)

Parameters:
  • pred_func – prediction function handle

  • X – input matrix

  • y – class targets

  • ids – variable label strings

  • targetdir – output directory

  • transform – ‘numpy’, ‘torch’

  • reso – evaluation resolution

  • npoints – number of points to draw

plot_correlation_comparison(corr_mstats, targetdir, xlim=None)[source]

Plot collected correlation metrics from density_COR_wclass()

Parameters:
  • corr_mstats – statistics dictionary

  • targetdir – output directory

  • xlim – plot limits dictionary per class

Returns:

plots saved to a directory

plot_correlations(X, ids, weights=None, y=None, round_threshold=0.0, targetdir=None, colorbar=False)[source]

Plot a cross-correlation matrix of vector data

Parameters:
  • X – Data matrix (N x D)

  • ids – Variable names (list of length D)

  • weights – Event weights

  • y – Class labels per event (list of length N)

  • round_threshold – correlation matrix elements with |C| < threshold are set to zero

  • targetdir – Output plot directory

  • colorbar – Colorbar on the plot

Returns:

Figures, axes (per class)

Return type:

figs, axs

plot_matrix(XY, x_bins, y_bins, vmin=0, vmax=None, cmap='RdBu', figsize=(4, 3), grid_on=False)[source]

Visualize matrix.

plot_reweight_result(X, y, nbins, binrange, weights, title='', xlabel='x', linewidth=1.5, plot_unweighted=True)[source]

Plot pure event counts, so that one can see whether also the integrated class fractions are equalized (or not) after the weighting.

plot_selection(X, mask, ids, plotdir, label, varlist, density=True, library='np')[source]

Plot selection before / after type histograms against all chosen variables

Parameters:
  • X – data array (N events x D dimensions)

  • mask – boolean selection indices (N)

  • ids – variable string array (D)

  • plotdir – plotting directory

  • label – a string label

  • varlist – a list of variables to be plotted (from ids)

  • density – normalize all histograms to unit density

  • library – ‘np’ or ‘ak’

plot_train_evolution_multi(losses, trn_aucs, val_aucs, label, aspect=0.85, yscale='linear', xscale='linear')[source]

Training evolution plots.

Parameters:
  • losses – loss values in a dictionary

  • trn_aucs – training metrics

  • val_aucs – validation metrics

Returns:

figure handle ax: figure axis

Return type:

fig

plot_xgb_importance(model, tick_label, importance_type='gain', label=None, sort=True, default_names=False)[source]

Plot XGBoost model feature importance

Parameters:
  • model – xgboost model object

  • dim – feature space dimension

  • tick_label – feature names

  • importance_type – type of importance metric [‘weight’, ‘gain’, ‘cover’, ‘total_gain’, ‘total_cover’]

  • default_names – True for xgboost default, else set False (uses tick_label)

Returns:

fig, ax

plotvar(x, y, var, weights, nbins=70, percentile_range=[0.5, 99.5], plot_unweighted=True, title='', targetdir='.')[source]

Plot a single variable.

plotvars(X, y, ids, weights, nbins=70, percentile_range=[0.5, 99.5], exclude_vals=[None], plot_unweighted=True, title='', targetdir='.', num_cpus: int = 0)[source]

Plot all variables.

table_writer(filename, label, sublabel, tau, chi2_table, print_to_screen=False)[source]

Helper function to write a chi2 table to a file

icenet.tools.prints

colored_row(x, active_color='green', inactive_color='white', threshold=0.5, **kwargs)[source]

Color vector elements.

print_RAM_usage()[source]
print_colored_matrix(X, **kwargs)[source]

Print matrix with two colors (suitable for binary matrices).

print_flow(flow)[source]

Print a cutflow or infoflow.

print_variables(X: array, ids: List[str], W=None, exclude_vals=None, output_file=None)[source]

Print statistics of X

Parameters:
  • X – array (n x dim)

  • ids – variable names (dim)

  • W – event weights

  • exclude_vals – exclude special values from the stats

Returns:

prettyprint table of stats

print_weights(weights, y, output_file=None, header=None, write_mode='w')[source]

Print event weights table

printbar(marker='-', marks: int = 75)[source]

Print bar.

printbranch(d)[source]

Print a branch.

set_arr_format(precision)[source]

Set numpy array print format.

icenet.tools.process

combine_pickle_data(args)[source]

Load split pickle data and return the full dataset arrays

Parameters:

args – main argument dictionary

concatenate_data(data, max_batch_size: int = 32)[source]

Helper function to concatenate arrays with a specified maximum batch size

evaluate_models(data=None, info=None, args=None)[source]

Evaluate ML/AI models.

Parameters:

data, info, args – different datatype objects

Returns:

Saves evaluation plots to the disk

generic_flow(rootname, func_loader, func_factor)[source]

Generic (read data – train models – evaluate models) workflow

Parameters:
  • rootname – name of the workflow config folder

  • func_loader – data loader (function handle)

  • func_factor – data transformer (function handle)

impute_datasets(data, args, features=None, imputer=None)[source]

Dataset imputation

Parameters:
  • data – .x, .y, .w, .ids type object

  • args – imputer parameters

  • features – variables to impute (list), if None, then all are considered

  • imputer – imputer object (scikit-type)

Returns:

imputed data

load_file_wrapper(index, filepath)[source]

Helper function

make_plots(data, args, runmode)[source]

Basic Q/A-plots

plot_XYZ_multiple_models(targetdir, args)[source]
plot_XYZ_wrap(func_predict, x_input, y, weights, label, targetdir, args, X_kin, ids_kin, X_RAW, ids_RAW)[source]

Arbitrary plot steering function. Add new plot types here, steered from plots.yml

process_data(args, data, func_factor, mvavars, runmode)[source]

Process data to high level representations and split to train/eval/test

process_raw_data(args, func_loader)[source]

Load raw input from the disk – this is executed only by ‘genesis’

read_cli()[source]
read_config(config_path='configs/xyz/', runmode='all')[source]

Commandline and YAML configuration reader

train_eval_data_processor(args, func_factor, mvavars, runmode)[source]

Read/write (MVA) data and return full processed dataset

train_models(data_trn, data_val, args=None)[source]

Train ML/AI models wrapper with pre-processing.

Parameters:

data_trn, data_val, args – different datatype objects

Returns:

Saves trained models to disk

icenet.tools.raytools

class ProgressBar(total: int, description: str = '')[source]
property actor: ActorHandle

Returns a reference to the remote ProgressBarActor.

When you complete tasks, call update on the actor.

description: str
pbar: tqdm
print_until_done() None[source]

Blocking call.

Do this after starting a series of remote Ray tasks, to which you’ve passed the actor handle. Each of them calls update on the actor. When the progress meter reaches 100%, this method returns.

progress_actor: ActorHandle
total: int

icenet.tools.reweight

AIRW_helper(x, y, w, ids, pdf, args, x_val, y_val, w_val, EPS=1e-12)[source]

Helper function for ML based reweighting

balanceweights(weights_doublet, reference_class, y, EPS=1e-12)[source]

Balance N-class weights to sum to equal counts.

Parameters:
  • weights_doublet – N-class event weights (events x classes)

  • reference_class – which class gives the reference (integer)

  • y – class targets

Returns:

weights doublet with new weights per event

compute_ND_reweights(x, y, w, ids, args, pdf=None, EPS=1e-12, x_val=None, y_val=None, w_val=None, skip_reweights=False)[source]

Compute N-dim reweighting coefficients

Supports ‘ML’ (ND), ‘pseudo-ND’ (1D x 1D … x 1D), ‘2D’, ‘1D’

For ‘args’ dictionary structure, see steering cards.

Parameters:
  • x – training sample input

  • y – training sample (class) labels

  • w – training sample weights

  • ids – variable names of columns of x

  • pdf – pre-computed pdfs (default None)

  • args – reweighting parameters in a dictionary

Returns:

1D-array of re-weights pdf : computed pdfs

Return type:

weights

doublet_helper(x, y, w, class_ids)[source]
histogram_helper(x, y, w, ids, pdf, args, EPS)[source]

Helper function for histogram based reweighting

map_xyw(x, y, w, vars, c, reference_class)[source]

For AIRW helper

pdf_1D_hist(X, w, binedges)[source]

Compute re-weighting 1D pdfs.

pdf_2D_hist(X_A, X_B, w, binedges_A, binedges_B)[source]

Compute re-weighting 2D pdfs.

reweightcoeff1D(X, y, pdf, reference_class, max_reg=1000.0, EPS=1e-12)[source]

Compute N-class density reweighting coefficients.

Parameters:
  • X – Observable of interest (N x 1)

  • y – Class labels (N x 1)

  • pdf – PDF for each class

  • reference_class – e.g. 0 (background) or 1 (signal)

Returns:

weights for each event
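
The core density-ratio idea as a sketch: each event gets the ratio of the reference-class pdf to its own class pdf at its bin, clipped by max_reg (pdf estimation and bin lookup are simplified here and are not the icenet code):

import numpy as np

def reweight_1D_sketch(X, y, binedges, reference_class=0, max_reg=1000.0, EPS=1e-12):
    # per-class 1D density histograms
    pdfs = {c: np.histogram(X[y == c], bins=binedges, density=True)[0]
            for c in np.unique(y)}
    inds = np.clip(np.digitize(X, binedges) - 1, 0, len(binedges) - 2)
    w    = np.ones(len(X))
    for c in pdfs:
        m    = (y == c)
        w[m] = pdfs[reference_class][inds[m]] / (pdfs[c][inds[m]] + EPS)
    return np.clip(w, 0.0, max_reg)

X = np.concatenate([np.random.normal(0, 1, 5000), np.random.normal(0.5, 1.2, 5000)])
y = np.concatenate([np.zeros(5000, int), np.ones(5000, int)])
w = reweight_1D_sketch(X, y, np.linspace(-5, 6, 60))
print(w[y == 1].mean())   # class 1 reweighted towards the class 0 density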

reweightcoeff2D(X_A, X_B, y, pdf, reference_class, max_reg=1000.0, EPS=1e-12)[source]

Compute N-class density reweighting coefficients.

Operates in full 2D without any factorization.

Parameters:
  • X_A – First observable of interest (N x 1)

  • X_B – Second observable of interest (N x 1)

  • y – Class labels (N x 1)

  • pdf – Density histograms for each class

  • reference_class – e.g. Background (0) or signal (1)

  • max_reg – Regularize the maximum reweight coefficient

Returns:

weights for each event

rw_transform(phat, mode, EPS=1e-12)[source]

AI/Deep reweighting transform

Parameters:
  • phat – estimated probabilities

  • mode – operation mode

Returns:

transformed values

rw_transform_with_logits(logits, mode, absMax=30)[source]

AI/Deep reweighting transform with logits input

Parameters:
  • logits – logit values

  • mode – operation mode

Returns:

transformed values

icenet.tools.stx

apply_cutflow(cut, names, xcorr_flow=True, EPS=1e-12)[source]

Apply cutflow

Parameters:
  • cut – list of pre-calculated cuts, each list element is a boolean array

  • names – list of names (description of each cut, for printout only)

  • xcorr_flow – compute full N-point correlations

  • return_powerset – return each of 2**|cuts| as a separate boolean mask vector

Returns:

boolean mask of size number of events (1 = pass, 0 = fail)

Return type:

mask

construct_columnar_cuts(X, ids, cutlist)[source]

Construct cuts and corresponding names.

Parameters:
  • X – Input columnar data matrix

  • ids – Variable names for each column of X

  • cutlist – Selection cuts as strings, such as [‘ABS@eta < 0.5’, ‘trgbit == True’]

Returns:

masks (boolean arrays) in a list, boolean expressions (list)

construct_exptree(root)[source]

Construct an expression (syntax) tree via recursion.

Parameters:

root – List of lists, for example [[‘10’, ‘>’, ‘7’], ‘AND’, [[‘4’, ‘>=’, ‘2’], ‘AND’, [‘2’, ‘<=’, ‘4’]]]

Returns:

an expression tree object with ‘tree_node’ objects

eval_boolean_exptree(root, X, ids)[source]

Evaluation of a (boolean) expression tree via recursion.

Parameters:
  • root – expression tree object

  • X – data matrix (N events x D dimensions)

  • ids – variable names for each D dimension

Returns:

boolean selection list of size N

eval_boolean_syntax(expr, X, ids, verbose=False)[source]

A complete wrapper to evaluate boolean syntax.

Parameters:
  • expr – boolean expression string, e.g. “pt > 7.0 AND (x < 2 OR x >= 4)”

  • X – input data (N x dimensions)

  • ids – variable names as a list

Returns:

boolean list of size N
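
Usage sketch (data and column names here are made up; the expression is the documented example, and the exact output container type is not verified):

import numpy as np
from icenet.tools.stx import eval_boolean_syntax

ids = ['pt', 'x']
X   = np.array([[8.0, 1.0],
                [6.0, 5.0],
                [9.0, 4.5]])

mask = eval_boolean_syntax("pt > 7.0 AND (x < 2 OR x >= 4)", X, ids)
print(mask)   # boolean selection of size N, here [True, False, True]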

filter_constructor(filters, X, ids, y=None)[source]

Filter product main constructor

Parameters:
  • filters – yaml file input

  • X – columnar input data

  • ids – data column keys (list of strings)

  • y – class labels (default None), used for diplomat (always passing) classes

Returns:

mask matrix, mask text labels (list), mask path labels (list)

parse_boolean_exptree(instring)[source]

A boolean expression tree parser.

Parameters:

instring – input string, e.g. “pt > 7.0 AND (x < 2 OR x >= 4)”

Returns:

A syntax tree as a list of lists

Information:
See: https://stackoverflow.com/questions/11133339/parsing-a-complex-logical-expression-in-pyparsing-in-a-binary-tree-fashion

powerset_constructor(cutset, X, ids)[source]

Powerset (all subsets of boolean combinations) filter constructor

Returns:

mask matrix, mask text labels (list), mask path labels (list)

powerset_cutmask(cut)[source]

Generate powerset 2**|cuts| masks

Parameters:

cut – list of pre-calculated cuts, each list element is a boolean array

Returns:

(2**|cuts| x num_events) sized boolean mask matrix

Return type:

mask
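
Sketch of building all 2**|cuts| boolean combinations with itertools.product (the ordering of the combinations is an assumption):

import itertools
import numpy as np

def powerset_cutmask_sketch(cut):
    cut    = [np.asarray(c, dtype=bool) for c in cut]
    combos = list(itertools.product([False, True], repeat=len(cut)))
    mask   = np.zeros((len(combos), len(cut[0])), dtype=bool)
    for i, combo in enumerate(combos):
        m = np.ones(len(cut[0]), dtype=bool)
        for c, keep in zip(cut, combo):
            m &= (c if keep else ~c)   # pass (keep) or fail (~) each cut
        mask[i] = m
    return mask   # (2**|cuts| x num_events)

cuts = [np.array([True, True, False]), np.array([True, False, False])]
print(powerset_cutmask_sketch(cuts).shape)   # (4, 3)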

print_exptree(root)[source]

Print out an expression tree object via recursion.

Parameters:

root – expression tree (object type)

print_parallel_cutflow(masks, names, EPS=1e-12)[source]

Print boolean combination cutflow statistics

Parameters:
  • masks – list of pre-calculated cuts; each list element is a boolean array with length equal to the number of events

  • names – list of names (description of each cut, for printout only)

print_stats(mask, text)[source]

Print filter mask category statistics

Parameters:
  • mask – computed event filter mask

  • text – filter descriptions

set_constructor(cutset, X, ids, veto=False)[source]

Direct set filter constructor

Returns:

mask matrix, mask text labels (list), mask path labels (list)

test_powerset()[source]
test_syntax_tree_flip()[source]

Unit tests

test_syntax_tree_parsing()[source]

Unit tests

test_syntax_tree_simple()[source]

Unit tests

class tree_node(value)[source]

Class to represent the nodes of an expression tree.