icenet¶
Core classes including deep learning, data structures, manipulations and visualizations.
icenet.algo¶
Various (classic) algorithms.
icenet.algo.analytic¶
- count_simple_edges(num_nodes, directed, self_loops)[source]¶
Count number of edges in a (semi)-fully connected adjacency matrix
- deltaR(x, eta1: str, eta2: str, phi1: str, phi2: str)[source]¶
dR distance (invariant under longitudinal boosts in the massless limit y -> eta)
With awkward arrays
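For illustration (not the library implementation), a minimal sketch of the dR computation, assuming x is an awkward record array and eta1/eta2/phi1/phi2 are its field names; numpy ufuncs dispatch on awkward arrays:
import numpy as np

def deltaR_sketch(x, eta1, eta2, phi1, phi2):
    # pseudorapidity difference
    deta = x[eta1] - x[eta2]
    # azimuthal difference wrapped into (-pi, pi]
    dphi = (x[phi1] - x[phi2] + np.pi) % (2 * np.pi) - np.pi
    return np.sqrt(deta**2 + dphi**2)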
- fox_wolfram_boost_inv(p, L=10)[source]¶
arxiv.org/pdf/1508.03144, (Formula 5.6)
- Parameters:
p – list of 4-momentum vectors
L – maximum angular moment order
- Returns:
list of moments of order 0,1,…,L
- Return type:
S
[untested function]
- get_Lorentz_edge_features(p4vec, num_nodes, num_edges, num_edge_features, directed, self_loops, EPS=1e-12)[source]¶
- gram_matrix(X, type='dot')[source]¶
Gram matrix for 4-vectors.
- Parameters:
X – Array (list of N) of 4-vectors
type – Type of Lorentz scalars computed (‘dot’, ‘s’, ‘t’)
- Returns:
Gram matrix (NxN)
- Return type:
G
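For intuition, a minimal sketch of the ‘dot’ type Gram matrix under the (+,-,-,-) metric, assuming each 4-vector is ordered as (E, px, py, pz); the actual function may use a different internal representation:
import numpy as np

def minkowski_dot(a, b):
    # Lorentz scalar product with metric (+,-,-,-), 4-vector ordered as (E, px, py, pz)
    return a[0]*b[0] - a[1]*b[1] - a[2]*b[2] - a[3]*b[3]

X = [np.array([10.0, 1.0, 2.0, 3.0]),   # hypothetical example 4-vectors
     np.array([ 5.0, 0.5, 0.1, 4.0])]
G = np.array([[minkowski_dot(a, b) for b in X] for a in X])   # (N x N) Gram matrix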
- invmass(x, pt1: str, pt2: str, eta1: str, eta2: str, phi1: str, phi2: str, m1_const=0.1396, m2_const=0.1396)[source]¶
invariant mass (exact)
With awkward arrays
- invmass_massless(x, pt1: str, pt2: str, eta1: str, eta2: str, phi1: str, phi2: str)[source]¶
invariant mass (massless limit)
With awkward arrays
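A minimal sketch of the massless-limit formula behind such a function, m^2 = 2 pt1 pt2 (cosh(deta) - cos(dphi)); extraction of the named fields from the awkward array is omitted here:
import numpy as np

def invmass_massless_sketch(pt1, pt2, eta1, eta2, phi1, phi2):
    # m^2 = 2 * pt1 * pt2 * (cosh(eta1 - eta2) - cos(phi1 - phi2)) in the massless limit
    return np.sqrt(2.0 * pt1 * pt2 * (np.cosh(eta1 - eta2) - np.cos(phi1 - phi2)))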
- ktmetric(kt2_i, kt2_j, dR2_ij, p=-1, R=1.0)[source]¶
kt-algorithm type distance measure.
- Parameters:
kt2_i – Particle 1 pt squared
kt2_j – Particle 2 pt squared
dR2_ij – Angular separation squared between the particles (deta**2 + dphi**2)
R – Radius parameter
p – (p=1) kt-like, (p=0) Cambridge/Aachen, (p=-1) anti-kt like
- Returns:
distance measure
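For reference, a minimal sketch of the standard generalized-kt distance described by this docstring (the library implementation may differ in details):
def ktmetric_sketch(kt2_i, kt2_j, dR2_ij, p=-1, R=1.0):
    # d_ij = min(kt_i^(2p), kt_j^(2p)) * dR_ij^2 / R^2
    # p = 1: kt, p = 0: Cambridge/Aachen, p = -1: anti-kt
    return min(kt2_i**p, kt2_j**p) * dR2_ij / (R**2)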
icenet.algo.flr¶
- predict(X, b_pdfs, s_pdfs, bin_edges, return_prob=True, EPS=1e-12)[source]¶
Evaluate the likelihood ratio.
- Parameters:
X – input data [# vectors x # dimensions]
b_pdfs – background pdfs
s_pdfs – signal pdfs
bin_edges – bin edges
return_prob – return probability if True, or likelihood ratio
- Returns:
likelihood ratio, or probability
- Return type:
LR
- train(X, y, weights, param)[source]¶
Factorized likelihood classifier training.
- Parameters:
X – input data [# vectors x # dimensions]
y – target data
weights – weighted events
param – dictionary for the parameters
- Returns:
background pdfs; s_pdfs: signal pdfs; bin_edges: histogram bin edges
- Return type:
b_pdfs
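To illustrate the idea of a factorized (naive Bayes style) likelihood ratio, here is a hypothetical sketch assuming per-dimension histogram pdfs and per-dimension bin edges; it is not the actual predict() implementation:
import numpy as np

def flr_predict_sketch(X, b_pdfs, s_pdfs, bin_edges, return_prob=True, EPS=1e-12):
    N, D = X.shape
    LR = np.ones(N)
    for d in range(D):
        # histogram bin index of each sample in dimension d
        idx = np.clip(np.digitize(X[:, d], bin_edges[d]) - 1, 0, len(b_pdfs[d]) - 1)
        LR *= (s_pdfs[d][idx] + EPS) / (b_pdfs[d][idx] + EPS)   # factorized product of 1D ratios
    return LR / (1.0 + LR) if return_prob else LR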
icenet.algo.nmf¶
- ML_nmf(V, k, threshold=1e-08, maxiter=500)[source]¶
Non-negative matrix factorization main function.
- Parameters:
V – (d x n) array (dimension x samples)
k – number of components
threshold – relative error threshold (Frob norm)
maxiter – maximum number of iterations
- Returns:
(d x k) array of basis elements; H: (k x n) array of weights for each observation
- Return type:
W
- ML_update_H(V, W, H)[source]¶
Multiplicative (EM-type) non-negative matrix factorization update for the expansion weights.
- Parameters:
V – (d x n) (dimension x samples)
W – (d x k) (dimension x dictionary size)
H – (k x n) (expansion weights for each sample)
- Returns:
(k x n) array of updated weights for each sample
- Return type:
H
- ML_update_W(V, W, H)[source]¶
Multiplicative (EM-type) non-negative matrix factorization update for basis components.
- Parameters:
V – (d x n) (dimension x samples)
W – (d x k) (dimension x dictionary size)
H – (k x n) (expansion weights for each sample)
- Returns:
(d x k) updated non-negative basis components
- Return type:
W
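The module implements the classic multiplicative (Lee–Seung) update rules; a self-contained sketch under the Frobenius norm objective (not the library code itself):
import numpy as np

def ml_nmf_sketch(V, k, threshold=1e-8, maxiter=500, EPS=1e-12):
    # V (d x n) is approximated by W (d x k) @ H (k x n), all entries non-negative
    d, n = V.shape
    rng = np.random.default_rng(0)
    W, H = rng.random((d, k)), rng.random((k, n))
    prev = np.inf
    for _ in range(maxiter):
        H *= (W.T @ V) / (W.T @ W @ H + EPS)   # update expansion weights
        W *= (V @ H.T) / (W @ H @ H.T + EPS)   # update basis components
        err = np.linalg.norm(V - W @ H, 'fro')
        if abs(prev - err) / max(err, EPS) < threshold:
            break
        prev = err
    return W, H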
icenet.deep¶
Deep learning model classes.
icenet.deep.autogradxgb¶
- class XgboostObjective(loss_func: Callable[[Tensor, Tensor], Tensor], mode: str = 'train', flatten_grad: bool = False, hessian_mode: str = 'constant', hessian_const: float = 1.0, hessian_gamma: float = 0.9, hessian_slices: int = 10, device: device = 'cpu')[source]¶
- Parameters:
loss_func – Loss function handle
mode – ‘train’ or ‘eval’
flatten_grad – For vector valued model output [experimental]
hessian_mode – ‘constant’, ‘squared_approx’, ‘iterative’, ‘hutchinson’, ‘exact’
hessian_const – Scalar parameter for the ‘constant’ hessian_mode
hessian_gamma – Hessian momentum smoothing parameter for ‘iterative’ mode
hessian_slices – Hutchinson Hessian diagonal estimator MC slice sample size
device – Torch device
- derivatives(loss: Tensor, preds: Tensor) Tuple[Tensor, Tensor] [source]¶
Gradient and Hessian diagonal
- Parameters:
loss – loss function values
preds – model predictions
- Returns:
gradient vector, hessian diagonal vector
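The class wires a torch loss into xgboost's custom objective interface. A hedged sketch of that idea with a hypothetical helper name (only the standard xgboost obj=... callback signature is assumed), here using the ‘constant’ Hessian mode:
import numpy as np
import torch
import xgboost as xgb

def make_objective(loss_func, hessian_const=1.0):
    def objective(preds: np.ndarray, dtrain: xgb.DMatrix):
        target  = torch.tensor(dtrain.get_label(), dtype=torch.float32)
        preds_t = torch.tensor(preds, dtype=torch.float32, requires_grad=True)
        loss = loss_func(preds_t, target)                  # scalar torch loss
        (grad,) = torch.autograd.grad(loss, preds_t)       # gradient via autograd
        hess = hessian_const * np.ones_like(preds)         # 'constant' Hessian diagonal
        return grad.numpy(), hess
    return objective

# usage sketch: xgb.train(params, dtrain, obj=make_objective(my_torch_loss))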
icenet.deep.bnaf¶
- class BNAF(*args, res: str | None = None)[source]¶
Class that extends torch.nn.Sequential for constructing a Block Neural Normalizing Flow.
- class MaskedWeight(in_features: int, out_features: int, dim: int, bias: bool = True)[source]¶
Module that implements a linear layer with block matrices with positive diagonal blocks. Moreover, it uses Weight Normalization (https://arxiv.org/abs/1602.07868) for stability.
- forward(inputs, grad: Tensor | None = None)[source]¶
- Parameters:
inputs – torch.Tensor, required. The input tensor.
grad – torch.Tensor, optional. The log diagonal block of the partial Jacobian of previous transformations.
- Returns:
The output tensor and the log diagonal blocks of the partial log-Jacobian of previous transformations combined with this transformation.
- class Permutation(in_features: int, p: list | None = None)[source]¶
Module that outputs a permutation of its input.
- class Sequential(*args: Module)[source]¶
- class Sequential(arg: OrderedDict[str, Module])
Class that extends torch.nn.Sequential for computing the output of the function together with the log-det-Jacobian of such transformation.
- class Tanh(*args, **kwargs)[source]¶
Class that extends torch.nn.Tanh additionally computing the log diagonal blocks of the Jacobian.
- forward(inputs, grad: Tensor | None = None)[source]¶
- Parameters:
inputs – torch.Tensor, required. The input tensor.
grad – torch.Tensor, optional. The log diagonal blocks of the partial Jacobian of previous transformations.
- Returns:
The output tensor and the log diagonal blocks of the partial log-Jacobian of previous transformations combined with this transformation.
icenet.deep.cnn¶
- class CNN(C, out_dim=None, nchannels=1, nrows=32, ncols=32, dropout_cnn=0.0, dropout_mlp=0.5, mlp_dim=128)[source]¶
-
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class CNN_MAXO(D, C, out_dim=None, nchannels=1, nrows=32, ncols=32, dropout_cnn=0.0, mlp_dim=50, num_units=6, dropout_mlp=0.1)[source]¶
Dual (simultaneous) input network [image tensors x global vectors]
icenet.deep.da¶
- class GradientReversal(alpha=1.0)[source]¶
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class GradientReversalFunction(*args, **kwargs)[source]¶
Unsupervised Domain Adaptation by Backpropagation https://arxiv.org/abs/1409.7495
Notes: The forward pass is an identity map. In the backpropagation, the gradients are reversed by grad -> -alpha * grad.
Example
net = nn.Sequential(nn.Linear(10, 10), GradientReversal(alpha=1.0))
- static backward(ctx, grads)[source]¶
Define a formula for differentiating the operation with backward mode automatic differentiation.
This function is to be overridden by all subclasses. (Defining this function is equivalent to defining the vjp function.)
It must accept a context ctx as the first argument, followed by as many outputs as the forward() returned (None will be passed in for non-tensor outputs of the forward function), and it should return as many tensors as there were inputs to forward(). Each argument is the gradient w.r.t. the given output, and each returned value should be the gradient w.r.t. the corresponding input. If an input is not a Tensor or is a Tensor not requiring grads, you can just pass None as a gradient for that input.
The context can be used to retrieve tensors saved during the forward pass. It also has an attribute ctx.needs_input_grad as a tuple of booleans representing whether each input needs gradient. E.g., backward() will have ctx.needs_input_grad[0] = True if the first input to forward() needs gradient computed w.r.t. the output.
- static forward(ctx, x, alpha)[source]¶
Define the forward of the custom autograd Function.
This function is to be overridden by all subclasses. There are two ways to define forward:
Usage 1 (Combined forward and ctx):
@staticmethod
def forward(ctx: Any, *args: Any, **kwargs: Any) -> Any: pass
It must accept a context ctx as the first argument, followed by any number of arguments (tensors or other types).
See combining-forward-context for more details
Usage 2 (Separate forward and ctx):
@staticmethod
def forward(*args: Any, **kwargs: Any) -> Any: pass

@staticmethod
def setup_context(ctx: Any, inputs: Tuple[Any, ...], output: Any) -> None: pass
The forward no longer accepts a ctx argument.
Instead, you must also override the torch.autograd.Function.setup_context() staticmethod to handle setting up the ctx object. output is the output of the forward, inputs are a Tuple of inputs to the forward.
See extending-autograd for more details
The context can be used to store arbitrary data that can be then retrieved during the backward pass. Tensors should not be stored directly on ctx (though this is not currently enforced for backward compatibility). Instead, tensors should be saved either with ctx.save_for_backward() if they are intended to be used in backward (equivalently, vjp) or ctx.save_for_forward() if they are intended to be used in jvp.
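Putting the two staticmethods together, a minimal gradient reversal Function might look like the following sketch (identity forward, reversed and scaled gradient backward):
import torch

class GradReverseSketch(torch.autograd.Function):
    @staticmethod
    def forward(ctx, x, alpha):
        ctx.alpha = alpha
        return x.clone()                       # identity map in the forward pass

    @staticmethod
    def backward(ctx, grads):
        return -ctx.alpha * grads, None        # reverse the gradient; None for alpha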
icenet.deep.dbnf¶
- compute_log_p_x(model, x)[source]¶
Model log-density value log[pdf(x)], where x is the data vector
log p(x) = log p(z) + sum_{k=1}^K log|det J_{f_k}|
- Parameters:
model – model object
x – N minibatch vectors
- Returns:
log-likelihood value
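A minimal sketch of the change-of-variables formula quoted above, assuming a standard normal base density and that the flow provides z and the summed log|det J| terms:
import math
import torch

def log_p_x_sketch(z, sum_log_abs_det_jacobians):
    # log p(x) = log N(z; 0, I) + sum_k log|det J_{f_k}|
    log_p_z = (-0.5 * (z**2 + math.log(2.0 * math.pi))).sum(dim=-1)
    return log_p_z + sum_log_abs_det_jacobians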
- create_model(param, verbose=False, rngseed=0)[source]¶
Construct the network object.
- Parameters:
param – parameters
- Returns:
model object
- Return type:
model
- get_pdf(model, x)[source]¶
Evaluate learned density (pdf) at point x
- Parameters:
model – model object
x – input vector(s)
- Returns:
pdf value
Examples
>>> x = torch.tensor([[1.0, 2.0]])
>>> l = get_pdf(model, x)
- predict(X, models, return_prob=True, EPS=1e-09)[source]¶
2-class density ratio pdf(x,S) / pdf(x,B) for each vector x.
- Parameters:
param – input parameters
X – pytorch tensor of vectors
models – list of model objects
return_prob – return pdf(S) / (pdf(S) + pdf(B)), else pdf(S) / pdf(B)
- Returns:
likelihood ratio (or alternatively probability)
- train(model, optimizer, scheduler, trn_x, val_x, trn_weights, val_weights, param, modeldir, save_name)[source]¶
Train the model density.
- Parameters:
model – initialized model object
optimizer – optimizer object
scheduler – optimization scheduler
trn_x – training vectors
val_x – validation vectors
trn_weights – training weights
val_weights – validation weights
param – parameters
modeldir – directory to save the model
icenet.deep.deeptools¶
- class Multiply(alpha)[source]¶
Multiplication with a non-learnable constant alpha
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- adaptive_gradient_clipping_(main_module: Module, MI_module: Module, EPS=1e-09)[source]¶
Adaptively clip the gradient from the mutual information module, so that its Frobenius norm is at most that of the gradient from the main network.
See: https://arxiv.org/abs/1801.04062 (Appendix)
- Parameters:
main_module – Generator/classifier/regressor… main task network (nn.Module)
MI_module – MI regulator network (nn.Module)
- grad_norm(module: Module)[source]¶
Compute the total (Frobenius) norm for the gradients of a torch network
- Parameters:
module – torch network
- Returns:
total gradient norm
- set_scheduler(optimizer: dict, param: dict)[source]¶
- Parameters:
optimizer – optimizers for different models
param – setup parameters
- Returns:
torch scheduler
- weights_init_all(model, init_funcs)[source]¶
Examples
model = MyNet()
weights_init_all(model, init_funcs)
icenet.deep.deps¶
- class DEPS(D, C, z_dim, out_dim=None, phi_layers=3, rho_layers=3, pool='max', dropout=0.1, **kwargs)[source]¶
Permutation equivariant networks.
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class PEN1_max(in_dim, out_dim)[source]¶
Permutation Equivariant Network (PEN) max-type layers.
Single dimensional model.
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class PEN1_mean(in_dim, out_dim)[source]¶
Permutation Equivariant Network (PEN) mean-type layers.
Single dimensional model.
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class PEN_max(in_dim, out_dim)[source]¶
Permutation Equivariant Network (PEN) max-type layers.
Multidimensional model.
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class PEN_mean(in_dim, out_dim)[source]¶
Permutation Equivariant Network (PEN) mean-type layers.
Multidimensional model.
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
icenet.deep.dmlp¶
- class DMLP(D, C, out_dim=None, mlp_dim=[128, 64], activation='relu', layer_norm=False, batch_norm=False, dropout=0.0, skip_connections=False, act_after_norm=True, last_tanh=False, last_tanh_scale=10.0, **kwargs)[source]¶
-
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class LinearLayer(dim_in, dim_out, skip_connections=False, activation: str = 'relu', layer_norm: bool = False, batch_norm: bool = False, dropout: float = 0.0, act_after_norm=True)[source]¶
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- MLP(layers: List[int], activation: str = 'relu', layer_norm: bool = False, batch_norm: bool = False, dropout: float = 0.0, last_act: bool = False, skip_connections=False, act_after_norm=True)[source]¶
Return a Multi Layer Perceptron with an arbitrary number of layers.
- Parameters:
layers – input structure, such as [128, 64, 64] for a 3-layer network.
activation – activation function
layer_norm – layer normalization
batch_norm – batch normalization
dropout – dropout regularization
skip_connections – skip connections active
last_act – apply activation function after the last layer
act_after_norm – activation function application order
- Returns:
nn.Sequential object
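A minimal sketch of how such a layer list can be turned into an nn.Sequential stack (simplified: no normalization or skip connections):
import torch.nn as nn

def mlp_sketch(layers, activation=nn.ReLU, dropout=0.0, last_act=False):
    blocks = []
    for i in range(len(layers) - 1):
        blocks.append(nn.Linear(layers[i], layers[i + 1]))
        if i < len(layers) - 2 or last_act:        # activation after the last layer only if requested
            blocks.append(activation())
            if dropout > 0:
                blocks.append(nn.Dropout(dropout))
    return nn.Sequential(*blocks)

# usage sketch: net = mlp_sketch([128, 64, 64], activation=nn.ReLU, dropout=0.1)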
- MLP_ALL_ACT(layers: List[int], activation: str = 'relu', layer_norm: bool = False, batch_norm: bool = False, dropout: float = 0.0, skip_connections=False, act_after_norm: bool = True)[source]¶
Return a Multi Layer Perceptron with an arbitrary number of layers.
All layers with the activation + other operations applied.
icenet.deep.fastkan¶
- class AttentionWithFastKANTransform(q_dim: int, k_dim: int, v_dim: int, head_dim: int, num_heads: int, gating: bool = True)[source]¶
- forward(q: Tensor, k: Tensor, v: Tensor, bias: Tensor | None = None) Tensor [source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class FastKAN(D, C, mlp_dim: ~typing.List[int], grid_min: float = -2.0, grid_max: float = 2.0, num_grids: int = 8, use_base_update: bool = False, base_activation=<function silu>, spline_weight_init_scale: float = 0.1, out_dim=None, last_tanh=False, last_tanh_scale=10.0, **kwargs)[source]¶
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class FastKANLayer(input_dim: int, output_dim: int, grid_min: float = -2.0, grid_max: float = 2.0, num_grids: int = 8, use_base_update: bool = False, base_activation=<function silu>, spline_weight_init_scale: float = 0.1)[source]¶
- forward(x, time_benchmark=False)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class RadialBasisFunction(grid_min: float = -2.0, grid_max: float = 2.0, num_grids: int = 8, denominator: float | None = None)[source]¶
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
icenet.deep.gcnn¶
- class GCN(D, Z, C, out_dim=None, dropout=0.5)[source]¶
Graph Convolution Network
- forward(x, adj_matrix)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
icenet.deep.graph¶
- class GNNGeneric(d_dim, out_dim, u_dim=0, e_dim=None, z_dim=96, C=None, conv_type='EdgeConv', task='node', global_pool='mean', conv_MLP_act='relu', conv_MLP_bn=True, conv_MLP_dropout=0.0, conv_aggr='max', conv_knn=8, fusion_MLP_act='relu', fusion_MLP_bn=False, fusion_MLP_dropout=0.0, final_MLP_act='relu', final_MLP_bn=False, final_MLP_dropout=0.0, DA_active=False, DA_alpha=1.0, DA_MLP=[128, 64], DA_MLP_act='relu', DA_MLP_bn=False, DA_MLP_dropout=0.0, **kwargs)[source]¶
Technical Remarks:
Always use MLP_ALL_ACT in the intermediate blocks, i.e. MLPs with an activation function also after the last layer; otherwise performance may degrade severely for certain message passing / convolution operators.
- GINE_helper()[source]¶
GINEConv requires node features and edge features with the same dimension. Increase dimensionality here.
- forward(data, conv_only=False)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class SuperEdgeConv(mlp_edge: Callable, mlp_latent: Callable, aggr: str = 'mean', mp_attn_dim: int = 0, use_residual=True, **kwargs)[source]¶
Custom GNN convolution operator aka ‘generalized EdgeConv’ (original EdgeConv: arxiv.org/abs/1801.07829)
- forward(x: Tensor | Tuple[Tensor, Tensor], edge_index: Tensor | SparseTensor, edge_attr: Tensor | None = None, edge_weight: Tensor | None = None, size: Tuple[int, int] | None = None) Tensor [source]¶
Runs the forward pass of the module.
- message(x_i: Tensor, x_j: Tensor, edge_attr: Tensor | None, edge_weight: Tensor | None) Tensor [source]¶
Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g. x_i and x_j.
icenet.deep.iceboost¶
- BCE_loss_with_logits(input: Tensor, target: Tensor, weights: Tensor | None = None, epsilon=None)[source]¶
Numerically stable BCE loss with logits https://medium.com/@sahilcarterr/why-nn-bcewithlogitsloss-numerically-stable-6a04f3052967
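The standard numerically stable formulation referenced above can be sketched as follows (elementwise max(z,0) - z*y + log(1 + exp(-|z|)), with optional per-event weights):
import torch

def bce_with_logits_sketch(z, y, weights=None):
    loss = torch.clamp(z, min=0) - z * y + torch.log1p(torch.exp(-torch.abs(z)))
    if weights is not None:
        loss = loss * weights          # per-event weighting
    return loss.mean()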
icenet.deep.losstools¶
- class FocalWithLogitsLoss(weight=None, gamma=2, reduction='mean')[source]¶
Focal Loss with logits as input
https://arxiv.org/abs/1708.02002
- forward(predicted, target)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- LOGIT_L1_loss(logits, logit_beta=1.0, weights=None)[source]¶
Logit magnitude L1-regularization sum |z|
- LOGIT_L2_loss(logits, logit_beta=1.0, weights=None)[source]¶
Logit magnitude L2-regularization sum |z|^2
- class LqBernoulliWithLogitsLoss(weight=None, q=1.0, reduction='mean')[source]¶
L_q likelihood for the Bernoulli case
https://arxiv.org/pdf/1002.4533
- forward(predicted, target)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- SWD_reweight_loss(logits, x, y, weights=None, p=1, num_slices=1000, norm_weights=True, mode='SWD')[source]¶
Sliced Wasserstein reweighting: U (y==0) -> V (y==1) transport
- binary_cross_entropy_logprob(log_phat_0, log_phat_1, y, weights=None)[source]¶
Per-instance weighted binary cross entropy loss (y can take values in [0,1]) (negative log-likelihood)
- log_softmax(x, dim=-1)[source]¶
Log of Softmax
- Parameters:
x – network output without softmax
- Returns:
logsoftmax values
- loss_wrapper(model, x, y, num_classes, weights, param, y_DA=None, w_DA=None, MI=None, EPS=1e-12)[source]¶
A wrapper function to loss functions
Note
log-likelihood functions can be weighted linearly, because prod_i p_i(x_i; theta)^{w_i} turns under the logarithm into sum_i w_i log p_i(x_i; theta)
- multiclass_cross_entropy_logprob(log_phat, y, num_classes, weights=None)[source]¶
Per instance weighted cross entropy loss (negative log-likelihood)
- multiclass_focal_entropy_logprob(log_phat, y, num_classes, gamma, weights=None)[source]¶
Per instance weighted ‘focal entropy loss’ https://arxiv.org/pdf/1708.02002.pdf
icenet.deep.lzmlp¶
- class LZMLP(D, C, out_dim=None, mlp_dim=[128, 64], activation='relu', layer_norm=False, batch_norm=False, dropout=0.0, last_tanh=False, last_tanh_scale=10.0, act_after_norm=True, **kwargs)[source]¶
-
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class LipschitzLinear(in_features, out_features)[source]¶
Lipschitz linear layer
- forward(input)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
icenet.deep.maxo¶
- class MAXOUT(D, C, num_units, neurons, dropout, out_dim=None, **kwargs)[source]¶
MAXOUT network
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
icenet.deep.mlgr¶
- class MLGR(D, C, out_dim=None)[source]¶
Multinomial Logistic Regression model
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
icenet.deep.optimize¶
- test(model, loader, device, opt_param: dict, MI: dict | None = None, compute_loss: bool = False)[source]¶
Pytorch based testing routine.
- Parameters:
model – pytorch model
loader – pytorch dataloader
device – ‘cpu’ or ‘device’
opt_param – optimization parameters
MI – MI parameters
compute_loss – compute the loss
- Returns:
loss dictionary, accuracy, AUC
- train(model, loader, optimizer, device, opt_param: dict, MI: dict | None = None)[source]¶
Pytorch based training routine.
- Parameters:
model – pytorch model
loader – pytorch dataloader
optimizer – pytorch optimizer
device – ‘cpu’ or ‘device’
opt_param – optimization parameters
MI – MI parameters
- Returns:
trained model (returned implicitly via the input arguments)
icenet.deep.pgraph¶
- class PANConv(in_channels, out_channels, filter_size=4, panconv_filter_weight=None)[source]¶
- forward(x, edge_index, num_nodes=None, edge_mask_list=None)[source]¶
Runs the forward pass of the module.
- message(x_j, norm)[source]¶
Constructs messages from node \(j\) to node \(i\) in analogy to \(\phi_{\mathbf{\Theta}}\) for each edge in edge_index. This function can take any argument as input which was initially passed to propagate(). Furthermore, tensors passed to propagate() can be mapped to the respective nodes \(i\) and \(j\) by appending _i or _j to the variable name, e.g. x_i and x_j.
- class PANDropout(filter_size=4)[source]¶
- forward(edge_index, p=0.5)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class PANUMPooling(in_channels, ratio=0.5, min_score=None, multiplier=1, nonlinearity=<built-in method tanh of type object>)[source]¶
Specific Graph pooling layer based on unnormalized M from PAN, which can only work after PANConv.
- class PANXHMPooling(in_channels, ratio=0.5, pan_pool_weight=None, min_score=None, multiplier=1, nonlinearity=<built-in method tanh of type object>, filter_size=3, panpool_filter_weight=None)[source]¶
General Graph pooling layer based on PAN, which can work with all layers.
icenet.deep.predict¶
icenet.deep.tempscale¶
- class LogitsWithTemperature(mode='softmax', device='cpu')[source]¶
“Temperature calibration” wrapper class.
Use with original raw logits and class labels as an input.
- calibrate(logits: Tensor, labels: Tensor, weights: Tensor | None = None, lr: float = 0.01, max_iter: int = 50)[source]¶
Tune the temperature of the model with NLL loss (using the validation set)
- Parameters:
logits – model output logits per event (single or softmax type)
labels – class label per event (torch.float32)
weights – weights per event
- class ModelWithTemperature(model, mode='softmax', device='cpu')[source]¶
“Temperature calibration” wrapper class.
Output of the original network needs to be in logits, not softmax or log softmax.
Expects model(input) to return logits.
- calibrate(valid_loader, lr: float = 0.01, max_iter: int = 50)[source]¶
Tune the temperature of the model with NLL loss (using the validation set)
- Parameters:
valid_loader – validation set loader (DataLoader)
- forward(input: Tensor)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
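For intuition, a minimal temperature scaling sketch in the softmax mode: a single temperature T is fitted with LBFGS by minimizing the NLL on held-out logits (labels as torch.long class indices); the wrapper classes above add event weighting and mode handling on top of this idea:
import torch
from torch import nn, optim

def calibrate_temperature_sketch(logits, labels, lr=0.01, max_iter=50):
    T = nn.Parameter(torch.ones(1))
    nll = nn.CrossEntropyLoss()
    opt = optim.LBFGS([T], lr=lr, max_iter=max_iter)

    def closure():
        opt.zero_grad()
        loss = nll(logits / T, labels)   # rescale the logits by the temperature
        loss.backward()
        return loss

    opt.step(closure)
    return T.detach()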
icenet.deep.train¶
- getgenericparam(param, D, num_classes, config={})[source]¶
Construct generic torch network parameters
- torch_construct(X_trn, Y_trn, X_val, Y_val, X_trn_2D, X_val_2D, trn_weights, val_weights, param, args, Y_trn_DA=None, trn_weights_DA=None, Y_val_DA=None, val_weights_DA=None, y_soft=None, data_trn_MI=None, data_val_MI=None, config={'params': {}})[source]¶
Torch model and data loader constructor
- Parameters:
See other train_* functions
- Returns:
model, train_loader, test_loader
- torch_loop(model, train_loader, test_loader, args, param, config={'params': {}}, ids=None)[source]¶
Main training loop for all torch based models
- train_cutset(config={'params': {}}, data_trn=None, data_val=None, args=None, param=None)[source]¶
Train cutset model
- Parameters:
See other train_* functions
- Returns:
Trained model
- train_flow(config={'params': {}}, data_trn=None, data_val=None, args=None, param=None)[source]¶
Train normalizing flow (BNAF) neural model
- Parameters:
See other train_* functions
- Returns:
trained model
- train_flr(config={'params': {}}, data_trn=None, args=None, param=None)[source]¶
Train factorized likelihood model
- Parameters:
See other train_* functions
- Returns:
trained model
- train_graph_xgb(config={'params': {}}, data_trn=None, data_val=None, trn_weights=None, val_weights=None, args=None, y_soft=None, param=None, feature_names=None)[source]¶
Train graph model + xgb hybrid model
- Parameters:
See other train_* functions
- Returns:
trained model
- train_torch_generic(X_trn=None, Y_trn=None, X_val=None, Y_val=None, trn_weights=None, val_weights=None, X_trn_2D=None, X_val_2D=None, args=None, param=None, Y_trn_DA=None, trn_weights_DA=None, Y_val_DA=None, val_weights_DA=None, y_soft=None, data_trn_MI=None, data_val_MI=None, ids=None, config={'params': {}})[source]¶
Train generic neural model [R^d x (2D) -> output]
- Parameters:
See other train_* functions
- Returns:
trained model
- train_torch_graph(config={'params': {}}, data_trn=None, data_val=None, args=None, param=None, y_soft=None)[source]¶
Train graph neural networks
- Parameters:
config – raytune parameter dict
data_trn – training data
data_val – validation data
args – arg parameters dict
param – model parameters dict
- Returns:
trained model
icenet.deep.vae¶
- class Decoder(D, latent_dim=32, hidden_dim=[128, 128], activation='tanh', batch_norm=False, dropout=0.0)[source]¶
- forward(z)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class Encoder(D, hidden_dim=[128, 128], latent_dim=32, activation='tanh', batch_norm=False, dropout=0.0)[source]¶
Non-variational encoder
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- class VAE(D, latent_dim, encoder_hidden_dim=[128, 128], var_hidden_dim=[128], decoder_hidden_dim=[128, 128], encoder_bn=True, encoder_act='relu', encoder_dropout=0.0, decoder_bn=False, decoder_act='relu', decoder_dropout=0.0, reco_prob='Gaussian', kl_prob='Gaussian', anomaly_score='KL_RECO', C=None, **kwargs)[source]¶
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
- kl_div(z, mu, std)[source]¶
KL divergence (always positive), taken against a diagonal multivariate normal here
log q(z|x) - log p(z)
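A minimal sketch of the Monte Carlo estimator log q(z|x) - log p(z) for diagonal normals, which is what this docstring describes (z sampled from q):
import torch

def kl_div_sketch(z, mu, std):
    q = torch.distributions.Normal(mu, std)                                      # approximate posterior q(z|x)
    p = torch.distributions.Normal(torch.zeros_like(mu), torch.ones_like(std))   # standard normal prior p(z)
    return (q.log_prob(z) - p.log_prob(z)).sum(dim=-1)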
- class VariationalEncoder(D, hidden_dim=[128, 128], var_hidden_dim=[128, 64], latent_dim=32, activation='relu', batch_norm=False, dropout=0.0)[source]¶
Variational encoder
- forward(x)[source]¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for forward pass needs to be defined within this function, one should call the
Module
instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.
icenet.optim¶
Custom optimization functions.
icenet.optim.adam¶
icenet.optim.adamax¶
- class Adamax(params, lr=0.002, betas=(0.9, 0.999), eps=1e-08, weight_decay=0, polyak=0)[source]¶
- step(closure=None)[source]¶
Performs a single optimization step (parameter update).
- Parameters:
closure (Callable) – A closure that reevaluates the model and returns the loss. Optional for most optimizers.
Note
Unless otherwise specified, this function should not modify the
.grad
field of the parameters.
icenet.optim.scheduler¶
icenet.tools¶
Tool and auxiliary functions.
icenet.tools.aux_torch¶
- load_torch_checkpoint(path='/', label='mynet', epoch=-1)[source]¶
Load pytorch checkpoint
- Parameters:
path – folder path
label – model label name
epoch – epoch index. Use -1 for the last epoch
- Returns:
pytorch model
icenet.tools.aux¶
- class Metric(y_true, y_pred, weights=None, class_ids=[0, 1], hist=True, valrange='prob', N_mva_bins=30, verbose=True, num_bootstrap=0, exclude_neg_class=True)[source]¶
Classifier performance evaluation metrics.
- ak2numpy(x: Array, fields: list, null_value: float = -999.0, dtype='float32')[source]¶
Unzip awkward array to numpy array per column (awkward Record)
- Parameters:
x – awkward array
fields – record field names to extract
null_value – missing element value
dtype – final numpy array dtype
- Returns:
numpy array with columns ordered as ‘fields’ parameter
- arrays2matrix(x_arr, y_arr, z_arr, x_binedges, y_binedges, dtype='float32')[source]¶
Array representation summed to matrix.
- Parameters:
x_arr – array of [x values]
y_arr – array of [y values]
z_arr – array of [z values]
x_binedges – array of binedges
y_binedges – array of binedges
- Returns:
Matrix output
- auc_score(fpr, tpr)[source]¶
AUC-ROC via numerical integration
- Parameters:
fpr – false positive rate array
tpr – true positive rate array
Call sort_fpr_tpr before this function for numerical stability.
- Returns:
AUC score
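A minimal sketch of such a numerical integration, assuming the (fpr, tpr) arrays are already sorted in ascending fpr (cf. sort_fpr_tpr):
import numpy as np

def auc_score_sketch(fpr, tpr):
    # trapezoidal integration of the ROC curve
    return np.trapz(tpr, fpr)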
- binvec2powersetindex(X, B)[source]¶
Binary vector to powerset index.
- Parameters:
X – matrix of binary vectors [# number of vectors x dimension]
B – the powerset matrix
- Returns:
array of powerset indices
- Return type:
y
- binvec_are_equal(a, b)[source]¶
Compare equality of two binary vectors a and b.
- Parameters:
a – binary vectors
b – binary vectors
- Returns:
true or false
- cartesian_product(*arrays)[source]¶
N-dimensional generalized cartesian product between arrays
- Parameters:
*arrays – a list of numpy arrays
Example
cartesian_product(*[np.array([1,2,3]), np.array([100,200,500])])
- concatenate_and_clean(array_list: list, axis: int = 0)[source]¶
Concatenate a list of arrays and clean memory
- Parameters:
array_list – a list of Awkward or numpy arrays
- Returns:
concatenated array
- count_targets(events, ids, entry_start=0, entry_stop=None, new=False, library='np')[source]¶
Targets statistics printout
- Parameters:
events – uproot object
ids – list of branch identifiers
entry_start – uproot starting point
entry_stop – uproot ending point
- Returns:
Printout on stdout
- create_model_filename(path: str, label: str, filetype='.dat', epoch: int = -1)[source]¶
Create model filename based on a set of epoch files in a path.
This function automatically takes the minimum validation loss epoch / iteration
- if epoch == -1, we take the best validation loss epoch
if epoch == -2, we take the latest epoch
if epoch == N, we take the specific epoch N
- create_model_filename_xgb(path: str, label: str, filetype='.dat', epoch: int = -1)[source]¶
Create model filename with xgboost where we have saved only the last epoch boost which contains all the epochs.
This function automatically takes the minimum validation loss epoch / iteration
- if epoch == -1, we take the best validation loss epoch
if epoch == -2, we take the latest epoch
if epoch == N, we take the specific epoch N
- explicit_range(entry_start, entry_stop, num_entries)[source]¶
Clean None from entry_start and entry_stop
- generatebinary(N, M=None, verbose=False)[source]¶
Function to generate all 2**N binary vectors (as boolean matrix rows) with 1 <= M <= N number of ones (hot bits) (default N)
- int2onehot(Y, num_classes)[source]¶
Integer class vector to class “one-hot encoding”
- Parameters:
Y – Class indices (# samples)
num_classes – Number of classes
- Returns:
Onehot representation
- Return type:
onehot
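A minimal sketch of the encoding, assuming Y holds integer class indices in [0, num_classes):
import numpy as np

def int2onehot_sketch(Y, num_classes):
    onehot = np.zeros((len(Y), num_classes))
    onehot[np.arange(len(Y)), Y] = 1.0     # set the class column to one per sample
    return onehot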
- jagged2matrix(arr, scalar_vars, jagged_vars, jagged_dim, entry_start=None, entry_stop=None, null_value: float = -999.0, mode: str = 'columnar', dtype='float32')[source]¶
Transform a “jagged” event container to a matrix (rows ~ event, columns ~ variables)
- Parameters:
arr – Awkward array type input for N events
scalar_vars – Scalar variables to pick (list of strings)
jagged_vars – Jagged variables to pick (list of strings)
jagged_dim – Maximum dimension per jagged variable (integer array)
null_value – Default value for empty ([]) jagged entries
- Returns:
Fixed dimensional 2D-numpy matrix (N x [# scalar var x {#jagged var x maxdim}_i])
- jagged2tensor(X, ids, xyz, x_binedges, y_binedges, dtype='float32')[source]¶
- Parameters:
X – input data (samples x dimensions) with jagged structure
ids – all variable names
xyz – array of (x,y,z) channel triplet strings such as [[‘image_clu_eta’, ‘image_clu_phi’, ‘image_clu_e’]]
x_binedges
y_binedges – arrays of bin edges
- Returns:
tensor of size (samples x channels x rows x columns)
- Return type:
T
- jagged_ak_to_numpy(arr, scalar_vars, jagged_vars, jagged_maxdim, entry_start=None, entry_stop=None, null_value: float = -999.0, dtype='float32')[source]¶
Transform jagged awkward array to fixed dimensional numpy data
- Parameters:
arr – jagged awkward array
scalar_vars – Scalar variable names
jagged_vars – Jagged variable names
jagged_maxdim – Maximum dimension per jagged category
null_value – Fill null value
- Returns:
numpy array, ids
- longvec2matrix(X, M, D, order='F')[source]¶
A matrix representation / dimension converter function useful e.g. for DeepSets and similar neural architectures.
- Parameters:
X – Numpy input matrix (2-dim) (N x [MD])
M – Number of set elements
D – Feature dimension
order – Reshape direction
- Returns:
Output matrix (3-dim) (N x M x D)
- los2lol(listOsets)[source]¶
Convert a list of sets [{},{},..,{}] to a list of lists [[], [], …, []].
- merge_connected(lists)[source]¶
Merge sets with common elements (find connected graphs problem).
Examples
Input: [{0, 1}, {0, 1}, {2, 3}, {2, 3}, {4, 5}, {4, 5}, {6, 7}, {6, 7}, {8, 9}, {8, 9}, {10}, {11}]
Output: [{0, 1}, {2, 3}, {4, 5}, {6, 7}, {8, 9}, {10}, {11}]
- multiclass_roc_auc_score(y_true, y_pred, weights=None, average='macro')[source]¶
Multiclass AUC (area under the curve).
- Parameters:
y_true – True classifications
y_pred – Soft probabilities per class
weights – Sample weights
average – Averaging strategy
- Returns:
Area under the curve via averaging
- Return type:
auc
- number_of_set_bits(i)[source]¶
Return the number of active (set) bits of an integer in the standard binary representation.
- pick_ind(x, minmax)[source]¶
Return indices between minmax[0] <= x < minmax[1], i.e. [a,b)
- Parameters:
x – Input vector
minmax – Minimum and maximum values
- Returns:
indices
- pick_index(all_ids: list, vars: list)[source]¶
Return indices in all_ids corresponding to vars
(vars can contain regexp)
- Parameters:
all_ids – list of strings, e.g. [‘a’,’b’,’c’]
vars – list of string to pick, e.g. [‘a’, ‘c’] or [‘.*’]
- Returns:
index array, variable names list
- process_regexp_ids(all_ids, ids=None)[source]¶
Process regular expressions for variable names
- Parameters:
all_ids – all keys in a tree
ids – keys to pick, if None, use all keys
- Returns:
ids matching regular expressions
- recursive_concatenate(array_list, max_batch_size: int = 32, axis: int = 0)[source]¶
Concatenate a list of arrays in a recursive way (to avoid possible problems with one big concatenation e.g. with Awkward)
- Parameters:
array_list – a list of Awkward or Numpy arrays
max_batch_size – maximum number of list elements per concatenation
axis – axis to concatenate over
- Returns:
concatenated array
- red(X, ids, param, mode=None, exclude_tag='exclude_MVA_vars', include_tag='include_MVA_vars', verbose=True)[source]¶
Reduce the set of input variables of X (start with all included, then apply exclude, then apply include)
Note that Python set() ordering is not necessarily stable over runs; do not rely on the order of sets.
- Parameters:
X – data matrix
ids – names of columns
param – parameter dictionary (from yaml)
mode – return mode ‘X’ or ‘ids’
exclude_tag – key in param
include_tag – key in param
- slice_range(start, stop, N)[source]¶
Python slice type processor function
- Parameters:
start – first index
stop – end index + 1
N – total number of indices
- Returns:
processed indices and total length
- Return type:
a,b,b-a
- split(a, n)[source]¶
Generator which returns approximately equally sized chunks.
- Parameters:
a – Total number
n – Number of chunks
Example
list(split(10, 3))
- split_start_end(a, n, end_plus=1)[source]¶
Returns approx equally sized chunks.
- Parameters:
a – Range, define with range()
n – Number of chunks
end_plus – Python/numpy index style (i.e. + 1 for the end)
Examples
split_start_end(range(100), 3) returns [[0, 34], [34, 67], [67, 100]]
split_start_end(range(5,25), 3) returns [[5, 12], [12, 19], [19, 25]]
- to_edges(l)[source]¶
Treat l as a graph and return its edges.
Examples
to_edges([‘a’,’b’,’c’,’d’]) -> [(a,b), (b,c),(c,d)]
- unroll_ak_fields(x, order='first')[source]¶
Unroll field names in a (nested) awkward array
- Parameters:
x – awkward array
order – return first order or second order field names
- Returns:
field names as a list
icenet.tools.icemap¶
icenet.tools.iceroot¶
- events_to_jagged_numpy(events, ids, entry_start=0, entry_stop=None, maxevents=None, label=None)[source]¶
Process uproot tree to a jagged numpy (object) array
- Parameters:
events – uproot tree
ids – names of the variables to pick
entry_start – first event to consider
entry_stop – last event to consider
- Returns:
X
- get_num_events(rootfile, key_index=0)[source]¶
Get the number of entries in a rootfile by reading a key
- Parameters:
rootfile – rootfile string (with possible Tree name appended with :)
key_index – which variable use as a dummy
- Returns:
number of entries
- load_tree(rootfile, tree, entry_start=0, entry_stop=None, maxevents=None, ids=None, library='np', dtype=None, num_cpus=0, verbose=False)[source]¶
Load ROOT files
- Parameters:
rootfile – Name of root file paths (string or a list of strings)
tree – Tree to read out
entry_start – First event to read per file
entry_stop – Last event to read per file
maxevents – Maximum number of events in total (over all files)
ids – Names of the variables to read out from the root tree
library – Return type ‘np’ (jagged numpy) or ‘ak’ (awkward) of the array
num_cpus – Number of processes used (set 0 for automatic)
verbose – Verbose output
- Returns:
array of type ‘library’
- load_tree_stats(rootfile, tree, key=None, verbose=False)[source]¶
Load the number of events in a list of rootfiles
- Parameters:
rootfile – a list of rootfiles
tree – tree name to open
key – key (variable name) to use to get the number of events, if None then use the first one
verbose – verbose output print
- Returns:
number of events
- read_multiple(process_func, processes, root_path, param, class_id, dtype=None, num_cpus=0, verbose=False)[source]¶
Loop over different MC / data processes as defined in the yaml files
- Parameters:
process_func – data processing function
processes – MC processes dictionary (from yaml)
root_path – main path of files
param – parameters of ‘process_func’
class_id – class identifier (integer), e.g. 0, 1, 2 …
num_cpus – number of CPUs used (set 0 for automatic)
verbose – verbose output print
- Returns:
X, Y, W, ids, info (awkward array format)
- read_single(process_func, process, root_path, param, class_id, dtype=None, num_cpus=0, verbose=False)[source]¶
Loop over different MC / data processes as defined in the yaml files
[awkward compatible only]
- Parameters:
process_func – data processing function
process – MC / data process dictionary (from yaml)
root_path – main path of files
param – parameters of ‘process_func’
class_id – class identifier (integer), e.g. 0, 1, 2 …
num_cpus – number of CPUs used (set 0 for automatic)
verbose – verbose output print
- Returns:
X, Y, W, ids, info (awkward array format)
icenet.tools.icevec¶
- class vec4(x=None, y=None, z=None, t=None)[source]¶
Lorentz vectors
- property abseta¶
- property beta¶
- boost(b, sign=-1)[source]¶
Lorentz boost
- Parameters:
b – Boost 4-momentum (e.g. the system)
sign – 1 or -1 (direction of the boost: into the rest frame (-1) or out of it (1))
- property costheta¶
- property e¶
- property eta¶
- property gamma¶
- property m¶
- property m2¶
- property mt¶
- property p3¶
- property p3mod¶
- property p3mod2¶
- property phi¶
- property pt¶
- property pt2¶
- property px¶
- property py¶
- property pz¶
- property rapidity¶
- property t¶
- property theta¶
- property x¶
- property y¶
- property z¶
icenet.tools.io¶
- class IceXYW(x=array([], dtype=float64), y=array([], dtype=float64), w=None, ids=None)[source]¶
- Parameters:
x – data object
y – target output data
w – weights
- calc_madscore(X: array)[source]¶
Calculate robust normalization.
- Parameters:
X – Input with [# vectors x # dimensions]
- Returns:
Median vector; X_mad: median deviation vector
- Return type:
X_m
- calc_zscore(X: array, weights: array | None = None)[source]¶
Calculate 0-mean & unit-variance normalization.
- Parameters:
X – Input with [N x dim]
weights – Event weights
- Returns:
Mean vector; X_std: standard deviation vector
- Return type:
X_mu
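A minimal sketch of the weighted column-wise statistics such a function returns (X of shape [N x dim], weights of length N):
import numpy as np

def calc_zscore_sketch(X, weights=None):
    w = np.ones(len(X)) if weights is None else weights
    mu  = np.average(X, axis=0, weights=w)                      # weighted mean per column
    std = np.sqrt(np.average((X - mu)**2, axis=0, weights=w))   # weighted std per column
    return mu, std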
- calc_zscore_tensor(T)[source]¶
Compute z-score normalization for tensors.
- Parameters:
T – input tensor data (events x channels x rows x cols, …)
- Returns:
mu, std tensors
- get_gpu_memory_map()[source]¶
Get the GPU VRAM use in GB.
- Returns:
dictionary with keys as device ids [integers] and values the memory used by the GPU.
- glob_expand_files(datasets, datapath, recursive_glob=False)[source]¶
Do glob / brace expansion of files
- Parameters:
datasets – dataset filename with glob syntax (can be a list of files)
datapath – root path to files
- Returns:
full filenames including the path
- Return type:
files
- impute_data(X, imputer=None, dim=None, values=[-999], labels=None, algorithm='iterative', fill_value=0, knn_k=6)[source]¶
Data imputation (treatment of missing values, Nan and Inf).
NOTE: This function can impute only fixed dimensional input currently (not Jagged numpy arrays)
- Parameters:
X – Input data matrix [N vectors x D dimensions]
imputer – Pre-trained imputer, default None
dim – Array of active dimensions to impute
values – List of special integer values indicating the need for imputation
labels – List containing textual description of input variables
algorithm – ‘constant’, ‘mean’, ‘median’, ‘iterative’, ‘knn_k’
knn_k – knn k-nearest neighbour parameter
- Returns:
Imputed output data
- Return type:
X
- make_hash_sha256_object(o)[source]¶
Create SHA256 hash from an object
- Parameters:
o – python object (e.g. dictionary)
- Returns:
hash
- pick_vars(data, set_of_vars)[source]¶
Choose the active set of input variables.
- Parameters:
data – IceXYW type object
set_of_vars – Variables to pick
- Returns:
Chosen indices; newvars: chosen variables
- Return type:
newind
- split_data(X, Y, W, ids, frac=[0.5, 0.1, 0.4], permute=True)[source]¶
Split machine learning data into train, validation, test sets
- Parameters:
X – data matrix
Y – target matrix
W – weight matrix
ids – variable names of columns
frac – fraction [train, validate, evaluate] (sum to 1)
rngseed – random seed
icenet.tools.plots¶
- MVA_plot(metrics, labels, title='', filename='MVA', density=True, legend_fontsize=7)[source]¶
MVA output plots
- ROC_plot(metrics, labels, title='', plot_thresholds=True, thr_points_signal=[0.05, 0.1, 0.15, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.85, 0.9, 0.95], filename='ROC', legend_fontsize=7, xmin=0.0001, alpha=0.32)[source]¶
Receiver Operating Characteristics, i.e. false positive rate (x) vs true positive rate (y).
- Parameters:
metrics, labels, title, plot_thresholds, thr_points_signal, filename, legend_fontsize, xmin, alpha
- annotate_heatmap(X, ax, xlabels, ylabels, x_rot=90, y_rot=0, decimals=1, color='w')[source]¶
Add text annotations to a matrix heatmap plot
- binengine(bindef, x)[source]¶
Binning processor function
- Parameters:
bindef – binning definition
x – data input array
Examples
50 (number of bins, integer)
[1.0, 40.0, 50.0] (list of explicit edges)
{‘nbin’: 30, ‘q’: [0.0, 0.95], ‘space’: ‘linear’} (automatic with quantiles)
{‘nbin’: 30, ‘minmax’: [2.0, 50.0], ‘space’: ‘log10’} (automatic with boundaries)
- Returns:
binning edges
- Return type:
edges
- binned_1D_AUC(y_pred, y, X_kin, ids_kin, X, ids, edges, weights=None, VAR: str = 'trk_pt', num_bootstrap=0)[source]¶
Evaluate AUC & ROC per 1D-bin.
- Parameters:
y_pred – MVA algorithm output
y – Output (truth level target) data
X_kin – Data
ids_kin – Variables (strings)
X – Data
ids – Variables (strings)
edges – Edges of the space cells
weights – Sample weights
VAR – Variable identifier to pick (one)
- Returns:
Figure handle and axis; met: Metrics object
- Return type:
fig,ax
- binned_2D_AUC(y_pred, y, X_kin, ids_kin, X, ids, edges, label, weights=None, VAR: list = ['trk_pt', 'trk_eta'])[source]¶
Evaluate AUC per 2D-bin.
- Parameters:
y_pred – MVA algorithm output
y – Output (truth level target) data
X_kin – Data
ids_kin – Variables
X – Data
ids – Variables
edges – Edges of the A,B-space cells (2D array)
label – Label of the classifier (string)
weights – Sample weights
VAR – Variable identifiers (two)
- Returns:
Figure handle and axis; met: Metrics object
- Return type:
fig,ax
- density_COR(y_pred, X, ids, label, weights=None, hist_edges=[[50], [50]], path='', cmap='Oranges')[source]¶
Evaluate the 2D-density of the MVA algorithm output vs other variables.
- Parameters:
y_pred – MVA algorithm output
X – Variables to be plotted
ids – Identifiers of the variables in X
label – Label of the MVA model (string)
weights – Sample weights
hist_edges – Histogram edges list (or number of bins, as an alternative) (2D)
path – Save path
cmap – Color map
- Returns:
Plot pdf saved directly
- density_COR_wclass(y_pred, y, X, ids, label, weights=None, class_ids=None, edges=[[50], [50]], density=True, path='', cmap='Oranges', **kwargs)[source]¶
Evaluate the 2D-density of the MVA algorithm output vs other variables per class.
- Parameters:
y_pred – MVA algorithm output
y – Output (truth level target) data
X – Variables to be plotted
ids – Identifiers of the variables in X
label – Label of the MVA model (string)
weights – Sample weights
class_ids – Class ids to plot
edges – Histogram edges list (or number of bins, as an alternative) (2D)
density – Normalize to density
path – Save path
cmap – Color map
- Returns:
correlation values in a dictionary (per variable, per class) plots are saved directly
- density_MVA_wclass(y_pred, y, label, weights=None, class_ids=None, edges=80, path='', **kwargs)[source]¶
Evaluate MVA output (1D) density per class.
- Parameters:
y_pred – MVA algorithm output
y – Output (truth level target) data
label – Label of the MVA model (string)
weights – Sample weights
class_ids – Class IDs to plot
edges – Histogram edges list (or number of bins, as an alternative)
- Returns:
Plot pdf saved directly
- draw_error_band(ax, x, y, x_err, y_err, **kwargs)[source]¶
Calculate normals via centered finite differences (except the first point which uses a forward difference and the last point which uses a backward difference).
https://matplotlib.org/stable/gallery/lines_bars_and_markers/curve_error_band.html
- plot_AIRW(X, y, ids, weights, y_pred, pick_ind, label, sublabel, param, tau=1.0, targetdir=None, num_cpus=0)[source]¶
Plot AI based reweighting results
- plot_AUC_matrix(AUC, edges_A, edges_B)[source]¶
Plot AUC matrix.
- Parameters:
AUC – AUC-ROC matrix
edges_A – Histogram edges of variable A
edges_B – Histogram edges of variable B
- Returns:
figure handle; ax: figure axis
- Return type:
fig
- plot_contour_grid(pred_func, X, y, ids, targetdir='.', transform='numpy', reso=50, npoints=400)[source]¶
Classifier decision contour evaluated pairwise for each dimension, other dimensions evaluated at zero=(0,0,…0) (thus z-normalized with 0-mean is suitable)
- Parameters:
pred_func – prediction function handle
X – input matrix
y – class targets
ids – variable label strings
targetdir – output directory
transform – ‘numpy’, ‘torch’
reso – evaluation resolution
npoints – number of points to draw
- plot_correlation_comparison(corr_mstats, targetdir, xlim=None)[source]¶
Plot collected correlation metrics from density_COR_wclass()
- Parameters:
corr_mstats – statistics dictionary
targetdir – output directory
xlim – plot limits dictionary per class
- Returns:
plots saved to a directory
- plot_correlations(X, ids, weights=None, y=None, round_threshold=0.0, targetdir=None, colorbar=False)[source]¶
Plot a cross-correlation matrix of vector data
- Parameters:
X – Data matrix (N x D)
ids – Variable names (list of length D)
weights – Event weights
y – Class labels per event (list of length N)
round_threshold – Correlation matrix |C| < threshold to set matrix elements to zero
targetdir – Output plot directory
colorbar – Colorbar on the plot
- Returns:
Figures, axes (per class)
- Return type:
figs, axs
- plot_matrix(XY, x_bins, y_bins, vmin=0, vmax=None, cmap='RdBu', figsize=(4, 3), grid_on=False)[source]¶
Visualize matrix.
- plot_reweight_result(X, y, nbins, binrange, weights, title='', xlabel='x', linewidth=1.5, plot_unweighted=True)[source]¶
Plot pure event counts, so that one can see whether the integrated class fractions are also equalized (or not) after weighting.
- plot_selection(X, mask, ids, plotdir, label, varlist, density=True, library='np')[source]¶
Plot before/after selection histograms for all chosen variables
- Parameters:
X – data array (N events x D dimensions)
mask – boolean selection indices (N)
ids – variable string array (D)
plotdir – plotting directory
label – a string label
varlist – a list of variables to be plotted (from ids)
density – normalize all histograms to unit density
library – ‘np’ or ‘ak’
- plot_train_evolution_multi(losses, trn_aucs, val_aucs, label, aspect=0.85, yscale='linear', xscale='linear')[source]¶
Training evolution plots.
- Parameters:
losses – loss values in a dictionary
trn_aucs – training metrics
val_aucs – validation metrics
- Returns:
figure handle; ax: figure axis
- Return type:
fig
- plot_xgb_importance(model, tick_label, importance_type='gain', label=None, sort=True, default_names=False)[source]¶
Plot XGBoost model feature importance
- Parameters:
model – xgboost model object
tick_label – feature names
importance_type – type of importance metric [‘weight’, ‘gain’, ‘cover’, ‘total_gain’, ‘total_cover’]
default_names – True to use the xgboost default feature names, False to use tick_label
- Returns:
fig, ax
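A minimal usage sketch with a toy XGBoost model; whether the function expects the sklearn wrapper or the underlying Booster (model.get_booster()) is not specified above, so treat this as an assumption:

    # Hypothetical usage sketch; the module path icenet.tools.plots is assumed
    import numpy as np
    import xgboost
    from icenet.tools import plots

    N, D = 1000, 5
    X    = np.random.randn(N, D)
    y    = (X[:, 0] + 0.5 * np.random.randn(N) > 0).astype(int)
    ids  = [f'var_{i}' for i in range(D)]

    model = xgboost.XGBClassifier(n_estimators=50).fit(X, y)

    fig, ax = plots.plot_xgb_importance(model=model, tick_label=ids,
                                        importance_type='gain', sort=True)
    fig.savefig('xgb_importance.pdf')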
- plotvar(x, y, var, weights, nbins=70, percentile_range=[0.5, 99.5], plot_unweighted=True, title='', targetdir='.')[source]¶
Plot a single variable.
icenet.tools.prints¶
- colored_row(x, active_color='green', inactive_color='white', threshold=0.5, **kwargs)[source]¶
Color vector elements.
- print_colored_matrix(X, **kwargs)[source]¶
Print matrix with two colors (suitable for binary matrices).
- print_variables(X: array, ids: List[str], W=None, exclude_vals=None, output_file=None)[source]¶
Print statistics of X
- Parameters:
X – array (n x dim)
ids – variable names (dim)
W – event weights
exclude_vals – exclude special values from the stats
- Returns:
Pretty-printed table of statistics
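A minimal usage sketch; the module path icenet.tools.prints is taken from this listing and the sentinel value -999 is only an example:

    # Hypothetical usage sketch
    import numpy as np
    from icenet.tools import prints

    N, D = 1000, 3
    X    = np.random.randn(N, D)
    X[::50, 0] = -999.0                       # toy special (sentinel) values
    ids  = ['pt', 'eta', 'phi']
    W    = np.ones(N)

    prints.print_variables(X=X, ids=ids, W=W, exclude_vals=[-999.0])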
icenet.tools.process¶
- combine_pickle_data(args)[source]¶
Load split pickle data and return the full dataset arrays
- Parameters:
args – main argument dictionary
- concatenate_data(data, max_batch_size: int = 32)[source]¶
Helper function to concatenate arrays with a specified maximum batch size
- evaluate_models(data=None, info=None, args=None)[source]¶
Evaluate ML/AI models.
- Parameters:
data, info, args – input objects of different datatypes
- Returns:
Saves evaluation plots to the disk
- generic_flow(rootname, func_loader, func_factor)[source]¶
Generic (read data – train models – evaluate models) workflow
- Parameters:
rootname – name of the workflow config folder
func_loader – data loader (function handle)
func_factor – data transformer (function handle)
- impute_datasets(data, args, features=None, imputer=None)[source]¶
Dataset imputation
- Parameters:
data – .x, .y, .w, .ids type object
args – imputer parameters
features – variables to impute (list), if None, then all are considered
imputer – imputer object (scikit-type)
- Returns:
imputed data
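The sketch below is not the icenet function itself; it only illustrates the kind of ‘scikit-type’ imputer object it accepts, applied to a toy array:

    # Illustration of a scikit-type imputer (concept only)
    import numpy as np
    from sklearn.impute import SimpleImputer

    X = np.array([[1.0, 2.0], [np.nan, 3.0], [7.0, np.nan]])

    imputer   = SimpleImputer(strategy='mean')   # 'scikit-type' imputer object
    X_imputed = imputer.fit_transform(X)         # NaNs replaced by column means
    print(X_imputed)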
- plot_XYZ_wrap(func_predict, x_input, y, weights, label, targetdir, args, X_kin, ids_kin, X_RAW, ids_RAW)[source]¶
Arbitrary plot steering function. Add new plot types here, steered from plots.yml
- process_data(args, data, func_factor, mvavars, runmode)[source]¶
Process data into high-level representations and split into train/eval/test
- process_raw_data(args, func_loader)[source]¶
Load raw input from the disk – this is executed only by ‘genesis’
- read_config(config_path='configs/xyz/', runmode='all')[source]¶
Commandline and YAML configuration reader
icenet.tools.raytools¶
- class ProgressBar(total: int, description: str = '')[source]¶
- property actor: ActorHandle¶
Returns a reference to the remote ProgressBarActor.
When you complete tasks, call update on the actor.
- pbar: tqdm¶
- print_until_done() None [source]¶
Blocking call.
Do this after starting a series of remote Ray tasks, to which you’ve passed the actor handle. Each of them calls update on the actor. When the progress meter reaches 100%, this method returns.
- progress_actor: ActorHandle¶
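A hypothetical usage sketch based on the description above; the exact update() signature on the actor is an assumption (a single-unit increment, as in the standard Ray progress-bar pattern):

    # Hypothetical usage sketch; actor.update.remote(1) is an assumed signature
    import ray
    from icenet.tools.raytools import ProgressBar

    ray.init()

    @ray.remote
    def task(actor):
        # ... do work ...
        actor.update.remote(1)     # assumed: report one completed unit
        return True

    pb      = ProgressBar(total=10, description='processing')
    actor   = pb.actor
    futures = [task.remote(actor) for _ in range(10)]
    pb.print_until_done()          # blocks until the progress meter reaches 100%
    results = ray.get(futures)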
icenet.tools.reweight¶
- AIRW_helper(x, y, w, ids, pdf, args, x_val, y_val, w_val, EPS=1e-12)[source]¶
Helper function for ML based reweighting
- balanceweights(weights_doublet, reference_class, y, EPS=1e-12)[source]¶
Balance N-class weights to sum to equal counts.
- Parameters:
weights_doublet – N-class event weights (events x classes)
reference_class – which class gives the reference (integer)
y – class targets
- Returns:
weights doublet with new weights per event
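For illustration only (not the library implementation, which operates on an (events x classes) weight doublet), the balancing idea on plain per-event weights and labels looks roughly like this:

    # Illustration: scale each class so its weight sum matches the reference class
    import numpy as np

    def balance_to_reference(weights, y, reference_class, EPS=1e-12):
        w = weights.astype(float).copy()
        ref_sum = np.sum(w[y == reference_class])
        for c in np.unique(y):
            c_sum = np.sum(w[y == c])
            w[y == c] *= ref_sum / (c_sum + EPS)
        return w

    y = np.array([0, 0, 0, 1, 1])
    w = np.array([1.0, 1.0, 1.0, 2.0, 2.0])
    print(balance_to_reference(w, y, reference_class=0))  # class 1 rescaled to sum 3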
- compute_ND_reweights(x, y, w, ids, args, pdf=None, EPS=1e-12, x_val=None, y_val=None, w_val=None, skip_reweights=False)[source]¶
Compute N-dim reweighting coefficients
Supports ‘ML’ (ND), ‘pseudo-ND’ (1D x 1D … x 1D), ‘2D’, ‘1D’
For ‘args’ dictionary structure, see steering cards.
- Parameters:
x – training sample input
y – training sample (class) labels
w – training sample weights
ids – variable names of columns of x
pdf – pre-computed pdfs (default None)
args – reweighting parameters in a dictionary
- Returns:
1D-array of re-weights; pdf: computed pdfs
- Return type:
weights
- histogram_helper(x, y, w, ids, pdf, args, EPS)[source]¶
Helper function for histogram based reweighting
- reweightcoeff1D(X, y, pdf, reference_class, max_reg=1000.0, EPS=1e-12)[source]¶
Compute N-class density reweighting coefficients.
- Parameters:
X – Observable of interest (N x 1)
y – Class labels (N x 1)
pdf – PDF for each class
reference_class – e.g. 0 (background) or 1 (signal)
- Returns:
weights for each event
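A standalone sketch of the underlying 1D histogram-ratio reweighting idea (illustration only, not the library implementation):

    # Illustration: weight each non-reference class by the ratio of class pdfs
    import numpy as np

    def hist_reweight_1D(x, y, reference_class, bins=30, max_reg=1000.0, EPS=1e-12):
        edges = np.histogram_bin_edges(x, bins=bins)
        ref_pdf, _ = np.histogram(x[y == reference_class], bins=edges, density=True)
        w = np.ones_like(x, dtype=float)
        for c in np.unique(y):
            if c == reference_class:
                continue
            pdf_c, _ = np.histogram(x[y == c], bins=edges, density=True)
            idx = np.clip(np.digitize(x[y == c], edges) - 1, 0, len(edges) - 2)
            w[y == c] = np.minimum(ref_pdf[idx] / (pdf_c[idx] + EPS), max_reg)
        return w

    x = np.concatenate([np.random.normal(0, 1, 5000), np.random.normal(0.5, 1.2, 5000)])
    y = np.concatenate([np.zeros(5000, int), np.ones(5000, int)])
    w = hist_reweight_1D(x, y, reference_class=0)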
- reweightcoeff2D(X_A, X_B, y, pdf, reference_class, max_reg=1000.0, EPS=1e-12)[source]¶
Compute N-class density reweighting coefficients.
Operates in full 2D without any factorization.
- Parameters:
X_A – First observable of interest (N x 1)
X_B – Second observable of interest (N x 1)
y – Class labels (N x 1)
pdf – Density histograms for each class
reference_class – e.g. Background (0) or signal (1)
max_reg – Regularize the maximum reweight coefficient
- Returns:
weights for each event
icenet.tools.stx¶
- apply_cutflow(cut, names, xcorr_flow=True, EPS=1e-12)[source]¶
Apply cutflow
- Parameters:
cut – list of pre-calculated cuts, each list element is a boolean array
names – list of names (description of each cut, for printout only)
xcorr_flow – compute full N-point correlations
return_powerset – return each of 2**|cuts| as a separate boolean mask vector
- Returns:
boolean mask of size number of events (1 = pass, 0 = fail)
- Return type:
mask
- construct_columnar_cuts(X, ids, cutlist)[source]¶
Construct cuts and corresponding names.
- Parameters:
X – Input columnar data matrix
ids – Variable names for each column of X
cutlist – Selection cuts as strings, such as [‘ABS@eta < 0.5’, ‘trgbit == True’]
- Returns:
masks (boolean arrays) in a list, boolean expressions (list)
- construct_exptree(root)[source]¶
Construct an expression (syntax) tree via recursion.
- Parameters:
root – List of lists, for example [[‘10’, ‘>’, ‘7’], ‘AND’, [[‘4’, ‘>=’, ‘2’], ‘AND’, [‘2’, ‘<=’, ‘4’]]]
- Returns:
an expression tree object with ‘tree_node’ objects
- eval_boolean_exptree(root, X, ids)[source]¶
Evaluation of a (boolean) expression tree via recursion.
- Parameters:
root – expression tree object
X – data matrix (N events x D dimensions)
ids – variable names for each D dimension
- Returns:
boolean selection list of size N
- eval_boolean_syntax(expr, X, ids, verbose=False)[source]¶
A complete wrapper to evaluate boolean syntax.
- Parameters:
expr – boolean expression string, e.g. “pt > 7.0 AND (x < 2 OR x >= 4)”
X – input data (N x dimensions)
ids – variable names as a list
- Returns:
boolean list of size N
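A minimal usage sketch; the module path icenet.tools.stx is taken from this listing and the expression string from the documentation above:

    # Hypothetical usage sketch
    import numpy as np
    from icenet.tools import stx

    N   = 1000
    X   = np.stack([np.random.uniform(0, 20, N), np.random.uniform(0, 6, N)], axis=1)
    ids = ['pt', 'x']

    mask = stx.eval_boolean_syntax(expr='pt > 7.0 AND (x < 2 OR x >= 4)', X=X, ids=ids)
    print(np.sum(mask), 'events pass')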
- filter_constructor(filters, X, ids, y=None)[source]¶
Filter product main constructor
- Parameters:
filters – yaml file input
X – columnar input data
ids – data column keys (list of strings)
y – class labels (default None), used for diplomat (always passing) classes
- Returns:
mask matrix, mask text labels (list), mask path labels (list)
- parse_boolean_exptree(instring)[source]¶
A boolean expression tree parser.
- Parameters:
instring – input string, e.g. “pt > 7.0 AND (x < 2 OR x >= 4)”
- Returns:
A syntax tree as a list of lists
- Information:
- See: https://stackoverflow.com/questions/11133339/parsing-a-complex-logical-expression-in-pyparsing-in-a-binary-tree-fashion
- powerset_constructor(cutset, X, ids)[source]¶
Powerset (all subsets of boolean combinations) filter constructor
- Returns:
mask matrix, mask text labels (list), mask path labels (list)
- powerset_cutmask(cut)[source]¶
Generate powerset 2**|cuts| masks
- Parameters:
cut – list of pre-calculated cuts, each list element is a boolean array
- Returns:
(2**|cuts| x num_events) sized boolean mask matrix
- Return type:
mask
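For illustration only (not the library implementation), the powerset construction can be sketched as follows: every boolean combination of |cuts| cuts becomes one row of a (2**|cuts| x num_events) matrix:

    # Illustration of the powerset mask idea
    import itertools
    import numpy as np

    def powerset_masks(cuts):
        cuts = np.asarray(cuts, dtype=bool)           # shape: (num_cuts, num_events)
        num_cuts = cuts.shape[0]
        rows = []
        for combo in itertools.product([False, True], repeat=num_cuts):
            row = np.all(cuts == np.array(combo)[:, None], axis=0)
            rows.append(row)
        return np.array(rows)                         # shape: (2**num_cuts, num_events)

    cuts = [np.array([True, True, False]), np.array([True, False, False])]
    print(powerset_masks(cuts).astype(int))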
- print_exptree(root)[source]¶
Print out an expression tree object via recursion.
- Parameters:
root – expression tree (object type)
- print_parallel_cutflow(masks, names, EPS=1e-12)[source]¶
Print boolean combination cutflow statistics
- Parameters:
masks – list of pre-calculated cuts; each list element is a boolean array with length equal to the number of events
names – list of names (description of each cut, for printout only)
- print_stats(mask, text)[source]¶
Print filter mask category statistics
- Parameters:
mask – computed event filter mask
text – filter descriptions