icefit

Binned and unbinned fitting functions, neural methods and statistical tools.

icefit.abcd

ABCD_2NLL(B, C, D, mu, mu_B, mu_D, EPS=1e-32)[source]

Negative log-likelihood function of the ABCD estimator

DEFINITION: A = B x C / D

Parameters:
  • B – Measured event counts

  • C – Measured event counts

  • D – Measured event counts

  • mu – Free parameters of the likelihood function

  • mu_B – Free parameters of the likelihood function

  • mu_D – Free parameters of the likelihood function

(N.B. here the number of measurements equals the number of free parameters)

Model relation:
Ntot = mu*mu_B + mu_B + mu*mu_D + mu_D = (A) + (B) + (C) + (D)

See e.g. https://twiki.cern.ch/twiki/pub/Main/ABCDMethod/ABCDGuide_draft18Oct18.pdf

Returns:

-2logL

ABCD_eq(b, c, d)[source]

Basic estimator formula for count in ‘A’ (signal domain)

Independent (X,Y) gives us the definition A = B x C / D

Y
| C | A
|-------
| D | B
0--------> X
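
As a minimal numerical sketch (illustrative counts, plain NumPy; the snippet below is not part of the module):

    import numpy as np

    # Control-region counts (illustrative numbers)
    b, c, d = 80.0, 120.0, 60.0

    # Basic ABCD estimate for the signal-region count A
    A_est = b * c / d

    # First-order error propagation assuming independent Poisson counts
    A_err = A_est * np.sqrt(1.0 / b + 1.0 / c + 1.0 / d)

    print(f"A = {A_est:.1f} +- {A_err:.1f}")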

ABCD_err(b, c, d, method='errorprop', N=100000, alpha=0.32, lrange=5)[source]

ABCD uncertainty (confidence interval) methods

DEFINITION: A = B x C / D

Parameters:
  • b – input value

  • c – input value

  • d – input value

  • method – ‘errorprop’, ‘poisson’, ‘bootstrap’, ‘likelihood’

  • N – number of random samples for sampling based methods

  • alpha – confidence level (‘errorprop’ fixed at CL68)

  • lrange – likelihood scan range

Returns:

confidence interval array (lower,upper)

test_abcd()[source]

icefit.commutator

crv(size, upper_L)[source]
main()[source]
selection_A(X, Y, Y_CUT)[source]

Per event: Cut on Y and then get the min(X) object

selection_B(X, Y, Y_CUT)[source]

Per event: Get the min(X) object and then cut on Y

icefit.cortools

H_score(p: float, EPS: float = 1e-15)[source]

Shannon Entropy (log_e ~ nats units)

Parameters:

p – probability vector

Returns:

entropy
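
A minimal sketch of the quantity computed, assuming p is a normalized probability vector (plain NumPy, not the module call itself):

    import numpy as np

    def shannon_entropy(p, EPS=1e-15):
        # H = -sum_i p_i * log(p_i), in nats (natural log)
        p = np.asarray(p, dtype=float)
        return -np.sum(p * np.log(p + EPS))

    print(shannon_entropy([0.5, 0.25, 0.25]))  # ~1.04 nats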

I_score(C: ndarray, normalized: str | None = None, EPS: float = 1e-15)[source]

Mutual information score (log_e ~ nats units)

Parameters:
  • C – (X,Y) 2D-histogram array with positive definite event counts

  • normalized – return normalized version (None, ‘additive’, ‘multiplicative’)

Returns:

mutual information score

bw2bins(x: ndarray, bw: float, alpha: float)[source]

Convert a histogram binwidth to number of bins

Parameters:
  • x – data array

  • bw – binwidth

  • alpha – outlier percentile

Returns:

number of bins (1 is returned if the computation fails)

corrcoeff_weighted_torch(x: Tensor, y: Tensor, weights: Tensor | None = None)[source]

Per-event weighted linear Pearson 2x2 correlation matrix, cf. torch.corrcoef()

(torch autograd compatible)

Parameters:
  • x – data (Nx1)

  • y – data (Nx1)

  • weights – event weights

distance_corr(x: ndarray, y: ndarray, weights: ndarray | None = None, alpha: float = 0.32, n_bootstrap: int = 30)[source]

Distance correlation https://en.wikipedia.org/wiki/Distance_correlation

Parameters:
  • x – arrays of values

  • y – arrays of values

  • weights – possible event weights

  • alpha – confidence interval [alpha/2, 1-alpha/2] level

  • n_bootstrap – number of percentile bootstrap samples

Returns:

correlation coefficient [0,1], confidence interval

distance_corr_torch(x: Tensor, y: Tensor, weights: Tensor | None = None, p: float = 2.0)[source]

Distance Correlation, complexity O(N^2)

(torch autograd compatible)

https://en.wikipedia.org/wiki/Distance_correlation

See: https://dcor.readthedocs.io/en/latest/theory.html

Parameters:
  • x – data (N x dim)

  • y – data (N x dim), where dim can be the same or different than for x

  • weights – event weights (not implemented)

  • p – p-norm [0,inf] value

freedman_diaconis_bin(x: ndarray, mode: str = 'nbins', alpha: float = 0.01)[source]

Freedman-Diaconis rule for a 1D-histogram bin width

D. Freedman & P. Diaconis (1981) “On the histogram as a density estimator: L2 theory”.

~ N**(-1/3)

Parameters:
  • x – array of 1D data

  • mode – return ‘width’ or ‘nbins’

  • alpha – outlier percentile
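
For reference, a sketch of the underlying rule (the module additionally applies the alpha outlier clipping):

    import numpy as np

    def fd_binwidth(x):
        # Freedman-Diaconis: bin width = 2 * IQR * N^(-1/3)
        q75, q25 = np.percentile(x, [75, 25])
        return 2.0 * (q75 - q25) * len(x)**(-1.0 / 3.0)

    x  = np.random.randn(10000)
    bw = fd_binwidth(x)
    nbins = int(np.ceil((x.max() - x.min()) / bw))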

gaussian_mutual_information(rho: float)[source]

Analytical 2D-Gaussian mutual information using a correlation coefficient rho.

I(X1,X2) = H(X1) + H(X2) - H(X1,X2)

Parameters:

rho – correlation coefficient between (-1,1)

Returns:

mutual information
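
A sketch of the standard closed-form result for the bivariate Gaussian case:

    import numpy as np

    def gaussian_MI(rho):
        # I(X1;X2) = -0.5 * log(1 - rho^2)  [nats], valid for |rho| < 1
        return -0.5 * np.log(1.0 - rho**2)

    print(gaussian_MI(0.5))   # ~0.14 nats
    print(gaussian_MI(0.99))  # strong dependence -> large MI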

hacine_entropy_bin(x: ndarray, mode: str = 'nbins', alpha: float = 0.01)[source]

Hacine-Gharbi et al. “Low Bias Histogram-Based Estimation of Mutual Information for Feature Selection.” Pattern Recognition Letters, 2012.

Parameters:

See scott_bin()

hacine_joint_entropy_bin(x: ndarray, rho: float, mode: str = 'nbins', alpha: float = 0.01)[source]

Hacine-Gharbi, Ravier. “A Binning Formula of Bi-histogram for Joint Entropy Estimation Using Mean Square Error Minimization.” Pattern Recognition Letters, 2018.

Parameters:

See scott_bin()

mutual_information(x: ndarray, y: ndarray, weights=None, bins_x=None, bins_y=None, normalized=None, alpha=0.32, n_bootstrap=300, automethod='Scott2D', minbins=5, maxbins=100, outlier=0.01)[source]

Mutual information entropy (non-linear measure of dependency) between x and y variables

Parameters:
  • x – array of values

  • y – array of values

  • weights – weights (default None)

  • bins_x – x binning array; if None, determined automatically

  • bins_y – y binning array; if None, determined automatically

  • normalized – normalize the mutual information (see I_score() function)

  • n_bootstrap – number of percentile bootstrap samples

  • alpha – bootstrap confidence interval

Autobinning args:

  • automethod – ‘Hacine2D’ or ‘Scott2D’

  • minbins – minimum number of bins per dimension

  • outlier – outlier protection percentile

Returns:

mutual information, confidence interval
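
A usage sketch with synthetic data (keyword arguments follow the signature above; the return contents are as documented):

    import numpy as np
    from icefit import cortools

    # Synthetic, positively correlated toy data (illustrative)
    N = 10000
    x = np.random.randn(N)
    y = 0.7 * x + np.sqrt(1.0 - 0.7**2) * np.random.randn(N)

    # Automatic 2D binning with the Scott rule; percentile-bootstrap CI
    out = cortools.mutual_information(x=x, y=y, automethod='Scott2D')
    # Per the docstring above, `out` carries the MI estimate and its confidence interval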

optbins(x: ndarray, maxM: int = 150, mode='nbins', alpha=0.025)[source]

NOTE: weak performance; study the method before relying on it.

Optimal 1D-histogram binning via Bayesian Brute Force search algorithm.

K.H Knuth, 2012, Entropy. https://arxiv.org/abs/physics/0605197

Parameters:
  • x – data points

  • maxM – maximum number of bins

  • mode – “nbins” or “width”

  • alpha – outlier protection percentile

Returns:

optimal number of bins

optbins2d(x: ndarray, y: ndarray, maxM=(40, 40), mode='nbins', alpha=0.025)[source]

NOTE: weak performance; study the method before relying on it.

Optimal 2D-histogram binning via Bayesian Brute Force search algorithm.

K.H Knuth, 2012, Entropy. https://arxiv.org/abs/physics/0605197

Parameters:
  • x – data points

  • y – data points

  • maxM – maximum number of bins per dimension

  • mode – “nbins” or “width”

  • alpha – outlier protection percentile

Returns:

optimal number of bins

pearson_corr(x: ndarray, y: ndarray, weights: ndarray | None = None, return_abs=False, alpha=0.32, n_bootstrap=300)[source]

Pearson Correlation Coefficient https://en.wikipedia.org/wiki/Pearson_correlation_coefficient

Parameters:
  • x – arrays of values

  • y – arrays of values

  • weights – possible event weights

  • return_abs – return absolute value

  • alpha – confidence interval [alpha/2, 1-alpha/2] level

  • n_bootstrap – number of percentile bootstrap samples

Returns:

correlation coefficient [-1,1], confidence interval, p-value
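
For reference, a sketch of the weighted coefficient itself (without the bootstrap and p-value machinery of the module):

    import numpy as np

    def weighted_pearson(x, y, w=None):
        # Weighted Pearson correlation via weighted (co)variances
        w = np.ones(len(x)) if w is None else np.asarray(w, dtype=float)
        w = w / np.sum(w)
        mx, my = np.sum(w * x), np.sum(w * y)
        cov  = np.sum(w * (x - mx) * (y - my))
        varx = np.sum(w * (x - mx)**2)
        vary = np.sum(w * (y - my)**2)
        return cov / np.sqrt(varx * vary)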

percentile_per_dim(x: ndarray, q: float, **kwargs)[source]

Compute percentiles for each dimension (column) over samples (rows)

Parameters:
  • x – input array (N x dim)

  • q – percentile (0,100)

Returns:

array with length dim

Return type:

out

prc_CI(x: ndarray, alpha: float)[source]
scott_bin(x: ndarray, rho: float, mode: str = 'nbins', alpha: float = 0.01)[source]

Scott's rule for 2D-histogram bin widths

Scott, D.W. (1992), Multivariate Density Estimation: Theory, Practice, and Visualization – 2D-Gaussian case

~ N**(-1/4)

Parameters:
  • x – array of 1D data (one dimension of the bivariate distribution)

  • rho – Linear correlation coefficient

  • mode – return ‘width’ or ‘nbins’

  • alpha – outlier percentile

test_constant(N=10000)[source]

Constant input unit test of the estimators.

test_gaussian()[source]

Gaussian unit test of the estimators.

icefit.dequantize

U(n, a, b)[source]

Uniform random numbers in [a,b]

compute_ind(x, theta)[source]

Index association helper

construct_icdf(theta, pvals, n_interp=10000, kind='cubic')[source]

Construct inverse CDF

fast_loop(i: int, k: int, N_buffer: int, x: ndarray, deq: ndarray, theta: ndarray, theta_ind: ndarray)[source]

Fast loop function used by iDQF

iDQF(x, theta=None, pvals=None, N_buffer=10000, n_interp=10000, kind='cubic')[source]

Dequantization via inverse CDF

Parameters:
  • x – data sample array (1D)

  • theta – (theoretical) quantized pdf x-values; if None, the discrete pdf is estimated from x

  • pvals – (theoretical) quantized pdf y-values; if None, likewise estimated from x

  • N_buffer – technical buffer size

  • n_interp – number of interpolation points

  • kind – interpolation type, e.g. ‘linear’, ‘cubic’

Returns:

dequantized x values

main()[source]

icefit.em

EM_frac(pdf, iters=30, EPS=1e-12, verbose=True)[source]

EM-algorithm for unknown integrated class fractions

Parameters:
  • pdf – (n x K) density (pdf) values for n measurements, K classes

  • iters – Number of iterations

Returns:

Integrated class fractions

Return type:

frac
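
A condensed sketch of the EM update behind this, assuming the (n x K) pdf layout above (illustrative, not necessarily the exact implementation):

    import numpy as np

    def em_fractions(pdf, iters=30, EPS=1e-12):
        # pdf: (n x K) component density values evaluated per measurement
        n, K = pdf.shape
        frac = np.ones(K) / K
        for _ in range(iters):
            post  = pdf * frac                                 # E-step: unnormalized posteriors
            post /= (post.sum(axis=1, keepdims=True) + EPS)
            frac  = post.mean(axis=0)                          # M-step: update class fractions
        return frac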

gausspdf(x, mu, sigma)[source]

Gaussian pdf

mixture_nll(pdf, frac)[source]

Mixture model negative Log-Likelihood

Parameters:
  • pdf – (n x K) density (pdf) values for n measurements, K classes

  • frac – class fractions (K) with sum 1

Returns:

Negative Log-Likelihood

Return type:

nll

test_EM()[source]

Test function to test EM iteration

icefit.icelimit

CL_single(mu, s_hypothesis, bg_expected, observed, num_toys_pdf=10000, s_syst_error=None, s_syst_type='flat', bg_syst_error=None, bg_syst_type='flat', use_CLs=True)[source]

One tailed signal upper limits using “toy” Monte Carlo.

A single channel counting experiment case with “quasi-Bayesian” style systematics convoluted in via MC.

Parameters:
  • mu – Signal strength parameter

  • s_hypothesis – Number of predicted signal events (with mu = 1)

  • bg_expected – Number of expected background events

  • observed – Number of observed events

  • num_toys_pdf – Number of MC samples to construct PDFs

  • use_CLs – Return “CLs” type limits

  • (s)bg_syst_error – Systematic error on the estimated signal / background

  • (s)bg_syst_type – Systematic uncertainty perturbation type (flat, normal, log-normal)

Returns:

p-value

References

For the “CLs” approach motivation, see: A. Read, “Presentation of search results: the CLs technique” https://en.wikipedia.org/wiki/CLs_method_(particle_physics)

CL_single_asymptotic(mu, s_hypothesis, bg_expected, observed)[source]

One tailed signal upper limits using asymptotic (Gaussian) approximation.

A single channel counting experiment case (without systematics; including them would require a profile likelihood)

Parameters:
  • mu – Signal strength parameter

  • s_hypothesis – Number of predicted signal events (with mu = 1)

  • bg_expected – Number of expected background events

  • observed – Number of observed events

Returns:

p-value

References

Cowan et al, “Asymptotic formulae for likelihood-based tests of new physics”, https://arxiv.org/abs/1007.1727

https://indico.cern.ch/event/126652/contributions/1343592/attachments/80222/115004/Frequentist_Limit_Recommendation.pdf

https://arxiv.org/abs/2102.04275

CL_single_compute(s_hypothesis, bg_expected, observed=None, s_syst_error=None, bg_syst_error=None, method='toys', alpha=0.05, num_toys_pdf=10000, num_toys_obs=1000, mu_bounds=(1e-05, 10000.0), mu_init=1.0, max_restart=1000, func_tol=1e-05, optimize_method='golden', maxiter=10000)[source]

Main solver function for the upper limit computation on the signal strength parameter mu.

Parameters:
  • s_hypothesis – expected signal count (with mu = 1)

  • bg_expected – expected background count

  • observed – observed event count, if None then observed = bg_expected

  • method – ‘toys’ or ‘asymptotic’

  • num_toys_pdf, num_toys_obs – number of MC samples for toys

  • alpha – confidence level for the upper limit (e.g. 0.05 ~ CL95 upper limits)

  • s_syst_error – systematic error size on signal

  • bg_syst_error – systematic error size on background

Optimization args:

  • mu_bounds – bounds on the mu-value to obey

  • mu_init – initial value

  • max_restart – maximum number of restarts

  • func_tol – quadratic loss tolerance

  • optimize_method – optimization method, ‘brent’ or ‘golden’

  • maxiter – maximum number of iterations

Returns:

upper limits [-2 sigma, -1 sigma, median, +1 sigma, +2 sigma] at the level alpha
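
A usage sketch based on the parameters listed above (the counts and systematic size are illustrative):

    from icefit import icelimit

    # Illustrative counting experiment: 5 expected signal events (at mu = 1),
    # 20 expected background events with an absolute background systematic of 3.0
    limits = icelimit.CL_single_compute(s_hypothesis=5.0,
                                        bg_expected=20.0,
                                        bg_syst_error=3.0,
                                        method='toys',
                                        alpha=0.05)
    # Per the docstring, `limits` is the expected band
    # [-2 sigma, -1 sigma, median, +1 sigma, +2 sigma] of upper limits on mu at 95% CL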

LL_splusb(n, s, b, EPS=1e-15)[source]

Single channel Poisson log-likelihood

Parameters:
  • n – counts

  • s – expected signal count (mean)

  • b – expected background count (mean)

asymptotic_uncertainty_CLs(mu_up, q, alpha, N)[source]

Asymptotic (Asimov) +-N sigma values around the median expected limit; N = 0 gives the median.

The “CLs” type keeps the minus-sigma tail on the positive side of mu. See reference: “Frequentist_Limit_Recommendation.pdf”

Parameters:
  • mu_up – iteratively solved median expected upper limit value

  • q – test statistic value corresponding to mu_up

  • alpha – confidence level at which mu_up was extracted

  • N – array of sigmas e.g. [-2,-1,0,1,2]

convolve_systematics(n, syst_error, num_toys, error_type='log-normal-mean')[source]

Produce an array of values convoluted with the systematic uncertainty distribution

Parameters:
  • n – expected count value

  • syst_error – error size (e.g. standard deviation)

  • num_toys – number of MC samples

  • error_type – type of distribution

Returns:

array of counts

test_limits_unit_test(EPS=0.05)[source]

Simple unit tests

icefit.jacobian

get_full_hessian(loss, param)[source]

Compute the full Hessian matrix of a neural net (can be very slow)

https://www.cs.toronto.edu/~rgrosse/courses/csc2541_2021/readings/L02_Taylor_approximations.pdf

Parameters:
  • loss – neural net loss function

  • param – neural net parameters

get_gradient(inputs, func, **kwargs)[source]

Return derivative df(x)/dx or gradient vectors for f: R^n -> R

get_jacobian(inputs, func, **kwargs)[source]

Return Jacobian matrix of size m x n of a function f: R^n -> R^m
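
A standalone sketch of the same operation with plain torch.autograd (illustrative function; not the module's internal code path):

    import torch

    def f(x):
        # Illustrative map f: R^3 -> R^2
        return torch.stack([x[0] * x[1], torch.sin(x[2])])

    x = torch.tensor([1.0, 2.0, 0.5])
    J = torch.autograd.functional.jacobian(f, x)  # shape (2, 3)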

observed_fisher_info(logL, inputs, **kwargs)[source]

Observed Fisher information – the negative of the second derivative (Hessian matrix) of the log-likelihood function.

The observed information is typically evaluated at the maximum-likelihood estimate of theta.

‘Fisher information represents the curvature of the relative entropy’

Parameters:
  • logL – The log-likelihood function l(theta|X1,…X_n) = sum_{i=1}^n log f(X_i|theta), with X_i being the independent random observations

  • inputs – log-likelihood function input

References

https://en.wikipedia.org/wiki/Observed_information https://en.wikipedia.org/wiki/Optimal_design

Efron, B.; Hinkley, D.V. (1978). “Assessing the accuracy of the maximum likelihood estimator: Observed versus expected Fisher information”. Biometrika 65 (3): 457–487.

test_dimension_interfaces()[source]
test_hessians(EPS=0.0001)[source]

Test Hessian matrix computations

test_jacobians(EPS=0.0001)[source]

icefit.lognormal

create_label(X)[source]
lognormal_param(m, v, match_mode='mean', verbose=True)[source]

Compute log-normal distribution mu and sigma parameters from the target mean (or median) m and variance v

mu and sigma follow the Wikipedia convention: https://en.wikipedia.org/wiki/Log-normal_distribution

Parameters:
  • m – mean (or median or mode) value of the target log-normal pdf

  • v – variance value of the target log-normal pdf

  • match_mode – input m as ‘mean’, ‘median’, or ‘mode’
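
For the ‘mean’ matching mode, the standard moment relations give the parameters in closed form; a sketch (the ‘median’ and ‘mode’ cases follow analogously):

    import numpy as np

    def lognormal_from_mean(m, v):
        # Match target mean m and variance v of the log-normal:
        #   mean = exp(mu + sigma^2/2),  var = (exp(sigma^2) - 1) * exp(2*mu + sigma^2)
        sigma2 = np.log(1.0 + v / m**2)
        mu     = np.log(m) - 0.5 * sigma2
        return mu, np.sqrt(sigma2)

    mu, sigma = lognormal_from_mean(m=1.0, v=0.25)
    x = np.random.lognormal(mean=mu, sigma=sigma, size=100000)
    # x.mean() ~ 1.0 and x.var() ~ 0.25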

main()[source]
rand_lognormal(m, v, N, match_mode='mean')[source]

Generate random numbers from a log-normal density with target mean (or median) m and variance v

Parameters:
  • m – mean (or median or mode) value of the target log-normal pdf

  • v – variance value of the target log-normal pdf

  • N – number of samples

  • match_mode – input m as ‘mean’, ‘median’, or ‘mode’

rand_powexp(m, std, N)[source]

Power+exp type parametrization for median matching

Parameters:
  • m – median target

  • std – standard deviation target

  • N – number of samples

test_accuracy()[source]

Test the accuracy of the power-exp approximation with median = 1

test_lognormal()[source]

icefit.mantissa

bits2digits(bits: int) float[source]
digits2bits(digits: float) float[source]
main()[source]
plots(x, xhat, b, error_bins=300, dquant_label='none', mantissa_title='None', label='default', savepath='output')[source]
reduce_mantissa_bits(x: ndarray, manbits: int = 10)[source]
Parameters:
  • x – floating point numbers in an array

  • manbits – number of leading bits for the mantissa

reduce_mantissa_digits(x: ndarray, mandigits: int = 3)[source]
Parameters:
  • x – floating point numbers in an array

  • mandigits – number of digits precision in the mantissa

round_sig(x: float, sig: int = 2)[source]

Round to significant figures
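
A sketch of standard significant-figures rounding (illustrative, not necessarily the exact implementation):

    import numpy as np

    def round_to_sig(x, sig=2):
        if x == 0:
            return 0.0
        return round(x, sig - int(np.floor(np.log10(abs(x)))) - 1)

    print(round_to_sig(0.012345, sig=2))  # 0.012
    print(round_to_sig(98765.0,  sig=3))  # 98800.0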

icefit.mine

class MINENet(input_size=2, mlp_dim=[128, 128], noise_std=0.025, activation='relu', dropout=0.01, batch_norm=False, EPS=1e-12, **kwargs)[source]

MINE network object

  • input_size – total (summed) dimension of the two random variables (their individual dimensions can differ)

  • mlp_dim – MLP inner dimensionality

forward(x)[source]

Define the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

apply_mine(X: Tensor, Z: Tensor, model: Module, losstype: str, weights: Tensor | None = None)[source]

Compute Mutual Information by re-applying the trained model

(trained using statistics as input ~ X and Z)

apply_mine_batched(X: Tensor, Z: Tensor, model: Module, losstype: str, weights: Tensor | None = None, batch_size=4096, EPS=1e-12)[source]

Compute Mutual Information by re-applying the trained model in batches

(trained using statistics as input ~ X and Z)

compute_mine(joint: Tensor, marginal: Tensor, w: Tensor, model: Module, ma_eT: float | None = None, alpha=0.01, EPS=1e-12, losstype='MINE_EMA')[source]

Compute Mutual Information estimate

Parameters:
  • joint – input data

  • marginal – input data

  • w – input data weights

  • model – network object

Note that the bias-corrected loss is used only for the gradient descent; the MI value is calculated without it.

Notes

MI_lb is the mutual information lower bound (“neural information measure”):

MI_lb = sup_theta { E_{P_XZ}[T_theta] - log(E_{P_X otimes P_Z}[exp(T_theta)]) }

estimate(X: Tensor, Z: Tensor, weights=None, epochs=50, alpha=0.01, losstype='MINE_EMA', batch_size=256, lr=0.001, weight_decay=1e-05, clip_norm=1.0, mlp_dim=[128, 128], dropout=0.01, activation='relu', batch_norm=False, noise_std=0.025, window_size=0.2, return_full=False, device=None, return_model_only=False, **kwargs)[source]

Accurate Mutual Information Estimate via Neural Network

Info:

Input data X,Z can be random vectors (with different dimensions) or just scalar variables.

Parameters:
  • X – input data variable 1 (N x dim1)

  • Z – input data variable 2 (N x dim2)

  • weights – input data weights (N) (set None if no weights)

Params:

  • batch_size – optimization loop batch size

  • num_iter – number of iterations

  • lr – learning rate

  • hidden_dim – network hidden dimension

  • loss – estimator loss, ‘MINE_EMA’ (default, unbiased) or ‘MINE’

  • alpha – exponentially moving average parameter

  • window_size – iterations (tail) window size for the final estimate

Returns:

MI estimate, MI uncertainty, model
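
A usage sketch with synthetic torch tensors (keyword arguments follow the signature above):

    import torch
    from icefit import mine

    # Correlated toy data (illustrative)
    N = 10000
    X = torch.randn(N, 1)
    Z = 0.8 * X + 0.6 * torch.randn(N, 1)

    # Per the docstring above, the return carries the MI estimate, its uncertainty and the model
    out = mine.estimate(X=X, Z=Z, epochs=50, losstype='MINE_EMA', batch_size=256)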

sample_batch(X: Tensor, Z: Tensor, weights: Tensor | None = None, batch_size=None, device=device(type='cpu', index=0))[source]

Sample batches of data, either from the joint or marginals

Parameters:
  • X – input data (N x dim)

  • Z – input data (N x dim)

  • weights – input weights (N)

  • batch_size – batch size

Returns:

joint, marginal batches with size (N x [dim[X] + dim[Z]])

train_loop(X: Tensor, Z: Tensor, weights: Tensor, model: Module, opt, clip_norm, batch_size, epochs, alpha, losstype, device=device(type='cpu', index=0))[source]

Train the network estimator

Parameters:

See estimate()

Returns:

mutual information estimates per iteration

icefit.peakfit

fit_and_analyze(p)[source]

Main analysis steering

fit_task(f, inputparam, savepath, YEAR, GENTYPE)[source]

Main fitting task for ‘single’ histogram fits

fit_task_multi(f, inputparam, savepath, YEAR, GENTYPE)[source]

Main fitting task for ‘dual’ (‘Pass’ and ‘Fail’ simultaneous) histogram fits

get_rootfiles_jpsi(input_path='/', years=[2016, 2017, 2018], systematics=['Nominal'], test_mode=False)[source]

Return rootfile names for the J/psi (muon) study.

group_systematics(p)[source]

Group and print results of systematic fit variations

run_jpsi_peakfit(inputparam, savepath)[source]

J/psi peak fitting with multiprocessing over kinematic ‘hyperbins’ via Ray

run_jpsi_tagprobe(inputparam, savepath)[source]

Tag & Probe efficiency (& scale factors)

icefit.score

logL(mu, sigma)[source]

Log likelihood function logL(mu,sigma | X)

icefit.statstools

ADS(s, b, sigma_b, EPS=1e-06)[source]

Asimov discovery significance with background uncertainty https://indico.cern.ch/event/708041/papers/3272129/files/9437-acat19DirectOpti.pdf

when sigma_b -> 0, this gives np.sqrt(2*((s+b)*np.log(1.0 + s/b) - s))

Parameters:
  • s – expected signal count

  • b – expected background count

  • sigma_b – error on the background estimate

  • EPS – regularization constant
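
The quoted sigma_b -> 0 limit as a standalone sketch:

    import numpy as np

    def asimov_Z0(s, b):
        # Median discovery significance in the sigma_b -> 0 limit
        return np.sqrt(2.0 * ((s + b) * np.log(1.0 + s / b) - s))

    print(asimov_Z0(s=10.0, b=100.0))  # ~0.98
    print(asimov_Z0(s=50.0, b=100.0))  # ~4.7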

analytical_extreme_npdf(N, mu=0, sigma=1)[source]

Analytical expectation (mean) value approximation (based on an expansion) for a normal pdf max (extreme) values with a sample size of N.

References

https://en.wikipedia.org/wiki/Extreme_value_theory https://en.wikipedia.org/wiki/Gumbel_distribution

clopper_pearson_err(k, n, CL=[0.025, 0.975])[source]

Clopper-Pearson binomial proportion confidence interval. Internally, beta.ppf (the percent point function) returns the inverse CDF at the requested quantiles.

Parameters:
  • k – observed success counts

  • n – number of trials

  • CL – confidence levels [lower, upper]

Returns:

confidence interval [lower, upper]
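
A sketch of the standard construction via the Beta inverse CDF (scipy), matching the description above:

    from scipy.stats import beta

    def clopper_pearson(k, n, alpha=0.05):
        # Exact binomial interval at confidence level 1 - alpha
        lower = beta.ppf(alpha / 2, k, n - k + 1) if k > 0 else 0.0
        upper = beta.ppf(1 - alpha / 2, k + 1, n - k) if k < n else 1.0
        return lower, upper

    print(clopper_pearson(k=8, n=20))  # roughly (0.19, 0.64)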

columnar_mask_efficiency(num_mask: ndarray, den_mask: ndarray, y_true: ndarray, y_target: int, weights: ndarray)[source]

Masked columnar selection efficiency with event weights and classes

Parameters:
  • num_mask – numerator mask vector (Nx1, bool)

  • den_mask – denominator mask (Nx1, bool)

  • y_true – label vector (Nx1, int)

  • y_target – target label (int)

  • weights – event weights (Nx1, float)

Returns:

binomial efficiency, weighted binomial stat. error

cormat2covmat(R, s)[source]

Convert a correlation matrix to a covariance matrix.

Parameters:
  • R – correlation matrix (NxN)

  • s – vector of standard deviations (N)

Returns:

covariance matrix

correlation_matrix(X, weights=None, return_cov=False)[source]

Compute a (sample weighted) correlation or covariance matrix

Parameters:
  • X – Data matrix (rows as sample points, columns as variables)

  • weights – Weights per sample

  • return_cov – Return covariance matrix (not correlation)

Returns:

correlation matrix

covmat2corrmat(C, EPS=1e-12)[source]

Covariance matrix to correlation matrix

Parameters:

C – Covariance matrix

Returns:

Correlation matrix

error_on_mu(sigma, n)[source]

Standard error estimate on a sample mean. https://web.eecs.umich.edu/~fessler/papers/files/tr/stderr.pdf

Parameters:

  • sigma – sample standard deviation

  • n – sample size

error_on_std(sigma, n)[source]

Standard error estimate on a sample standard deviation. https://web.eecs.umich.edu/~fessler/papers/files/tr/stderr.pdf

Parameters:
  • sigma – sample standard deviation

  • n – sample size

geom_mean_2D(x, y, x_err, y_err, flip_vertical=False)[source]

Element-wise geometric 2D mean of x and y (understood e.g. as 1D histograms) and its error-propagation based uncertainty (x and y are also taken as element-wise independent)

Math note: outer(u,v)_ij = u_i v_j, with outer(u,v) = uv^T

Parameters:
  • x – values (numpy arrays)

  • y – values (numpy arrays)

  • x_err – uncertainties (numpy arrays)

  • y_err – uncertainties (numpy arrays)

  • flip_vertical – flip the values over the vertical (y) dimension

Returns:

2D-matrix (with rank 1) and its uncertainty

Return type:

z, z_err

mc_extreme_multivariate_npdf(x, mu, cov, trials=1000000)[source]

Extreme value distribution for a multivariate normal pdf via Monte Carlo, e.g. for “correlated measurements or bins” – cast as a univariate problem such that the maximum over components is taken (not a “vector maximum”)

Parameters:
  • x – scalar value to evaluate the extreme value pdf upper tail integral

  • mu – mean vector (N)

  • cov – covariance matrix (NxN)

  • trials – number of Monte Carlo samples, as large as possible CPU wise

Returns:

p_value – probability for the value x or more extreme

maxvals – full MC distribution of the extreme values obtained

Return type:

(p_value, maxvals)

mc_extreme_npdf(x, N, mu=0, sigma=1, trials=1000000)[source]

Extreme value distribution for normal pdf via Monte Carlo

Parameters:
  • x – value to evaluate the extreme value pdf upper tail integral

  • mu – mean of normal pdf

  • sigma – std of normal pdf

  • N – sample size

  • trials – number of Monte Carlo samples, as large as possible CPU wise

Returns:

p_value – probability for the value x or more extreme

maxvals – full MC distribution of the extreme values obtained

Return type:

(p_value, maxvals)

p2zscore(p)[source]

p-value to z-score (standard normal sigmas)

Example

p-value is divided by two such that p = 0.05 gives z_{1-alpha/2} = z_{1-0.025} = 1.96
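
The same two-sided convention expressed with scipy (sketch):

    from scipy.stats import norm

    def p_to_z(p):
        # Two-sided convention: p = 0.05 -> z = 1.96
        return norm.ppf(1.0 - p / 2.0)

    print(p_to_z(0.05))   # ~1.96
    print(p_to_z(3e-7))   # ~5.1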

poisson_ratio(k1, k2, CL=array([0.025, 0.975]))[source]

Poisson ratio uncertainty via conditional ratio theta = lambda1 / (lambda1 + lambda2)

See e.g. “Confidence limits on the ratio of two Poisson variables, 1974”

E[k1] = lambda1 (Poisson mean)
E[k2] = lambda2 (Poisson mean)

Parameters:
  • k1 – observations k1

  • k2 – observations k2

  • CL – 2-array of lower and upper quantiles

Returns:

ratio confidence interval endpoints

poisson_tail(k1, k2)[source]

Single Poisson CDF tail integral. Counts k2 serve as the reference.

Parameters:
  • k1 – Poisson counts 1

  • k2 – Poisson counts 2 (reference sample)

Returns:

p value

prod_eprop(A, B, sigmaA, sigmaB, sigmaAB=0)[source]

Error propagation (Taylor expansion) of a product A*B

Parameters:
  • A – Value A

  • B – Value B

  • sigmaA – 1 sigma uncertainty on A

  • sigmaB – 1 sigma uncertainty on B

  • sigmaAB – Covariance of A,B

Returns:

Uncertainty

ratio_eprop(A, B, sigmaA, sigmaB, sigmaAB=0)[source]

Error propagation (Taylor expansion) of a ratio A/B

Parameters:
  • A – Value A

  • B – Value B

  • sigmaA – 1 sigma uncertainty on A

  • sigmaB – 1 sigma uncertainty on B

  • sigmaAB – Covariance of A,B

Returns:

Uncertainty
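
The first-order formulas behind prod_eprop and ratio_eprop, as a standalone sketch:

    import numpy as np

    def prod_err(A, B, sA, sB, sAB=0.0):
        # sigma(A*B)^2 ~ (B*sA)^2 + (A*sB)^2 + 2*A*B*sAB
        return np.sqrt((B * sA)**2 + (A * sB)**2 + 2.0 * A * B * sAB)

    def ratio_err(A, B, sA, sB, sAB=0.0):
        # sigma(A/B)^2 ~ (A/B)^2 * [(sA/A)^2 + (sB/B)^2 - 2*sAB/(A*B)]
        r = A / B
        return abs(r) * np.sqrt((sA / A)**2 + (sB / B)**2 - 2.0 * sAB / (A * B))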

test_efficiency_ratio(EPS=1e-06)[source]
test_extreme_npdf()[source]
test_ratios()[source]

Test function

tpratio_taylor(x, y, x_err, y_err, xy_err=0.0)[source]

Error propagation on a ratio

r = x/(x+y)

If x and y are e.g. counts from stat. independent histograms, then xy_err = 0 (~ tag & probe style ratios).

Parameters:
  • x – values

  • y – values

  • x_err – uncertainties

  • y_err – uncertainties

  • xy_err – covariance (err_x * err_y * rho_xy)

Returns:

Error propagated ratio error

weighted_binomial_err(b1: float, b2: float, e1: float, e2: float)[source]

Binomial counting efficiency error with weights

For the reference, see: https://root.cern/doc/master/TH1_8cxx_source.html#l02962

If e1 ~ sqrt(b1) and e2 ~ sqrt(b2), one should recover results of this function by using the proper covariance with:

prodratio_eprop(A=b1, B=b2, sigma_A=e1, sigma_B=e2, sigmaAB=b1, mode='ratio')

N.B. If b1 == b2, the function will return 0

Parameters:
  • b1 – binomial numerator (e.g. count or sum of weights)

  • b2 – binomial denominator

  • e1 – error on the numerator (e.g. sqrt(sum_i w_i^2))

  • e2 – error on the denominator

welch_ttest(X1, X2, s1, s2, n1, n2)[source]

Welch’s two sample t-test for normally distributed variables. https://en.wikipedia.org/wiki/Welch%27s_t-test

Parameters:
  • X1 – Mean of sample 1

  • X2 – Mean of sample 2

  • s1 – Standard deviation of sample 1

  • s2 – Standard deviation of sample 2

  • n1 – Counts in sample 1

  • n2 – Counts in sample 2

Returns:

t – test statistic

p – p-value from the t-distribution

Return type:

(t, p)
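
A standalone sketch of the statistic and a two-sided p-value via the Welch–Satterthwaite degrees of freedom (the module's exact return may differ):

    import numpy as np
    from scipy.stats import t as t_dist

    def welch(X1, X2, s1, s2, n1, n2):
        v1, v2 = s1**2 / n1, s2**2 / n2
        t_stat = (X1 - X2) / np.sqrt(v1 + v2)
        # Welch-Satterthwaite effective degrees of freedom
        dof = (v1 + v2)**2 / (v1**2 / (n1 - 1) + v2**2 / (n2 - 1))
        p = 2.0 * t_dist.sf(abs(t_stat), dof)
        return t_stat, p

    print(welch(X1=1.2, X2=1.0, s1=0.5, s2=0.4, n1=50, n2=60))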