Training¶
Train Voxel-Dynamics graph constructor¶
The voxelized space(-time) Voronoi division & connectivity C-matrix
estimator takes into account the dynamics of particle tracks (equations of motion) as described by the simulations (training data),
i.e. where the tracks appear in space-time, and adapts to the detector geometry.
For example:
python src/trackml_pretrain_voxdyn.py --node2node hyper --ncell 65536 131072 --device cpu
Once trained, the estimator’s true (false) edge efficiency is pile-up invariant (!) by construction, but the purity is not.
The Voronoi division quality saturates quickly with the training sample size (number of tracks ~ hits), whereas the connectivity C-matrix estimate keeps improving in accuracy with larger samples.
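As a rough illustration of the idea (not the actual Voxel-Dynamics implementation; all names below are made up for this sketch), the Voronoi division assigns each hit to its nearest voxel centroid, and the C-matrix marks two cells as connected whenever hits of the same ground-truth track occupy both:
# Illustrative NumPy sketch only, not the HyperTrack code: nearest-centroid
# (Voronoi) cell assignment plus a track-driven connectivity C-matrix.
import numpy as np

def estimate_C(hits, track_ids, centroids):
    # hits:      (N, D) hit coordinates in (space-)time
    # track_ids: (N,)   ground-truth track label per hit
    # centroids: (ncell, D) voxel centroids learned from the training data
    d2 = ((hits[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
    cell = d2.argmin(axis=1)                     # Voronoi cell index per hit

    ncell = len(centroids)
    C = np.zeros((ncell, ncell), dtype=bool)     # in practice a sparse matrix
    for t in np.unique(track_ids):
        cells = np.unique(cell[track_ids == t])  # cells touched by this track
        C[np.ix_(cells, cells)] = True           # connect them (hyper-like target;
                                                 # the --node2node choice changes
                                                 # which hit pairs count)
    return C
This also illustrates why larger training samples keep improving the C-matrix (more cell pairs get populated), while the Voronoi division itself saturates early.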
Train neural GNN + Transformer model¶
The training first proceeds only with the GNN edge predictor part of the network; once
the AUC exceeds --auc_threshold
, the neural clustering transformer training is activated end-to-end.
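Schematically, the gating works as in the following sketch (placeholder function names, not the actual HyperTrack training loop):
# Schematic AUC gate only; names are placeholders, not the HyperTrack API.
def clustering_gate(edge_auc, auc_threshold, clustering_active):
    # Once the edge predictor AUC exceeds the threshold, keep the
    # clustering transformer active for the rest of the training.
    return clustering_active or (edge_auc > auc_threshold)

def total_loss(edge_loss, cluster_loss, clustering_active):
    # Only the GNN edge predictor is optimized until the gate opens;
    # afterwards the clustering transformer loss is added (end-to-end).
    return edge_loss + (cluster_loss if clustering_active else 0.0)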
The training for the track reconstruction problem is executed with:
python src/trackml_train.py \
--param tune-5 \
--cluster "transformer" \
--soft_reset 0 \
--learning_rate 5e-4 \
--scheduler_type "warm-cos" \
--optimizer "AdamW" \
--epoch -1 \
--save_tag f-0p1-hyper-5 \
--rfactor_start 0.1 --rfactor_end 0.1 --noise_ratio 0.05 \
--node2node hyper --ncell 262144 \
--validate 0 \
--batch_size 1 \
--fp_dtype "float32"
See more examples under /tests/train*
for different pile-up scenarios.
Training will take days (weeks) on a single Nvidia V100 GPU, and VRAM requirements grow at
least linearly as a function of the track density if --ncell
is scaled up accordingly.
A GPU with 32+ GB of VRAM is strongly recommended (effectively a must) in order to increase the pile-up scenario
with --rfactor_start
and --rfactor_end
, use deeper models, and possibly a
higher --batch_size
in the neural training.
Training parameters¶
The training steering program parameters, such as the starting learning rate, can be seen with the --help
command (e.g. python src/trackml_train.py --help), and the rest are described under /hypertrack/models/global_<TAG>.py
.
Model hyperparameters¶
The model hyperparameters are encoded in the file /hypertrack/models/global_<TAG>.py
.
One can create several different tunes or custom models by choosing a --save_tag <TAG>
name and
by copying the existing model files into a new pair:
/hypertrack/models/global_<TAG>.py
/hypertrack/models/models_<TAG>.py
before starting the training.
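For example, assuming an existing tune named tune-5 (as used in the training command above; adapt to whichever tune you start from) and repo-relative paths, the copy could look like this:
# Illustrative sketch: clone an existing tune into a new <TAG> before training.
# The source tag "tune-5" and the new tag are assumptions; adjust both.
import shutil

src_tag, new_tag = "tune-5", "my-tune"
for stem in ("global", "models"):
    shutil.copy(f"hypertrack/models/{stem}_{src_tag}.py",
                f"hypertrack/models/{stem}_{new_tag}.py")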
Ground truth graph topology¶
The ground truth adjacency of graph nodes (hits) is controlled with the command line parameter --node2node
.
This needs to be consistent between trackml_pretrain_voxdyn
, trackml_train
and trackml_inference
,
but one can do mixed diagnostics.
`hyper` for a fully connected hyperedge, like a 'lasso', over all nodes
`eom` for a minimal spanning tree following the equations-of-motion helix trajectory
`cricket` for the eom trajectory with double hops included
This directly impacts the Voxel-Dynamics estimator C-matrix
construction and indirectly
the neural estimator training, especially the edge predictor part, because it defines its
label target. The eom
mode favors training an edge predictor which
is space-time local, whereas the hyper
mode favors a fully space-time non-local one.
In general, only the hyper
mode is fully compatible with the overall neural
clustering goal, but one can use the eom
mode, e.g., to train only a highly performing
GNN edge predictor. In that case, also set --cluster none
, which excludes the clustering transformer from the training.
The ground truth definition also impacts the edge ROC/AUC (efficiency) between the A_hat
(estimated) and A
(ground truth) adjacency.
However, the definitions of the final track clustering metrics, such as efficiency
and purity
, are topology independent,
although the chosen ground truth naturally impacts the underlying model construction.
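As a conceptual illustration of the three topologies (not the code used by the steering programs, and assuming the ground truth is defined per track), the adjacency over the hits of one track, ordered along its trajectory, could be built as follows:
# Conceptual sketch of the --node2node ground-truth topologies for one track,
# with hits assumed ordered along the trajectory; not the HyperTrack code.
import numpy as np

def track_adjacency(n_hits, mode="hyper"):
    A = np.zeros((n_hits, n_hits), dtype=bool)
    if mode == "hyper":                      # fully connected 'lasso' over the hits
        A[:, :] = True
        np.fill_diagonal(A, False)
    elif mode in ("eom", "cricket"):         # consecutive hops along the helix
        for i in range(n_hits - 1):
            A[i, i + 1] = A[i + 1, i] = True
        if mode == "cricket":                # additionally include double hops
            for i in range(n_hits - 2):
                A[i, i + 2] = A[i + 2, i] = True
    return A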
Progressive transfer learning¶
To reduce overall GPU time and improve convergence, it is beneficial to train the model using a low pile-up
scenario and, once converged, use that model as a starting point for a higher pile-up scenario.
This can be done progressively in multiple steps, e.g. by setting --rfactor_start
and
--rfactor_end
first both to 0.01
, then both to 0.1
, and finally both to 0.2
(assuming you have enough VRAM).
Similarly, one should increase the voxel space dimension --ncell
as a function of the mean pile-up.
Starting the training from existing model weights can be done simply by
using the --transfer_tag
and --save_tag
command line parameters when starting the new training.
Remember to remove --transfer_tag
afterwards.
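A possible progressive schedule is sketched below with Python subprocess calls; the flags follow the examples on this page, but the stage values, cell counts, tag names, and any additional required arguments are assumptions to be adapted to your setup:
# Sketch of a progressive pile-up schedule; stage values and tags are examples only,
# and further arguments (e.g. --param, --cluster) may be required in practice.
import subprocess

stages = [  # (rfactor, ncell, save_tag)
    (0.01, 65536,  "stage-0p01"),
    (0.10, 131072, "stage-0p1"),
    (0.20, 262144, "stage-0p2"),
]

prev_tag = None
for rfactor, ncell, tag in stages:
    cmd = ["python", "src/trackml_train.py",
           "--rfactor_start", str(rfactor), "--rfactor_end", str(rfactor),
           "--ncell", str(ncell),
           "--save_tag", tag]
    if prev_tag is not None:
        cmd += ["--transfer_tag", prev_tag]   # warm-start from the previous stage
    subprocess.run(cmd, check=True)
    prev_tag = tag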
Learning rate¶
Start with
--learning_rate 5e-4
The learning rate can be crucial; e.g. with 1e-5 the GNN does not really enter the learning phase.
Then in the final stage, perhaps decrease to
--learning_rate 1e-4 (which can make, e.g., the Transformer training more stable)
In general, larger models with more GNN and transformer layers and higher latent dimensions may require smaller learning rates.
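For intuition, a warm-up plus cosine decay profile of the kind suggested by --scheduler_type "warm-cos" can be sketched as follows (illustrative only; the actual scheduler and its parameter values are defined in the repository):
# Illustrative "warm-cos" style learning-rate profile; parameter values are examples.
import math

def warm_cos_lr(step, total_steps, base_lr=5e-4, warmup_steps=1000, min_lr=1e-5):
    if step < warmup_steps:                   # linear warm-up to the base rate
        return base_lr * (step + 1) / warmup_steps
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return min_lr + 0.5 * (base_lr - min_lr) * (1.0 + math.cos(math.pi * progress))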