Analyzing Benchmark Results for Validation¶
So you ran your models against several criticality benchmarks. Nice! How do you analyze the results? NucML contains some utilities to help you get started.
[1]:
import sys
# Add the parent directory to the path so that the nucml utilities can be imported
sys.path.append("..")
[2]:
import os

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

import nucml.ace.data_utilities as ace_utils
import nucml.model.utilities as model_utils
import nucml.ace.plot as ace_plots

pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 50)
pd.options.mode.chained_assignment = None  # default='warn'
sns.set_style("white")
[3]:
figure_dir = "figures/B0/"
[4]:
sns.set(font_scale=2.5)
sns.set_style('white')
Gathering Results from Serpent Runs¶
You can automatically read all benchmark .mat files and format the results by simply specifying the directory where the benchmark model information is stored (see the previous notebook).
[5]:
model_results_b0 = ace_utils.gather_benchmark_results("ml/DT_B0/")
model_results_b1 = ace_utils.gather_benchmark_results("ml/DT_B1/")
model_results_b2 = ace_utils.gather_benchmark_results("ml/DT_B2/")
model_results_b3 = ace_utils.gather_benchmark_results("ml/DT_B3/")
model_results_b4 = ace_utils.gather_benchmark_results("ml/DT_B4/")
[6]:
model_results_b0.head()
[6]:
 | Model | Benchmark | K_eff_ana | Unc_ana | K_eff_imp | Unc_imp | Deviation_Ana | Deviation_Imp
---|---|---|---|---|---|---|---|---
0 | DT100_MSS10_MSL1_none_one_hot_B0_v1 | U233_MET_FAST_001 | 0.989927 | 0.00044 | 0.990024 | 0.00030 | 0.010073 | 0.009976 |
1 | DT100_MSS10_MSL1_none_one_hot_B0_v1 | U233_MET_FAST_002_001 | 0.992332 | 0.00042 | 0.992233 | 0.00029 | 0.007668 | 0.007767 |
2 | DT100_MSS10_MSL1_none_one_hot_B0_v1 | U233_MET_FAST_002_002 | 0.996557 | 0.00044 | 0.996643 | 0.00031 | 0.003443 | 0.003357 |
3 | DT100_MSS10_MSL3_none_one_hot_B0_v1 | U233_MET_FAST_001 | 0.993064 | 0.00044 | 0.992735 | 0.00030 | 0.006936 | 0.007265 |
4 | DT100_MSS10_MSL3_none_one_hot_B0_v1 | U233_MET_FAST_002_001 | 0.994547 | 0.00043 | 0.994306 | 0.00029 | 0.005453 | 0.005694 |
Analyzing Decision Tree Results¶
That was easy. However, we do not have the training and validation metrics that we had before. We can simply read the results files and join them. First, let us load the results and keep only the most basic columns, including hyperparameters and performance metrics.
[11]:
results_b0 = pd.read_csv("../ML_EXFOR_neutrons/2_DT/dt_resultsB0.csv").sort_values(by="max_depth")
results_b1 = pd.read_csv("../ML_EXFOR_neutrons/2_DT/dt_resultsB1.csv").sort_values(by="max_depth")
results_b2 = pd.read_csv("../ML_EXFOR_neutrons/2_DT/dt_resultsB2.csv").sort_values(by="max_depth")
results_b3 = pd.read_csv("../ML_EXFOR_neutrons/2_DT/dt_resultsB3.csv").sort_values(by="max_depth")
results_b4 = pd.read_csv("../ML_EXFOR_neutrons/2_DT/dt_resultsB4.csv").sort_values(by="max_depth")
results_b0 = results_b0[results_b0.normalizer == "none"]
This step is optional, but in this example we will join results from models trained on different datasets. Some models might share the same name even though they were trained on different dataset versions, so here we add a unique identifier prior to merging the results.
[13]:
for df, dataset_tag in zip([results_b0, results_b1, results_b2, results_b3, results_b4], ["b0", "b1", "b2", "b3", "b4"]):
    df['Model'] = df.model_path.apply(lambda x: os.path.basename(os.path.dirname(x)))
    df['dataset'] = dataset_tag
Filtering to keep only the most basic columns:
[14]:
results_b0 = results_b0[["Model", "train_mae", "val_mae", "test_mae", "max_depth", "mss", "msl", "dataset"]]
results_b1 = results_b1[["Model", "train_mae", "val_mae", "test_mae", "max_depth", "mss", "msl", "dataset"]]
results_b2 = results_b2[["Model", "train_mae", "val_mae", "test_mae", "max_depth", "mss", "msl", "dataset"]]
results_b3 = results_b3[["Model", "train_mae", "val_mae", "test_mae", "max_depth", "mss", "msl", "dataset"]]
results_b4 = results_b4[["Model", "train_mae", "val_mae", "test_mae", "max_depth", "mss", "msl", "dataset"]]
Finally, we can merge the results with the gathered benchmark information.
[15]:
final_b0 = model_results_b0.merge(results_b0, on="Model")
final_b1 = model_results_b1.merge(results_b1, on="Model")
final_b2 = model_results_b2.merge(results_b2, on="Model")
final_b3 = model_results_b3.merge(results_b3, on="Model")
final_b4 = model_results_b4.merge(results_b4, on="Model")
[16]:
final_set = pd.concat([final_b0, final_b1, final_b2, final_b3, final_b4])
[17]:
final_set.head()
[17]:
 | Model | Benchmark | K_eff_ana | Unc_ana | K_eff_imp | Unc_imp | Deviation_Ana | Deviation_Imp | train_mae | val_mae | test_mae | max_depth | mss | msl | dataset
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
0 | DT100_MSS10_MSL1_none_one_hot_B0_v1 | U233_MET_FAST_001 | 0.989927 | 0.00044 | 0.990024 | 0.00030 | 0.010073 | 0.009976 | 0.070281 | 0.125724 | 0.124429 | 100 | 10 | 1 | b0 |
1 | DT100_MSS10_MSL1_none_one_hot_B0_v1 | U233_MET_FAST_002_001 | 0.992332 | 0.00042 | 0.992233 | 0.00029 | 0.007668 | 0.007767 | 0.070281 | 0.125724 | 0.124429 | 100 | 10 | 1 | b0 |
2 | DT100_MSS10_MSL1_none_one_hot_B0_v1 | U233_MET_FAST_002_002 | 0.996557 | 0.00044 | 0.996643 | 0.00031 | 0.003443 | 0.003357 | 0.070281 | 0.125724 | 0.124429 | 100 | 10 | 1 | b0 |
3 | DT100_MSS10_MSL3_none_one_hot_B0_v1 | U233_MET_FAST_001 | 0.993064 | 0.00044 | 0.992735 | 0.00030 | 0.006936 | 0.007265 | 0.082464 | 0.122372 | 0.121066 | 100 | 10 | 3 | b0 |
4 | DT100_MSS10_MSL3_none_one_hot_B0_v1 | U233_MET_FAST_002_001 | 0.994547 | 0.00043 | 0.994306 | 0.00029 | 0.005453 | 0.005694 | 0.082464 | 0.122372 | 0.121066 | 100 | 10 | 3 | b0 |
Nice. You can then proceed to analyze your results and explore the hyperparameters as a function of the multiplication factor and the error; a quick visual sketch is shown below. Check the thesis for more information.
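For instance, a scatter plot of the analytical deviation against the maximum tree depth gives a quick feel for how depth affects benchmark performance. This is a minimal seaborn sketch and is not one of the original notebook cells or NucML utilities; swap in whichever hyperparameter column you want to inspect.
[ ]:
# Minimal sketch: analytical deviation versus maximum tree depth, colored by benchmark.
fig, ax = plt.subplots(figsize=(12, 8))
sns.scatterplot(data=final_set, x="max_depth", y="Deviation_Ana", hue="Benchmark", ax=ax)
ax.set_xlabel("Max Depth")
ax.set_ylabel("Analytical Deviation |k-eff - 1|")
plt.show()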
You can also create a DataFrame for each benchmark and analyze which models perform well in terms of the multiplication factor.
[19]:
u233_002_001 = final_set[final_set.Benchmark == "U233_MET_FAST_002_001"].sort_values(by="Deviation_Ana")
u233_002_002 = final_set[final_set.Benchmark == "U233_MET_FAST_002_002"].sort_values(by="Deviation_Ana")
u233_001 = final_set[final_set.Benchmark == "U233_MET_FAST_001"].sort_values(by="Deviation_Ana")
[20]:
# convert the deviation from a fraction to a percentage
u233_001.Deviation_Ana = u233_001.Deviation_Ana * 100
u233_002_001.Deviation_Ana = u233_002_001.Deviation_Ana * 100
u233_002_002.Deviation_Ana = u233_002_002.Deviation_Ana * 100
[21]:
final_b0[final_b0.Benchmark == "U233_MET_FAST_001"].sort_values(by="Deviation_Ana").head()
[21]:
 | Model | Benchmark | K_eff_ana | Unc_ana | K_eff_imp | Unc_imp | Deviation_Ana | Deviation_Imp | train_mae | val_mae | test_mae | max_depth | mss | msl | dataset
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
138 | DT136_MSS5_MSL3_none_one_hot_B0_v1 | U233_MET_FAST_001 | 0.999906 | 0.00043 | 0.999979 | 0.00029 | 0.000094 | 0.000021 | 0.077729 | 0.123489 | 0.122285 | 136 | 5 | 3 | b0 |
90 | DT120_MSS5_MSL3_none_one_hot_B0_v1 | U233_MET_FAST_001 | 0.999637 | 0.00044 | 0.999427 | 0.00029 | 0.000363 | 0.000573 | 0.077731 | 0.123492 | 0.122258 | 120 | 5 | 3 | b0 |
30 | DT100_MSS5_MSL3_none_one_hot_B0_v1 | U233_MET_FAST_001 | 0.999544 | 0.00041 | 0.999753 | 0.00028 | 0.000456 | 0.000247 | 0.077741 | 0.123487 | 0.122248 | 100 | 5 | 3 | b0 |
543 | DT70_MSS5_MSL3_none_one_hot_B0_v1 | U233_MET_FAST_001 | 0.999485 | 0.00042 | 0.999519 | 0.00028 | 0.000515 | 0.000481 | 0.077826 | 0.123491 | 0.122261 | 70 | 5 | 3 | b0 |
171 | DT160_MSS2_MSL1_none_one_hot_B0_v1 | U233_MET_FAST_001 | 1.000560 | 0.00043 | 1.000950 | 0.00030 | 0.000560 | 0.000950 | 0.026063 | 0.136077 | 0.134881 | 160 | 2 | 1 | b0 |
It seems like the DT136_MSS5_MSL3_none_one_hot_B0_v1 model performs very well on this benchmark, with an analytical deviation of only 0.000094 in k-eff (about 0.0094%).
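If you want the single best-performing model for every benchmark in one step, a small pandas snippet can pull it out of final_set. This is a hypothetical helper, not a NucML utility; the index is reset first because the per-dataset frames were concatenated with duplicate index labels.
[ ]:
# Sketch: pick the row with the smallest analytical deviation for each benchmark.
flat = final_set.reset_index(drop=True)
best_per_benchmark = flat.loc[
    flat.groupby("Benchmark")["Deviation_Ana"].idxmin(),
    ["Model", "Benchmark", "K_eff_ana", "Deviation_Ana"],
]
best_per_benchmark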
Getting Best Models Overall¶
As with traditional ML validation techniques, the algorithms should be judged by their average performance across a set of criticality benchmark cases rather than on a single one. The examples here contain information on three benchmarks, which is enough for a proof-of-concept analysis.
One option is to simply group the results by model and average them. Beware: averaging k-eff directly lets positive and negative deviations cancel each other out, which can lead to major misconceptions, so this is shown here only as an example.
[32]:
model_mean = final_set.groupby("Model").mean(numeric_only=True)
model_mean["Error"] = abs(model_mean.K_eff_ana - 1) * 100
model_mean["Unc_Error"] = abs(model_mean.Unc_ana - 1) * 100
model_mean = model_mean.reset_index()
[34]:
model_mean.sort_values("Error").head()
[34]:
 | Model | K_eff_ana | Unc_ana | K_eff_imp | Unc_imp | Deviation_Ana | Deviation_Imp | train_mae | val_mae | test_mae | max_depth | mss | msl | Error | Unc_Error
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---
27 | DT110_MSS5_MSL1_none_one_hot_B0_v1 | 0.999999 | 0.000430 | 1.000269 | 0.000297 | 0.002014 | 0.001611 | 0.052218 | 0.131089 | 0.129714 | 110 | 5 | 1 | 0.000100 | 99.957000 |
374 | DT90_MSS5_MSL1_none_one_hot_B1_v1 | 1.000030 | 0.000440 | 1.000064 | 0.000297 | 0.001430 | 0.001469 | 0.052356 | 0.130018 | 0.129852 | 90 | 5 | 1 | 0.003000 | 99.956000 |
209 | DT310_MSS5_MSL1_none_one_hot_B1_v1 | 1.000034 | 0.000423 | 1.000101 | 0.000290 | 0.001786 | 0.001619 | 0.051868 | 0.130218 | 0.130013 | 310 | 5 | 1 | 0.003400 | 99.957667 |
302 | DT60_MSS5_MSL1_none_one_hot_B0_v1 | 1.000039 | 0.000437 | 1.000160 | 0.000290 | 0.001401 | 0.001600 | 0.053003 | 0.131055 | 0.129657 | 60 | 5 | 1 | 0.003867 | 99.956333 |
105 | DT180_MSS5_MSL1_none_one_hot_B1_v1 | 0.999951 | 0.000423 | 1.000313 | 0.000297 | 0.001769 | 0.001600 | 0.051949 | 0.130154 | 0.129956 | 180 | 5 | 1 | 0.004900 | 99.957667 |
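A less misleading aggregate is to average the absolute analytical deviation per model instead of k-eff itself, so that over- and under-predictions on different benchmarks cannot cancel each other. The cell below is a sketch that is not part of the original notebook.
[ ]:
# Sketch: rank models by their mean absolute analytical deviation across benchmarks,
# expressed as a percentage.
mean_abs_dev = (
    final_set.groupby("Model")["Deviation_Ana"]
    .mean()
    .mul(100)
    .sort_values()
    .rename("Mean_Abs_Deviation_%")
    .reset_index()
)
mean_abs_dev.head()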
PRIVATE SECTION¶
[16]:
print(model_utils.get_best_models_df(u233_001[["Model", 'K_eff_ana', 'Unc_ana', 'Deviation_Ana', 'train_mae', 'val_mae', 'test_mae']]).to_latex(index=False))
\begin{tabular}{lrrrrrrl}
\toprule
Model & K\_eff\_ana & Unc\_ana & Deviation\_Ana & train\_mae & val\_mae & test\_mae & tag \\
\midrule
DT400\_MSS2\_MSL1\_none\_one\_hot\_B0\_v1 & 1.002320 & 0.00043 & 0.2320 & 0.025773 & 0.136140 & 0.135027 & Train \\
DT70\_MSS10\_MSL7\_none\_one\_hot\_B1\_v1 & 0.997118 & 0.00044 & 0.2882 & 0.094443 & 0.118699 & 0.119142 & Val \\
DT90\_MSS10\_MSL7\_none\_one\_hot\_B0\_v1 & 0.922530 & 0.00046 & 7.7470 & 0.094439 & 0.119797 & 0.118706 & Test \\
\bottomrule
\end{tabular}
[17]:
print(u233_001[["Model", 'K_eff_ana', 'Unc_ana', 'Deviation_Ana', 'train_mae', 'val_mae', 'test_mae']].head(1).to_latex(index=False))
\begin{tabular}{lrrrrrr}
\toprule
Model & K\_eff\_ana & Unc\_ana & Deviation\_Ana & train\_mae & val\_mae & test\_mae \\
\midrule
DT80\_MSS15\_MSL3\_none\_one\_hot\_B1\_v1 & 0.999943 & 0.00041 & 0.0057 & 0.088061 & 0.120462 & 0.120684 \\
\bottomrule
\end{tabular}
[20]:
print(model_utils.get_best_models_df(u233_002_001[["Model", 'K_eff_ana', 'Unc_ana', 'Deviation_Ana', 'train_mae', 'val_mae', 'test_mae']]).to_latex(index=False))
\begin{tabular}{lrrrrrrl}
\toprule
Model & K\_eff\_ana & Unc\_ana & Deviation\_Ana & train\_mae & val\_mae & test\_mae & tag \\
\midrule
DT400\_MSS2\_MSL1\_none\_one\_hot\_B0\_v1 & 1.003330 & 0.00044 & 0.3330 & 0.025773 & 0.136140 & 0.135027 & Train \\
DT70\_MSS10\_MSL7\_none\_one\_hot\_B1\_v1 & 0.997767 & 0.00044 & 0.2233 & 0.094443 & 0.118699 & 0.119142 & Val \\
DT90\_MSS10\_MSL7\_none\_one\_hot\_B0\_v1 & 0.929108 & 0.00045 & 7.0892 & 0.094439 & 0.119797 & 0.118706 & Test \\
\bottomrule
\end{tabular}
[21]:
print(u233_002_001[["Model", 'K_eff_ana', 'Unc_ana', 'Deviation_Ana', 'train_mae', 'val_mae', 'test_mae']].head(1).to_latex(index=False))
\begin{tabular}{lrrrrrr}
\toprule
Model & K\_eff\_ana & Unc\_ana & Deviation\_Ana & train\_mae & val\_mae & test\_mae \\
\midrule
DT280\_MSS5\_MSL1\_none\_one\_hot\_B0\_v1 & 1.0 & 0.00041 & 0.0 & 0.05187 & 0.131216 & 0.129827 \\
\bottomrule
\end{tabular}
[23]:
print(model_utils.get_best_models_df(u233_002_002[["Model", 'K_eff_ana', 'Unc_ana', 'Deviation_Ana', 'train_mae', 'val_mae', 'test_mae']]).to_latex(index=False))
\begin{tabular}{lrrrrrrl}
\toprule
Model & K\_eff\_ana & Unc\_ana & Deviation\_Ana & train\_mae & val\_mae & test\_mae & tag \\
\midrule
DT400\_MSS2\_MSL1\_none\_one\_hot\_B0\_v1 & 1.005650 & 0.00042 & 0.5650 & 0.025773 & 0.136140 & 0.135027 & Train \\
DT70\_MSS10\_MSL7\_none\_one\_hot\_B1\_v1 & 1.000680 & 0.00044 & 0.0680 & 0.094443 & 0.118699 & 0.119142 & Val \\
DT90\_MSS10\_MSL7\_none\_one\_hot\_B0\_v1 & 0.936182 & 0.00046 & 6.3818 & 0.094439 & 0.119797 & 0.118706 & Test \\
\bottomrule
\end{tabular}
[41]:
print(u233_002_002[["Model", 'K_eff_ana', 'Unc_ana', 'Deviation_Ana', 'train_mae', 'val_mae', 'test_mae']].head(1).to_latex(index=False))
\begin{tabular}{lrrrrrr}
\toprule
Model & K\_eff\_ana & Unc\_ana & Deviation\_Ana & train\_mae & val\_mae & test\_mae \\
\midrule
DT170\_MSS10\_MSL3\_none\_one\_hot\_B0\_v1 & 1.00022 & 0.00042 & 0.022 & 0.082419 & 0.121982 & 0.121942 \\
\bottomrule
\end{tabular}