Experimental Nuclear Reaction Data (EXFOR)

In this notebook we explore the EXFOR dataset and showcase some of the plotting and data manipulation capabilities of NucML. Let us start by importing some of the needed modules.

[1]:
# # Prototype
# import sys
# sys.path.append("../../")
[2]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
import os

pd.set_option('display.max_columns', 500)
pd.set_option('display.max_rows', 50)
pd.options.mode.chained_assignment = None  # default='warn'

import nucml.datasets as nuc_data
[3]:
sns.set(font_scale=2)
sns.set_style("white")

# Setting up the path where our figures will be stored
figure_dir = "./Figures/"

NOTE: WHEN WE TALK ABOUT CROSS SECTIONS WE ARE REFERRING TO THE FEATURE NAMED “DATA”, WHICH IS OUR LABEL.

Loading the EXFOR Database

Let us study the dataset feature by feature. First we load our dataset. Our focus is on low-energy physics, so we then limit the dataset to energies of at most 20 MeV (the Energy feature is stored in eV).

[4]:
df = nuc_data.load_exfor()
df.MT = df.MT.astype(int)
INFO:root: MODE: neutrons
INFO:root: LOW ENERGY: False
INFO:root: LOG: False
INFO:root: BASIC: -1
INFO:root:Reading data from C:/Users/Pedro/Desktop/ML_Nuclear_Data/EXFOR/CSV_Files\EXFOR_neutrons\EXFOR_neutrons_MF3_AME_no_RawNaN.csv
INFO:root:Data read into dataframe with shape: (4255409, 104)
INFO:root:Finished. Resulting dataset has shape (4255409, 104)
[5]:
# df_original[df_original.Data == 0].shape
[6]:
df = df[df.Energy <= 2E7]

Understanding the Challenge

Before we start analyzing the features of the EXFOR database, we must understand the challenge and the complexity of it.

The goal here is to predict cross-section values for a variety of reaction channels and isotopes. These cross-section values are probabilities, although not on the usual 0-1 scale. Throughout the world, experimental measurements are performed to gather data that can then be used in traditional physics-based modeling to create a set of recommended values. Let us look at the Uranium-235 (n,fission) reaction as an example.

The nucml.exfor.data_utilities module contains the main functions to deal with EXFOR. Let us start by plotting the fission reaction for Uranium 235.

[7]:
import nucml.exfor.data_utilities as exfor_utils
[8]:
kwargs = {"nat_iso":"I", "one_hot":False, "alpha":0.9, "legend":False, "interpolate":False, "log_plot":True, "ref":True}

u_exfor_endf = exfor_utils.plot_exfor_w_references(
    df, 92, 235, 18, get_endf=False, error=True, save=True, path=figure_dir, **kwargs)
INFO:root:Extracting samples from dataframe.
INFO:root:EXFOR extracted DataFrame has shape: (133028, 104)
../_images/notebooks_3_EDA_EXFOR_10_1.png

Here, each color represents a different experimental campaign. Researchers use different measurement methods, each with its own data processing methodology. There is no standardized measurement technique, which introduces uncertainties. In the following sections, we analyze the features that may affect the cross-section behavior. The EXFOR database contains experimental measurements not only for one isotope like uranium but for 96 more isotopes and 36 total reactions. That is a total of 3492 combinations of isotope and reaction type.

Energy Distribution

We know that the reaction probability is a function of energy, isotope, and reaction type. All of these are important and fundamental. Since Energy is perhaps one of the most important features, we shall start by analyzing it.

[10]:
df.Energy.describe() # vanilla statistics
[10]:
count    4.197495e+06
mean     1.074256e+06
std      2.620195e+06
min      0.000000e+00
25%      4.326000e+02
50%      2.484900e+04
75%      7.706700e+05
max      2.000000e+07
Name: Energy, dtype: float64
[11]:
df = df[df.Energy != 0] # Energy cannot be 0 physically so we drop these
[12]:
from scipy.stats import skew
skew(df.Energy.values)
[12]:
3.851943394525566
[13]:
sns.displot(df[["Energy"]], x="Energy", height=6, aspect=3, bins=50)
plt.title("Energy Distribution")
[13]:
Text(0.5, 1.0, 'Energy Distribution')
../_images/notebooks_3_EDA_EXFOR_15_1.png

There are various observations by just looking at some typical statistics:

  1. Energy cannot be 0 so we have some incorrect datapoints. We dropped these.

  2. The standard deviation is high but expected. This dataset includes reaction data for all isotopes and all reaction channels.

  3. The energy is highly skewed (this will affect model performance).

There are many ways to un-skew the feature including log, square root, and power transformers. We know that our data is positive and cannot be zero so any of these works. Let us observe the impact of the log and Box Cox transformer on the skewness. The square root is not very powerful and will not be tested here (it has been tested before and the skewness change is minimal).
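To make the comparison between transformers concrete, here is a minimal sketch on synthetic data. The log-normal sample is only an assumed stand-in for the Energy feature, and `skewness` and `box_cox` are illustrative helpers written for this example (they are not part of NucML). The Box-Cox transform with a fixed parameter is (x^lambda - 1) / lambda, which reduces to the natural log when lambda is 0.

```python
import numpy as np

def skewness(x):
    """Sample skewness from the biased moment estimator m3 / m2**1.5."""
    x = np.asarray(x, dtype=float)
    centered = x - x.mean()
    return np.mean(centered**3) / np.mean(centered**2) ** 1.5

def box_cox(x, lmbda):
    """Box-Cox transform of strictly positive x for a fixed lambda."""
    x = np.asarray(x, dtype=float)
    if lmbda == 0:
        return np.log(x)
    return (x**lmbda - 1.0) / lmbda

# Synthetic, strictly positive, right-skewed sample (assumption: a stand-in
# for the Energy column, not real EXFOR data).
rng = np.random.default_rng(0)
energy = rng.lognormal(mean=10.0, sigma=3.0, size=100_000)

raw_skew = skewness(energy)
sqrt_skew = skewness(np.sqrt(energy))
log_skew = skewness(box_cox(energy, 0))  # lambda = 0 is the log transform

print("raw:", raw_skew, "sqrt:", sqrt_skew, "log:", log_skew)
```

On data like this the square root leaves a strong positive skew while the log removes most of it. In practice the Box-Cox parameter is fitted to the data rather than fixed by hand.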

Log10 Transformation

[19]:
skew(np.log10(df[["Energy"]]).values)
[19]:
array([-0.62603645])
[178]:
sns.displot(np.log10(df[["Energy"]]), x="Energy", height=6, aspect=3, bins=50)
plt.title("Energy Distribution after Log10")
[178]:
Text(0.5, 1.0, 'Energy Distribution after Log10')
../_images/notebooks_3_EDA_EXFOR_18_1.png

Box Cox Transform

[22]:
from scipy import stats

energy_box, _= stats.boxcox(df.Energy.values)
[28]:
skew(energy_box)
[28]:
-0.09672338436387388
[179]:
sns.displot(energy_box, height=6, aspect=3, bins=50)
plt.title("Energy Distribution after Box Cox")
[179]:
Text(0.5, 1.0, 'Energy Distribution after Box Cox')
../_images/notebooks_3_EDA_EXFOR_22_1.png
[111]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20,8))
ax1.hist(df.Energy, bins=50)
ax2.hist(df[["Energy"]], bins = 10 ** np.linspace(np.log10(df.Energy.min()), np.log10(df.Energy.max()), 50))
ax2.set_xscale('log')
ax2.set_xlabel('Energy (eV)')
# ax2.set_ylabel('Frequency')
# ax1.get_yaxis().set_label_coords(-0.08,0.5)
ax1.set_ylabel('Frequency')
ax1.set_xlabel('Energy (eV)')
# ax2.get_yaxis().set_label_coords(-0.08,0.5)
plt.savefig(os.path.join(figure_dir, "energy_distribution_h.png"), bbox_inches='tight', dpi=300)
../_images/notebooks_3_EDA_EXFOR_23_0.png

Either the log or Box Cox transformation followed by a standardizer or normalizer will benefit model performance by unskewing the data and bringing the feature to a scale comparable with the other features. We shall keep this in mind when using any algorithm.
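As a rough sketch of the "transform, then standardize" idea, the snippet below applies log10 followed by a zero-mean, unit-variance scaling and keeps the statistics needed to invert it. The data is synthetic and the function name `inverse_transform` is a hypothetical helper for this example only.

```python
import numpy as np

# Synthetic positive, right-skewed feature (assumption: not real EXFOR data).
rng = np.random.default_rng(1)
energy = rng.lognormal(mean=10.0, sigma=3.0, size=50_000)

# Step 1: unskew with log10. Step 2: standardize to zero mean, unit variance.
log_energy = np.log10(energy)
mu, sigma = log_energy.mean(), log_energy.std()
scaled = (log_energy - mu) / sigma

def inverse_transform(z):
    """Map standardized log10 values back to the original scale."""
    return 10.0 ** (z * sigma + mu)

print("mean:", scaled.mean(), "std:", scaled.std())
```

When training a model, `mu` and `sigma` would normally be computed on the training split only and then reused for validation and test data.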

Cross Section (Data) Distribution - Label

The cross-section is our label making this a regression challenge. Similar to the Energy, we expect this feature to have high standard deviations and be highly skewed. However, we must be careful not to change the nature of the data. These values represent physical reaction probabilities (although not in the usual 0-1 scale).

[175]:
df.Data.describe()
[175]:
count    4.185090e+06
mean     6.480420e+03
std      6.187566e+05
min      8.250000e-18
25%      2.695000e+00
50%      5.820400e+00
75%      1.404500e+01
max      2.311600e+08
Name: Data, dtype: float64
[37]:
df = df[df.Data != 0]
[38]:
skew(df.Data.values)
[38]:
138.58342560709644
[176]:
sns.displot(df[["Data"]], x="Data", height=6, aspect=3, bins=50)
plt.title("Data (Cross Section) Distribution")
[176]:
Text(0.5, 1.0, 'Data (Cross Section) Distribution')
../_images/notebooks_3_EDA_EXFOR_28_1.png

There are various observations by just looking at some typical statistics:

  1. Data cannot be 0 so we have some incorrect datapoints. We dropped these.

  2. The standard deviation for the cross section is also high but expected.

  3. The cross section feature is highly skewed (this will affect model performance).

We refrain from applying any transformation other than the log transformation so as not to change the nature of the data. To optimize model training (although this is model dependent), the features must have similar ranges. This helps algorithms like KNN not give more importance to high variance features and helps deep learning models train faster. However, we shall not normalize our labels (cross-section data) since these are absolute probabilities.

There are many ways to unskew the feature including log, square root, and power transformers. We know that our data is positive and cannot be zero so any of these work. Let us observe the impact of each transformer on the skewness.

Log10 Transformation

[180]:
skew(np.log10(df[["Data"]]).values)
[180]:
array([-1.01712755])
[181]:
sns.displot(np.log10(df[["Data"]]), x="Data", height=6, aspect=3, bins=50)
plt.title("Data (Cross Section) Distribution after Log10")
[181]:
Text(0.5, 1.0, 'Data (Cross Section) Distribution after Log10')
../_images/notebooks_3_EDA_EXFOR_33_1.png
[112]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20,8))
ax1.hist(df.Data, bins=50)
# Log-spaced bins must span the range of the plotted feature (Data, not Energy)
ax2.hist(df[["Data"]], bins = 10 ** np.linspace(np.log10(df.Data.min()), np.log10(df.Data.max()), 50))
ax2.set_xscale('log')
ax2.set_xlabel('Cross Section (b)')
# ax2.set_ylabel('Frequency')
# ax1.get_yaxis().set_label_coords(-0.08,0.5)
ax1.set_ylabel('Frequency')
ax1.set_xlabel('Cross Section (b)')
# ax2.get_yaxis().set_label_coords(-0.08,0.5)
plt.savefig(os.path.join(figure_dir, "xs_distribution_h.png"), bbox_inches='tight', dpi=300)
../_images/notebooks_3_EDA_EXFOR_34_0.png

Outliers in Energy and Data (Cross Section)

It is difficult to detect outliers in this particular dataset. The following image shows the U-235(N,G) reaction channel. This is an example of a single reaction for a single isotope. Typical statistical techniques for outlier detection will categorize resonance (peak) points as outliers, but they are not. Resonances are the most important quantum physics phenomena that we need to model and therefore cannot be disregarded.

[60]:
u235_ng = exfor_utils.load_samples(df, 92, 235, 102)
INFO:root:Extracting samples from dataframe.
INFO:root:EXFOR extracted DataFrame has shape: (10872, 104)
[ ]:
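from matplotlib import gridspec

# Two panels with a 3:1 width ratio, using the U-235(n,g) samples loaded above
x, y = u235_ng.Energy, u235_ng.Data
fig = plt.figure(figsize=(8, 6))
gs = gridspec.GridSpec(1, 2, width_ratios=[3, 1])
ax0 = plt.subplot(gs[0])
ax0.plot(x, y)
ax1 = plt.subplot(gs[1])
ax1.plot(y, x)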
[110]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20,8), gridspec_kw={'width_ratios': [1, 10]})
sns.boxplot(y=u235_ng[["Data"]].values, ax=ax1)
ax1.set_yscale('log')
ax2.loglog(u235_ng.Energy, u235_ng.Data)

ax1.set_ylabel("Cross Section (b)")
ax2.set_xlabel("Energy (eV)")

plt.savefig(os.path.join(figure_dir, "u235_ng_xs_dist.png"), bbox_inches='tight', dpi=300)
../_images/notebooks_3_EDA_EXFOR_38_0.png

As an example let us use the quantile information to remove some “outliers” for this particular reaction only.

[107]:
without_outliers = u235_ng[np.log10(u235_ng.Data) < np.log10(u235_ng.Data).quantile(.75)]
# without_outliers = without_outliers[np.log10(without_outliers.Data) > np.log10(without_outliers.Data).quantile(.05)]
[109]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20,8), gridspec_kw={'width_ratios': [1, 10]})
sns.boxplot(y=without_outliers[["Data"]].values, ax=ax1)
ax1.set_yscale('log')
ax2.loglog(without_outliers.Energy, without_outliers.Data)

ax1.set_ylabel("Cross Section (b)")
ax2.set_xlabel("Energy (eV)")

plt.savefig(os.path.join(figure_dir, "u235_ng_xs_dist_without_outliers.png"), bbox_inches='tight', dpi=300)
../_images/notebooks_3_EDA_EXFOR_41_0.png

Filtering of this kind is not acceptable: we lose a lot of useful information, including the resonance peaks. In another notebook, we shall use more advanced methods including robust covariance, one-class SVM, and isolation forests to try and clean the data better.

On a different note, another benefit of applying a log transformation is easier visualization and a stronger linear correlation between Energy and Data.
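The effect can be reproduced on a toy example. The sketch below builds a synthetic cross section (a smooth 1/v-like background plus a single narrow resonance, with made-up parameters) and applies the same 75th-percentile rule on the log10 values. None of the numbers are measured or evaluated data.

```python
import numpy as np

# Synthetic cross section: smooth 1/v-like background plus one narrow
# resonance. All parameters are illustrative assumptions.
energy = np.logspace(-1, 2, 2000)        # eV
background = 10.0 / np.sqrt(energy)      # b
e0, gamma, peak = 10.0, 0.1, 500.0       # resonance energy, width, height
resonance = peak * (gamma / 2) ** 2 / ((energy - e0) ** 2 + (gamma / 2) ** 2)
xs = background + resonance

# Keep only points below the 75th percentile of log10(cross section).
log_xs = np.log10(xs)
keep = log_xs < np.quantile(log_xs, 0.75)

peak_index = np.argmax(xs)
print("fraction of points kept:", keep.mean())
print("resonance peak kept:", bool(keep[peak_index]))
```

The filter discards the resonance peak together with the legitimately large low-energy values, which is exactly the physics we want a model to learn.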

[236]:
u235_ng[["Energy", "Data"]].corr()
[236]:
Energy Data
Energy 1.000000 -0.019248
Data -0.019248 1.000000
[237]:
np.log10(u235_ng[["Energy", "Data"]]).corr()
[237]:
Energy Data
Energy 1.000000 -0.430042
Data -0.430042 1.000000
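The same behavior shows up on synthetic data. In this sketch the cross section follows an assumed power law in energy with multiplicative scatter; the Pearson correlation is weak on the raw values and strong after taking log10 of both variables, because the relationship is linear only in log-log space.

```python
import numpy as np

rng = np.random.default_rng(2)

# Energies spread uniformly in log space over eight decades (synthetic).
energy = 10 ** rng.uniform(-2, 6, size=20_000)

# Assumed 1/v-like power law with log-normal multiplicative scatter.
xs = 5.0 * energy**-0.5 * rng.lognormal(0.0, 0.3, size=energy.size)

raw_r = np.corrcoef(energy, xs)[0, 1]
log_r = np.corrcoef(np.log10(energy), np.log10(xs))[0, 1]

print("Pearson r (raw):  ", raw_r)
print("Pearson r (log10):", log_r)
```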

Reaction Channel (MT Values) - Categorical

Before modeling the EXFOR database, we need to understand the possible limitations of any model using this data. There are a variety of reaction channels but not all of them are well researched. Knowing the reaction class balances will allow us to have a certain degree of confidence when deploying a model.

[192]:
# This file contains the meaning of each MT code, useful for plotting
mt_codes = pd.read_csv("../CSV_Files/mt_codes.csv")
mt_codes["MT"] = mt_codes["MT"].astype(str)  # cast only the MT column so it matches the string keys used in the merge
mt_codes = mt_codes[["MT", "MT_Reaction_Notation"]]

MT_value_counts = df.MT.astype(str).value_counts()
MT_value_counts = pd.DataFrame(MT_value_counts).reset_index()
MT_value_counts.columns = ["MT", "Count"]

MT_value_counts = mt_codes.merge(MT_value_counts).sort_values(by="Count", ascending=False)
MT_value_counts["MT"] = MT_value_counts["MT"].astype(int)

What are the most researched reaction channels?

Leaving aside the top-ranked row, which the slice `[1:10]` below skips, (n,fission) is the most researched reaction. This makes sense since many resources have been spent trying to understand fission, especially since it is by this mechanism that nuclear reactors and weapons work. The next one is (N,G), a reaction important for studies in safety and neutron moderation.

[193]:
MT_value_counts[1:10]
[193]:
MT MT_Reaction_Notation Count
6 18 (n,fission) 507541
17 102 (n,g) 215904
34 9000 Production Data 84333
15 51 (n,n1) 49835
3 4 (n,inelastic) 37641
1 2 (n,elastic) 37296
18 103 (n,p) 17270
22 107 (n,a) 15295
4 16 (n,2n) 10930

What is the label (cross-section) distribution as a function of the reaction channel?

It is of little use to look at the distribution of cross-section data with regard to the reaction channel alone. The usual statistics will not work here. The cross-section ranges are a function of the element/isotope and the reaction channel, so statistics based on the entire dataset will lead to erroneous conclusions. As an example, let us observe the Data distribution of the most researched channels.

[102]:
for i in MT_value_counts[1:5].MT.values:
    plt.figure(figsize=(17,1))
    sns.boxplot(x=np.log10(df[df.MT == int(i)].Data).values)
    plt.xlabel("MT={} Data (Log10)".format(i))
../_images/notebooks_3_EDA_EXFOR_50_0.png
../_images/notebooks_3_EDA_EXFOR_50_1.png
../_images/notebooks_3_EDA_EXFOR_50_2.png
../_images/notebooks_3_EDA_EXFOR_50_3.png
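To illustrate why dataset-wide statistics mislead, here is a small sketch with a toy DataFrame. The column names `Isotope`, `MT`, and `Data` mirror the EXFOR dataframe, but the isotope labels and values are invented for the example.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)

# Two hypothetical (isotope, channel) groups with very different magnitudes.
toy = pd.DataFrame({
    "Isotope": ["A"] * 500 + ["B"] * 500,
    "MT": [18] * 500 + [102] * 500,
    "Data": np.concatenate([
        rng.lognormal(np.log(500.0), 0.2, 500),  # around 500 b
        rng.lognormal(np.log(0.01), 0.2, 500),   # around 0.01 b
    ]),
})
toy["log_data"] = np.log10(toy["Data"])

# Global statistics mix both groups; grouped statistics keep them apart.
global_std = toy["log_data"].std()
per_group = toy.groupby(["Isotope", "MT"])["log_data"].agg(["median", "std"])

print("global std of log10(Data):", global_std)
print(per_group)
```

The global spread is dominated by the gap between the two groups, while each group on its own is narrow. Any outlier rule or summary built on the global numbers would describe neither group.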

What are the least researched reaction channels?

Reactions with a high number of outgoing particles have much lower probabilities than those with only one or two resulting particles. It will be harder to model these types of reactions, but representing this feature as numerical might help extrapolate knowledge from the other highly researched reactions. For now, let us confirm our expectations.

[104]:
MT_value_counts[-10:]
[104]:
MT MT_Reaction_Notation Count
25 112 (n,p+a) 20
33 2103 (n,p)DI 17
30 203 (n,Xp) 9
14 41 (n,2n+p) 7
23 108 (n,2a) 6
28 158 (n,n+a+d) 5
26 113 (n,t+2a) 2
32 1108 (n,2a)CALC 2
29 159 (n,2n+p+a) 1
13 37 (n,4n) 1

These reactions are expected to have similar ranges in terms of Cross Section so we can plot them.

[194]:
low_research = df[df['MT'].astype(str).isin(MT_value_counts[-10:].MT.values.astype(str))]
[198]:
# catplot is figure-level and creates its own figure, so no plt.figure() call is needed
sns.catplot(x="MT", y="Data", data=low_research, height=6, aspect=2, kind="box")
plt.ylabel("Cross Section (b)")
plt.xlabel("Reaction Channel (MT)")
[198]:
Text(0.5, 28.400000000000034, 'Reaction Channel (MT)')
../_images/notebooks_3_EDA_EXFOR_55_2.png

Proton and Neutron Distribution

Previously we mentioned that the most important features are the Energy, Reaction Channel, and the Isotope. What characterizes the isotope? The isotope is made of neutrons and protons. Understanding the distributions will help us understand in what regions our models are expected to perform well and for which isotopes the performance might be poor.

[21]:
plt.figure(figsize=(14,8))
# histplot replaces the deprecated distplot; density scaling plus a KDE gives a comparable view
sns.histplot(df.Z.values, label="Protons", kde=True, stat="density", color="C0")
sns.histplot(df.N.values, label="Neutrons", kde=True, stat="density", color="C1")
# plt.xlabel('Number of Neutrons')
plt.ylabel('Distribution')
plt.xlabel('Proton (Z) and Neutron (N) Number')
plt.legend()
plt.title("Proton and Neutron Distribution")
plt.savefig(os.path.join(figure_dir, "proton_neutron_distribution.png"), bbox_inches='tight', dpi=300)
../_images/notebooks_3_EDA_EXFOR_57_1.png

Apparently, we have a lot of data for:

  • Isotopes with protons and neutrons near 26 (~Iron)

  • Neutrons in the 145 range (~Uranium, Plutonium)

  • Protons in the 92 range (~Uranium)

Experimental Campaigns per Year

One of the biggest challenges with this data set is the measurement uncertainties. These are usually reported when creating an entry in EXFOR. However, there are plenty of missing values for uncertainties in terms of both the Energy and the Data. These are suspected to be mostly in old measurements. Let us explore the dataset to find out.

[23]:
df_original = pd.read_csv("../CSV_Files/EXFOR_neutrons/EXFOR_neutrons_ORIGINAL.csv")
C:\Users\Pedro\Anaconda3\envs\tf_nightly\lib\site-packages\IPython\core\interactiveshell.py:3146: DtypeWarning: Columns (17,30,31) have mixed types.Specify dtype option on import or set low_memory=False.
  has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
[209]:
missing_uncertanties_data = len(df_original[df_original["dData"].isnull()]) / len(df_original)
print("{0:.2f}% of the uncertainties in Cross Section are missing".format(missing_uncertanties_data*100))
18.00% of the uncertainties in Cross Section are missing
[211]:
missing_uncertanties_energy = len(df_original[df_original["dEnergy"].isnull()]) / len(df_original)
print("{0:.2f}% of the uncertainties in Energy are missing".format(missing_uncertanties_energy*100))
77.02% of the uncertainties in Energy are missing

A large share of the data points were reported without uncertainties: about 18% lack a Cross Section uncertainty and about 77% lack an Energy uncertainty. The high number of missing values will present a unique challenge if we are to model uncertainties as well in a multi-output regression.
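A compact way to compute these fractions, and to break them down by decade, is sketched below on a toy table. The column names `Year`, `dData`, and `dEnergy` follow the original EXFOR CSV, but the rows are invented for illustration.

```python
import numpy as np
import pandas as pd

# Toy records (assumption: invented values, not EXFOR entries).
toy = pd.DataFrame({
    "Year": [1955, 1958, 1967, 1972, 1985, 1999, 2004, 2012],
    "dData": [np.nan, np.nan, np.nan, 0.1, np.nan, 0.2, 0.05, 0.01],
    "dEnergy": [np.nan, np.nan, np.nan, np.nan, np.nan, np.nan, 1.0, 2.0],
})

# Fraction of missing values per uncertainty column.
missing_fraction = toy[["dData", "dEnergy"]].isnull().mean()

# Fraction of missing cross-section uncertainties per decade.
decade = (toy["Year"] // 10) * 10
missing_by_decade = toy["dData"].isnull().groupby(decade).mean()

print(missing_fraction)
print(missing_by_decade)
```

Taking the mean of a boolean mask gives the missing fraction directly, which avoids the explicit `len(...) / len(...)` ratio.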

[24]:
years_with_missing_uncertainties = df_original[df_original['dData'].isnull()].Year
institutes_with_missing_uncertainties = df_original[df_original['dData'].isnull()].Institute
[25]:
years_with_missing_uncertainties_e = df_original[df_original['dEnergy'].isnull()].Year
[28]:
sns.displot(years_with_missing_uncertainties_e, height=6, aspect=3, bins=50)
plt.title("(a)")
[28]:
Text(0.5, 1.0, '(a)')
../_images/notebooks_3_EDA_EXFOR_66_1.png
[212]:
sns.displot(years_with_missing_uncertainties, height=6, aspect=3, bins=50)
plt.title("Missing Values per Year")
plt.savefig(os.path.join(figure_dir, "missing.png"), bbox_inches='tight', dpi=300)
[212]:
Text(0.5, 1.0, 'Missing Values per Year')
../_images/notebooks_3_EDA_EXFOR_67_1.png
[42]:
fig, (ax1, ax2) = plt.subplots(2, figsize=(20,10))
ax1.hist(years_with_missing_uncertainties, bins=50, label="Missing Cross Section Uncertainties")
ax2.hist(years_with_missing_uncertainties_e, bins=50, label="Missing Energy Uncertainties")
ax1.set_ylabel('Count')
ax2.set_ylabel('Count')
ax2.set_xlabel('Year')
ax1.legend()
ax2.legend()
plt.savefig(os.path.join(figure_dir, "xs_e_unc_hist.png"), bbox_inches='tight', dpi=300)
../_images/notebooks_3_EDA_EXFOR_68_0.png

Indeed, most of the measurements performed from the 50s to the 80s simply did not report uncertainties. What institutes are responsible for the datapoints containing missing uncertainties?

[167]:
institutes_with_missing_uncertainties = institutes_with_missing_uncertainties.value_counts()[:10]
institutes_with_missing_uncertainties = pd.DataFrame(institutes_with_missing_uncertainties).reset_index()
institutes_with_missing_uncertainties.columns = ["Institute", "Count"]
[168]:
institutes_with_missing_uncertainties
[168]:
Institute Count
0 1USAORL 152356
1 1USACOL 149661
2 1USALAS 121619
3 2GERKFK 117092
4 1USARPI 109206
5 2ZZZGEL 73840
6 1USAANL 41487
7 1USADKE 33990
8 1USAMTR 25559
9 2AUSIRK 24367

Interestingly, ORNL-based experimental campaigns account for the most datapoints with missing uncertainties, followed by Columbia University. Should we use the uncertainties feature or should we drop these and focus on the raw data?

Isotope Distributions

What are the most researched isotopes? Fe-56, Pu-239, U-235, U-233, U-238, Np-237, and Zr-91 have the most measurements.

[9]:
isotope_distribution = pd.DataFrame(df.Isotope.value_counts()).reset_index()
isotope_distribution.columns = ["Isotope", "Count"]
# print(isotope_distribution.to_latex(index=False))
[220]:
isotope_distribution[:10]
[220]:
Isotope Count
0 56Fe 234812
1 239Pu 213238
2 235U 186202
3 233U 132829
4 238U 126862
5 237Np 121401
6 91Zr 102523
7 28Si 96671
8 241Pu 94860
9 207Pb 93372
[12]:
isotope_distribution[-15:].sort_values(by="Isotope")
[12]:
Isotope Count
524 132Cs 1
521 134Ce 1
514 135Ce 1
518 143Ce 1
515 146Pm 1
527 151Pm 1
525 161Gd 1
520 175Yb 1
517 203Hg 1
526 210Pb 1
522 224Ra 1
519 254Cf 1
516 41Ar 1
528 76Br 1
523 88Rb 1

Why do we have so few measurements for many of these isotopes? Many of these are radioactive and difficult to work with experimentally. A machine learning model might be able to extrapolate knowledge.

Nucleus Radius and Neutron-to-Nucleus Radius Ratio

The cross section is, at its core, a geometrical concept. Quantum effects mean that the reaction probabilities are not just a function of the size of the nucleus. However, there are some reactions for which the geometrical cross section is a good approximation. Because of this we created the nucleus radius and the neutron-to-nucleus radius ratio features.
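For intuition, the geometrical picture can be sketched with the common textbook approximation R = r0 * A^(1/3). The value r0 = 1.25 fm is an assumption made for this example, and whether NucML builds its radius features from this exact formula is not confirmed here. The function names are hypothetical.

```python
import math

R0_FM = 1.25  # assumed radius constant in femtometers (textbook-style value)

def nucleus_radius_fm(mass_number, r0=R0_FM):
    """Approximate nuclear radius in fm from R = r0 * A**(1/3)."""
    return r0 * mass_number ** (1.0 / 3.0)

def geometric_cross_section_barns(mass_number, r0=R0_FM):
    """Area of a disk with the nuclear radius, in barns (1 b = 100 fm^2)."""
    radius = nucleus_radius_fm(mass_number, r0)
    return math.pi * radius**2 / 100.0

r_u235 = nucleus_radius_fm(235)
xs_u235 = geometric_cross_section_barns(235)
print("U-235 radius (fm):", r_u235)
print("U-235 geometric cross section (b):", xs_u235)
```

The result is on the order of a couple of barns, far below the resonance peaks seen earlier, which is why geometry alone is only a rough guide.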

[20]:
fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(20,8))
ax1.hist(df.Nucleus_Radius.values, label="Nucleus Radius", bins=20)
ax2.hist(df.Neutron_Nucleus_Radius_Ratio.values, label="N-to-N Ratio", bins=20)
ax1.set_ylabel("Count")
ax1.set_xlabel("femtometer")
ax2.set_xlabel("ratio (dimensionless)")  # a ratio of two radii carries no unit
ax1.legend()
ax2.legend()
plt.savefig(os.path.join(figure_dir, "nucleus_radius_ratio.png"), bbox_inches='tight', dpi=300)
../_images/notebooks_3_EDA_EXFOR_77_0.png

Element Flag (Natural and Isotopic Measurements)

How many datapoints were measured on pure isotopic samples and how many on natural targets?

[169]:
df.Element_Flag.value_counts()
[169]:
I    3184818
N    1000272
Name: Element_Flag, dtype: int64

EXFOR Status

The EXFOR status is a tag placed by the database managers to indicate whether an entry is approved, outdated, renormalized, correlated, and so on. This is a human-assigned tag and will therefore not be included for training to avoid bias.

[113]:
pd.DataFrame(df.EXFOR_Status.value_counts())
[113]:
EXFOR_Status
Other 2165207
Approved_by_Author 1642322
Correlated 231122
Dependent 123446
Preliminary 20306
Outdated 2512
Re_normalized 175

Original

[4]:
df_original.columns
[4]:
Index(['Projectile', 'Target_Metastable_State', 'MF', 'MT',
       'Product_Metastable_State', 'EXFOR_Status', 'Center_of_Mass_Flag',
       'Energy', 'dEnergy', 'Data', 'dData', 'Cos/LO', 'dCos/LO', 'ELV/HL',
       'dELV/HL', 'I78', 'Short_Reference', 'EXFOR_Accession_Number',
       'EXFOR_SubAccession_Number', 'EXFOR_Pointer', 'Z', 'A', 'N',
       'Reaction_Notation', 'Title', 'Year', 'Author', 'Institute', 'Date',
       'Reference', 'Dataset_Number', 'EXFOR_Entry', 'Reference_Code',
       'Projectile_Z', 'Projectile_A', 'Projectile_N', 'Isotope', 'Element'],
      dtype='object')