Installation Guide

NucML uses NumPy arrays and pandas DataFrames as its main data objects. Each step of the evaluation pipeline relies on different utilities, including parsing utilities, data processing tools, and machine learning packages such as TensorFlow, XGBoost, and Scikit-learn.

1. Install NucML and Dependencies

NucML is a Python toolbox and can be installed using pip along with all of its dependencies. It is recommended that you create a conda environment and then install nucml. You can download conda here. Once downloaded, run the following commands in your Anaconda shell:

Warning

Since TensorFlow only supports Python versions up to 3.8, NucML must be installed in an environment with Python version <= 3.8.

# create and activate conda environment
conda create -n ml_nuclear_env python=3.8
conda activate ml_nuclear_env

# Make sure the python version is not higher than 3.8
python -V

# install nucml and tensorflow docs
pip install nucml
pip install git+https://github.com/tensorflow/docs
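To confirm that the interpreter in the new environment satisfies the version constraint from the warning above, a quick standard-library check can be used (a sketch; the helper name is ours, not part of NucML):

```python
import sys

def python_version_ok(max_minor=8):
    """Return True if the running interpreter is Python 3.x with x <= max_minor."""
    major, minor = sys.version_info[:2]
    return major == 3 and minor <= max_minor

# In a correctly created ml_nuclear_env this should print True.
print(python_version_ok())
```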

Installing Dependencies

Before moving forward, you need to install both XGBoost and TensorFlow. These are deliberately not declared as default dependencies of NucML, in case you already have, or plan to install, XGBoost and TensorFlow with GPU support. Please follow each package's documentation for installation instructions. If you do not need GPU support and just want to get started with NucML, install both packages using:

# install tensorflow and xgboost
pip install tensorflow xgboost

2. Configure NucML and Generate the EXFOR Datasets

Before exploring any functionality, all EXFOR datasets need to be generated. NucML is built around a defined working directory, which has been uploaded to GitHub as a repository for you to download. It contains metadata files for all of the various data sources, including ACE, ENDF, EXFOR, and ENSDF. The ML_Nuclear_Data repository also contains some pre-generated datasets and everything needed to produce the heavier EXFOR-based datasets.

First, clone the repository by either downloading and unzipping it directly from GitHub or by using the command line.

# navigate to the directory of your choice - change it to your own
cd /Users/pedrovicentevaldez/Desktop/

# clone the ml nuclear data repository
git clone https://github.com/pedrojrv/ML_Nuclear_Data.git

The rest of this setup assumes you cloned the repository to your Desktop.
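If you want to sanity-check the clone before continuing, a short standard-library sketch can verify that the expected subdirectories are present (the helper name is ours; EXFOR is the only subdirectory checked here, since it is the one used later in this guide):

```python
from pathlib import Path

def missing_subdirs(repo_path, expected=("EXFOR",)):
    """Return the names of expected subdirectories absent from the clone."""
    repo = Path(repo_path)
    return [name for name in expected if not (repo / name).is_dir()]

# Example (path is an assumption -- use your own clone location):
# missing_subdirs("/Users/you/Desktop/ML_Nuclear_Data")  # [] means all present
```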

Note

If you plan to rename the ML_Nuclear_Data directory, do it now rather than later; renaming it after configuration can cause path issues.

Before proceeding, it is important that you have a local copy of the ACE files utilized by SERPENT2. If you do not have one, feel free to download them here. Once the download finishes, place the files in a location of your choice. Here, we assume you unzipped and moved the acedata directory into the ML_Nuclear_Data directory.

NucML uses a configuration file to resolve the paths to the generated data based on the working directory structure. Having downloaded the repository, it is time to tell NucML where it is located. Run the following commands in your terminal:

# activate your conda environment where nucml is installed
conda activate ml_nuclear_env

# navigate to the cloned repo
cd /Users/pedrovicentevaldez/Desktop/ML_Nuclear_Data

# configure nucml paths - if you do not have matlab feel free to delete that entire argument
python -c "import nucml.configure as config; config.configure('.', 'acedata/', matlab_exe_path='/mnt/c/Program\ Files/MATLAB/R2019a/bin/matlab.exe')"
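If you are unsure whether MATLAB is installed before deciding whether to pass matlab_exe_path, a small standard-library check can help (a sketch; note that shutil.which only finds executables on your PATH, not arbitrary install locations like the one shown above):

```python
import shutil

# Full path to the matlab executable if it is on PATH, else None.
matlab_path = shutil.which("matlab")

if matlab_path is None:
    print("MATLAB not on PATH; consider calling config.configure without matlab_exe_path.")
else:
    print(f"Found MATLAB at {matlab_path}")
```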

Now we are ready to generate the EXFOR datasets. There are two options: (1) download the already parsed/formatted files, or (2) generate them yourself using the provided utility Python script. We recommend the first approach. You can download the needed data here. Unzip it and substitute the CSV_Files directory in the ML_Nuclear_Data/EXFOR/ directory (feel free to delete the previous folder if taking the first approach). If for some reason you would like to re-parse the EXFOR C4 files, run the following commands in the terminal:

Note

This will generate EXFOR datasets for all available projectiles and will therefore take several minutes. Only do this if you opted for the second approach; otherwise, skip this step.

# navigate to the cloned repo
cd /Users/pedrovicentevaldez/Desktop/ML_Nuclear_Data

# generate the exfor datasets
python generate_exfor.py

Running this script will create a CSV_Files directory within the EXFOR folder. Additionally, a tmp directory will also be created containing temporary files used in the creation of the final datasets. Feel free to delete the tmp directory after the process has finished to save space.
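Once generation has finished, the leftover tmp directory can be removed with a short standard-library helper (a sketch; the helper name, and the assumption that tmp sits directly under the repository root, are ours):

```python
import shutil
from pathlib import Path

def remove_tmp(repo_path):
    """Delete the tmp directory left behind by generate_exfor.py, if present.

    Returns True if a tmp directory was found and removed, False otherwise.
    """
    tmp = Path(repo_path) / "tmp"
    if tmp.is_dir():
        shutil.rmtree(tmp)
        return True
    return False
```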

3. Other Dependencies

SERPENT2 and MATLAB must be installed if you want to validate your models using criticality benchmarks. These are not necessary for other tasks such as loading the data and training ML models.