Installation Guide¶
NucML uses NumPy arrays and pandas DataFrames as its main data objects. Each step of the evaluation pipeline requires different utilities, including parsing utilities, data-processing tools, and machine learning packages such as TensorFlow, XGBoost, and Scikit-learn.
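As a minimal illustration of these two data objects (the column names below are invented for the example and are not NucML's actual schema), data typically moves between an array and a DataFrame like this:

```python
import numpy as np
import pandas as pd

# hypothetical cross-section values; column names are illustrative only
energies = np.array([1.0e-5, 2.5e-2, 1.0e6])   # incident energy (eV)
xs = np.array([45.0, 20.4, 2.1])               # cross section (barns)

# wrap the raw arrays in a labeled DataFrame for downstream processing
df = pd.DataFrame({"Energy": energies, "CrossSection": xs})
print(df.shape)  # (3, 2)
```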
1. Install NucML and Dependencies¶
NucML is a Python toolbox that can be installed using pip along with all required dependencies. We recommend creating a conda environment first and then installing nucml inside it. You can download conda here. Once conda is installed, run the following commands in your Anaconda shell:
Warning
Since TensorFlow only supports Python versions up to 3.8, NucML must be installed in an environment with Python version <= 3.8.
# create and activate conda environment
conda create -n ml_nuclear_env python=3.8
conda activate ml_nuclear_env
# Make sure the python version is not higher than 3.8
python -V
# install nucml and tensorflow docs
pip install nucml
pip install git+https://github.com/tensorflow/docs
Installing Dependencies¶
Before moving forward, you need to install both XGBoost and TensorFlow. These are deliberately not declared as dependencies of NucML, in case you already have, or plan to install, GPU-enabled builds of both packages. Please follow each package's documentation for installation instructions.
If you do not need GPU support and just want to get started with NucML, feel free to install both packages using:
# install tensorflow and xgboost
pip install tensorflow xgboost
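To confirm the two packages are importable in the active environment, a quick check like the following can help. This helper is not part of NucML; it only inspects whatever Python environment is currently active:

```python
import importlib.util

def check_deps(packages):
    """Return a dict mapping each package name to whether it is importable."""
    return {pkg: importlib.util.find_spec(pkg) is not None for pkg in packages}

# check the two optional dependencies in the active environment
print(check_deps(["tensorflow", "xgboost"]))
```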
2. Configure NucML and Generate the EXFOR Datasets¶
Before exploring any functionality, all EXFOR datasets need to be generated. NucML is built around a defined working directory, which has been uploaded to GitHub as a repository for you to download. It contains metadata files from various data sources, including ACE, ENDF, EXFOR, and ENSDF. The ML_Nuclear_Data repository also contains some pre-generated datasets and everything needed to produce the heavier EXFOR-based datasets.
First, clone the repository by either downloading and unzipping it directly from GitHub or by using the command line.
# navigate to the directory of your choice - change it to your own
cd /Users/pedrovicentevaldez/Desktop/
# clone the ml nuclear data repository
git clone https://github.com/pedrojrv/ML_Nuclear_Data.git
In the rest of the setup, it is assumed that you cloned the repository to your Desktop.
Note
If you want to rename the ML_Nuclear_Data directory, do it now rather than later; renaming it after configuration can cause path issues.
Before proceeding, make sure you have a local copy of the ACE files utilized by SERPENT2. If you do not have one, you can download them here. Once the download finishes, place the files in a location of your choice. Here, we assume you unzipped and moved the acedata directory into the ML_Nuclear_Data directory.
NucML uses a configuration file to resolve paths to the generated data based on the working directory structure. Having downloaded the repository, it is time to tell NucML where it is located. Run the following commands in your terminal:
# activate your conda environment where nucml is installed
conda activate ml_nuclear_env
# navigate to the cloned repo
cd /Users/pedrovicentevaldez/Desktop/ML_Nuclear_Data
# configure nucml paths - if you do not have MATLAB, omit the matlab_exe_path argument
python -c "import nucml.configure as config; config.configure('.', 'acedata/', matlab_exe_path='/mnt/c/Program\ Files/MATLAB/R2019a/bin/matlab.exe')"
Now, we are ready to generate the EXFOR datasets. There are two options: (1) download the already parsed/formatted files, or (2) generate them yourself using the provided utility Python script. We recommend the first approach. You can download the needed data here. Unzip it and substitute the CSV_Files directory in the ML_Nuclear_Data/EXFOR/ directory (feel free to delete the previous folder if taking the first approach). If for some reason you would like to re-parse the EXFOR C4 files, run the following commands in the terminal:
Note
This will generate EXFOR datasets for all available projectiles and will therefore take several minutes. Only do this if you opted for the second approach; otherwise, ignore it.
# navigate to the cloned repo
cd /Users/pedrovicentevaldez/Desktop/ML_Nuclear_Data
# generate the exfor datasets
python generate_exfor.py
Running this script will create a CSV_Files directory within the EXFOR folder. Additionally, a tmp directory will also be created, containing temporary files used in the creation of the final datasets. Feel free to delete the tmp directory after the process has finished to save space.
3. Other Dependencies¶
SERPENT2 and MATLAB must be installed if you want to validate your models using criticality benchmarks. These are not necessary for other tasks such as loading the data and training ML models.