Projects
EXFOR SQL
Modernizing the EXFOR Database using Google BigQuery
SQL · Google Cloud Platform
As part of the ongoing nuclear data modernization effort, I am happy to release EXFOR SQL, a modernized SQL version of the
EXFOR database hosted on Google BigQuery for easy data analysis and extraction. In addition to EXFOR, data from the
Atomic Mass Evaluation and the Evaluated Nuclear Structure Data File are also included.
It is a public dataset, meaning you can access it using the table ID:
ml-nuclear-data:nuclear_data.exfor
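The table ID above uses the legacy `project:dataset.table` form, while BigQuery standard SQL expects `project.dataset.table` wrapped in backticks. The following is a minimal sketch of how the table could be previewed from Python. It assumes the `google-cloud-bigquery` package (plus pandas for `to_dataframe`) is installed and that Google Cloud credentials are configured. The helper names are hypothetical, and no column names are assumed, so the query simply selects a few full rows.

```python
EXFOR_TABLE_ID = "ml-nuclear-data:nuclear_data.exfor"


def to_standard_sql_id(table_id: str) -> str:
    """Convert a legacy 'project:dataset.table' ID to the standard SQL form."""
    project, sep, rest = table_id.partition(":")
    # IDs without a colon are assumed to be in standard form already.
    return f"{project}.{rest}" if sep else table_id


def build_preview_query(table_id: str, limit: int = 10) -> str:
    """Build a query that previews the first rows of a table."""
    return f"SELECT * FROM `{to_standard_sql_id(table_id)}` LIMIT {int(limit)}"


def preview_exfor(limit: int = 10):
    """Run the preview query (requires network access and GCP credentials)."""
    # Imported lazily so the helpers above also work without the client library.
    from google.cloud import bigquery

    client = bigquery.Client()  # queries are billed to your default project
    return client.query(build_preview_query(EXFOR_TABLE_ID, limit)).to_dataframe()


print(build_preview_query(EXFOR_TABLE_ID))
# SELECT * FROM `ml-nuclear-data.nuclear_data.exfor` LIMIT 10
```

Calling `preview_exfor()` would return the preview as a pandas DataFrame, from which the available columns can be inspected before writing more specific queries.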
NucML
Python Pipeline for ML-based Nuclear Data Solutions
Python · TensorFlow · XGBoost · Scikit-Learn · PyDoc
NucML is the first and only end-to-end Python-based supervised machine learning pipeline for enhanced, bias-free nuclear data generation and evaluation to support the advancement of next-generation nuclear systems. It offers capabilities that allow the user to navigate through each step of the ML-based nuclear data cross section evaluation pipeline. These steps include dataset parsing and compilation of reaction data, exploratory data analysis, data manipulation and feature engineering, model training and evaluation, and validation via criticality benchmarks. Current research shows that data analysis, evaluation, and benchmarking iteration rates are 2x-20x faster with NucML’s workflow pipeline than with traditional theoretical methods. As a result of my doctoral research, several ready-to-use ML datasets and pre-trained models are also included. These models have outperformed established methods by up to 170% in the U-233 Jezebel benchmark and by 140% in the prediction of newly measured Cl-35 datapoints.
Milvus Utilities
Python Utilities for Image Retrieval with Milvus
Python · Docker · TensorFlow · Milvus · PyDoc · MySQL
The repository shows an example of how to integrate a TensorFlow model for embedding generation with Milvus, an open-source vector similarity search engine, for embedding storage. It also contains a set of Python-based utility functions that make it easier to keep track of processed images, extract and render nearest-neighbor files, and download them locally if needed.
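To illustrate the kind of nearest-neighbor lookup that a vector search engine accelerates, here is a minimal pure-Python sketch of an exhaustive cosine-similarity search over stored embeddings. It does not use Milvus or the repository's utilities; the function names, file names, and 3-dimensional embeddings are hypothetical stand-ins for the embeddings a TensorFlow model would produce.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def nearest_files(query, index, top_k=2):
    """Return the top_k file names whose embeddings are most similar to query."""
    ranked = sorted(
        index,
        key=lambda name: cosine_similarity(query, index[name]),
        reverse=True,
    )
    return ranked[:top_k]


# Hypothetical file names and embeddings, for illustration only.
index = {
    "cat_01.jpg": [0.9, 0.1, 0.0],
    "cat_02.jpg": [0.8, 0.2, 0.1],
    "car_01.jpg": [0.0, 0.1, 0.9],
}
print(nearest_files([1.0, 0.0, 0.0], index))
# ['cat_01.jpg', 'cat_02.jpg']
```

An exhaustive scan like this scales linearly with the number of stored images, which is why a dedicated index becomes useful for large collections.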
San Francisco Crime and Incident Reporter
A Small ML-powered SFPD Crime Rate Visualization Website
Python · JavaScript Front-end · Scikit-Learn · ML Model Training · Visualization Tools · ChartJS · Database Querying
First created as part of the visualization course of the IBM Data Science Specialization, this website was further developed to provide information on incidents and crime rates in various districts of the San Francisco area. Based on Python and JavaScript, the website offers various graphics and even ML-powered crime rate predictions. Several models including Decision Trees, K-Nearest-Neighbors, and Support Vector Machines were trained and optimized using grid search with cross-validation. The KNN model worked best with a Mean Absolute Error of approximately 7.6 crimes per day. The web app refreshes daily by querying and processing the latest data from the official SFPD database. Visit the website.
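As a rough illustration of grid search with cross-validation scored by Mean Absolute Error, the sketch below tunes the number of neighbors of a tiny k-nearest-neighbors regressor. It is a pure-Python stand-in, not the website's Scikit-Learn code: the function names are hypothetical and the (day index, incident count) pairs are made up for the example.

```python
from statistics import mean


def knn_predict(train, x, k):
    """Predict the average target of the k training points closest to x."""
    nearest = sorted(train, key=lambda point: abs(point[0] - x))[:k]
    return mean(target for _, target in nearest)


def mean_absolute_error(y_true, y_pred):
    """Average absolute difference between true and predicted values."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def cross_val_mae(data, k, n_folds=4):
    """Average MAE of a k-NN regressor over n_folds interleaved folds."""
    scores = []
    for fold in range(n_folds):
        test = [p for i, p in enumerate(data) if i % n_folds == fold]
        train = [p for i, p in enumerate(data) if i % n_folds != fold]
        preds = [knn_predict(train, x, k) for x, _ in test]
        scores.append(mean_absolute_error([y for _, y in test], preds))
    return mean(scores)


def grid_search(data, k_values, n_folds=4):
    """Return (best_k, scores), where scores maps each k to its CV MAE."""
    scores = {k: cross_val_mae(data, k, n_folds) for k in k_values}
    return min(scores, key=scores.get), scores


# Made-up (day_index, incident_count) pairs, for illustration only.
data = [(d, 100 + 10 * (d % 7) + (d % 3)) for d in range(56)]
best_k, scores = grid_search(data, k_values=[1, 3, 5, 7])
print(best_k, round(scores[best_k], 2))
```

The same pattern extends to other model families: each candidate configuration is scored on held-out folds, and the one with the lowest average error is kept.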
Radon - Chatbot (Beta)
The Nuclear Engineering Department at UC Berkeley Chatbot
IBM Cloud · Watson Assistant · Chatbot Design · WordPress
This chatbot was first developed as part of the IBM Applied AI Professional Certificate Capstone Project. It was further developed into the official chatbot for the Nuclear Engineering Department at UC Berkeley. A preliminary version of Radon was deployed to come up with a refined set of intents. Relevant entities were created as needed, and a complex dialog flow was designed to answer questions covering faculty and student contact information, graduate resources, course information, and diversity and inclusion pointers. Visit the chatbot at nuc.berkeley.edu. Ask "who is Pedro?"
MIMOSAS
ML Supervised Pipeline for Nuclear Security
Python · TensorFlow · Scikit-Learn · DNN · Random Forest
MIMOSAS (Multimodal Input Model Output Security Analysis Suite) is a supervised machine learning pipeline developed for the classification of multimodal data to inform nuclear security and proliferation detection scenarios. MIMOSAS provides an end-to-end data processing workflow, from data ingestion and pre-processing to model training and test set classification. The pipeline is specified via an input deck, making workflow customization effortless, and the framework is modular, allowing for the easy addition of new learning algorithms. Learn more at complexity.berkeley.edu.
Foursquare-based City Comparison
Foursquare-powered City Clustering and Recommendations
Python · Foursquare API · Unsupervised Learning · Folium · Scikit-Learn · K-means Clustering
This project was first developed as part of the IBM Data Science Professional Certificate Capstone Project. It was further developed into a single utility function that provides an easy way for a user to compare three or more cities using KMeans clustering with data provided by Foursquare. The project's GitHub repository provides several examples and tutorials. In the main example, major tech hub cities including San Francisco, Chicago, and Boston are compared based on their most popular venues.
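The clustering idea can be sketched in a few lines. The example below is a simplified pure-Python version of Lloyd's k-means algorithm, not the project's Scikit-Learn code. It assumes each area is described by the share of its venues falling into a few categories; the area names, categories, and numbers are hypothetical, and the centroids are seeded with the first k points rather than a smarter initialization such as k-means++.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(points, k, n_iter=20):
    """Plain Lloyd's algorithm; the first k points seed the centroids."""
    centroids = [list(p) for p in points[:k]]
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its closest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


# Hypothetical venue shares per area: (cafes, bars, parks).
areas = {
    "area_a": (0.70, 0.20, 0.10),
    "area_b": (0.10, 0.80, 0.10),
    "area_c": (0.65, 0.25, 0.10),
    "area_d": (0.15, 0.75, 0.10),
}
labels, _ = kmeans(list(areas.values()), k=2)
print(dict(zip(areas, labels)))
# {'area_a': 0, 'area_b': 1, 'area_c': 0, 'area_d': 1}
```

Areas that share a label have similar venue profiles, which is the basis for comparing neighborhoods across different cities.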
Business Email Classifier
A Proof-of-Concept Business-oriented e-mail Classifier
Python · NLP · TensorFlow · LSTM · K-Means · PCA
This team project was developed as part of UC Berkeley's Data Mining and Analytics course. Using the ENRON Email Dataset, a machine learning model was trained for Multi-Class Email Text Classification. Due to computational limitations, the Google News pre-trained word embedding model was used to transform the email subjects (labels). Using clustering and dimensionality reduction techniques, a small set of discrete labels was created and subsequently used to train a small LSTM TensorFlow model. The model is far from perfect, but it demonstrates the core workflow when dealing with NLP challenges.
Teachable Machine
User-Friendly Transfer Learning for Non-ML Learners
JavaScript Front-end · TensorFlow.JS
This TensorFlow.JS-based website, created for educational purposes, allows non-ML users to experiment with model training through their webcam. Largely inspired by Google's Teachable Machine, it lets the user not only learn about transfer learning but also experiment with more model hyperparameters. Various models can be trained during the same session using several combinations of parameters and dataset sizes. Performance metrics can be compared using the TF Visor menu, which also allows for individual model inspection and live training visualizations.
SCALE PyTools
A Python Pipeline for Accelerated MSR Experimentation
Python · MATLAB · SCALE · Serpent2 · Monte-Carlo
Aimed at tackling tedious data extraction processes, SCALE PyTools is a Python-based project designed to allow for faster experimental iterations of reactivity and fission product removal simulations of advanced nuclear reactors using SCALE, a comprehensive modeling and simulation suite for nuclear safety analysis. More specifically, the utilities were designed to generate and run fission product removal simulations of the off-gas system in molten salt reactors. This utility toolbox was used to study SCALE's new TRITON module.
NR-HES Capstone Project
A Low-cost High-profit Hybrid Energy System
MATLAB
To address the decrease in grid stability brought on by the increasing penetration of renewable technologies, the ability to “load follow”, that is, to match energy output to demand, is pivotal in preventing overgeneration and oversupply of electricity. To tackle this, a Nuclear-Renewable Hybrid Energy System (NR-HES) was proposed. The system comprises a 1 GWe molten salt reactor, renewable energy sources, and a 34 ton/day 4-step CuCl Hydrogen Production Plant. By optimizing the operation of the NR-HES components, hydrogen production costs were brought down to $1.66/kg, 32% below the Department of Energy target for 2020 of $2.30/kg. The low production costs coupled with potential profits resulted in a 40% reduction in the nuclear power plant's levelized cost of electricity ($0.0199/kWh).