Statistician & Data Scientist — portfolio website. Curated selection of projects in causal inference, predictive modeling, and applied research. Hosted on GitHub Pages.
Statistician | Data Scientist | Research & Innovation
Welcome to my professional portfolio.
I am a statistician with a strong academic background in mathematics and a specialization in applied statistics.
I hold a Master’s degree focused on causal modeling and its applications in health research.
This portfolio presents a curated selection of my academic, research, and personal projects involving data analysis, statistical modeling, and causal inference.
It also includes an overview of my technical skills, certifications, and key achievements.
My work reflects a commitment to rigorous methodology, innovation, and the practical use of data to support decision-making, particularly in public health and research contexts.
Feel free to explore the repository and contact me for collaboration or opportunities.
Predictive analytics and machine learning
Project Overview. I engineered a high-performance predictive tool designed to quantify the probabilities of match outcomes for the French Ligue 1 McDonald’s. By moving beyond traditional intuition-based analysis, this project leverages a data-driven approach to forecast results for the current 18-club elite season.
The Technical Edge. The core of the application is a multinomial logistic regression model developed with scikit-learn. It processes complex team metrics—such as offensive efficiency and defensive resilience—to output precise winning, drawing, and losing percentages. The interface is a custom-built Streamlit dashboard, optimized for both light and dark modes with professional branding.
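To make the modeling approach concrete, here is a minimal, hedged sketch (not the project's actual code) of a multinomial logistic regression that maps team metrics to win/draw/loss probabilities; the feature names and synthetic data are illustrative assumptions:

```python
# Illustrative sketch: multinomial logistic regression for match outcomes.
# Features and data are synthetic stand-ins for the real team metrics.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)

# Hypothetical features: home attack, home defence, away attack, away defence.
X = rng.normal(size=(500, 4))
# Synthetic labels: 0 = home win, 1 = draw, 2 = away win.
y = rng.integers(0, 3, size=500)

model = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
model.fit(X, y)

# Probabilities for one upcoming fixture.
proba = model.predict_proba(X[:1])[0]
print(dict(zip(["home win", "draw", "away win"], proba.round(3))))
```

The pipeline standardizes the metrics before fitting, and `predict_proba` returns one probability per outcome class, which is what a dashboard would display per fixture.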
Key Competencies: Predictive modeling, multinomial logistic regression, Python (scikit-learn), Streamlit dashboard development.
Domain: Biostatistics / Causal Modeling / Epigenetics
This project addresses a major challenge in multiple mediation analysis: estimating effects accurately when mediators are correlated due to an unmeasured common cause. I applied and evaluated advanced statistical methods to analyze the influence of childhood trauma on cortisol stress reactivity via DNA methylation. The implementation of this novel approach successfully corrected for confounding bias and confirmed the robustness of a significant direct causal effect.
Key Skills: Causal inference, advanced statistical modeling, R programming.
A clean and well-structured Jupyter Notebook project that predicts users at high risk of churn using machine learning. The workflow covers data preprocessing, feature engineering, and model training with Random Forest and XGBoost. A dynamic threshold optimization approach is applied to balance precision and recall, maximizing the F1-score. Model interpretability is addressed through feature importance analysis of the best-performing model.
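The dynamic threshold step can be sketched as follows; this is an assumed illustration (synthetic imbalanced data, a default Random Forest) of the general technique, not the notebook's code:

```python
# Sketch of dynamic threshold optimization: sweep candidate thresholds over
# predicted churn probabilities and keep the one that maximizes F1.
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split

# Synthetic imbalanced data standing in for a churn dataset (~20% churners).
X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0, stratify=y)

clf = RandomForestClassifier(random_state=0).fit(X_tr, y_tr)
proba = clf.predict_proba(X_te)[:, 1]

# Evaluate F1 at each threshold instead of the default 0.5 cutoff.
thresholds = np.linspace(0.05, 0.95, 19)
scores = [f1_score(y_te, proba >= t) for t in thresholds]
best_t = thresholds[int(np.argmax(scores))]
print(f"best threshold = {best_t:.2f}, F1 = {max(scores):.3f}")
```

Lowering the cutoff below 0.5 typically trades precision for recall, which is often worthwhile when missed churners are costlier than false alarms.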
This academic project delves into a comparative study of regression trees, Bagging, and Random Forests, focusing on their prediction accuracy (MSE) and stability. It rigorously demonstrates how ensemble methods effectively overcome the inherent instability of single regression trees to construct more robust and reliable predictive models.
The central challenge addressed is the reduction of regression tree models’ sensitivity to data variations and the enhancement of their overall performance through aggregation techniques. This work showcases robust technical skills in predictive modeling (Regression Trees, Bagging, Random Forests), model evaluation (MSE, stability analysis), Monte Carlo simulation, and real-world data application, all implemented in R.
Key findings consistently demonstrate that Random Forests emerged as the best-performing and most stable method, significantly reducing the mean squared error and enhancing prediction robustness compared to individual regression trees. The technologies employed include the R programming language and the rpart, randomForest, tidyverse (for dplyr and ggplot2), and ipred packages.
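The project itself is implemented in R; the comparison can nevertheless be sketched in Python, using a standard simulated benchmark as an assumed stand-in for the study's data:

```python
# Illustrative Monte Carlo comparison: average test MSE of a single
# regression tree vs. a random forest over repeated simulated datasets.
import numpy as np
from sklearn.datasets import make_friedman1
from sklearn.ensemble import RandomForestRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor

tree_mse, forest_mse = [], []
for seed in range(10):  # Monte Carlo repetitions over fresh simulated data
    X, y = make_friedman1(n_samples=400, noise=1.0, random_state=seed)
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=seed)
    tree = DecisionTreeRegressor(random_state=seed).fit(X_tr, y_tr)
    forest = RandomForestRegressor(n_estimators=200, random_state=seed).fit(X_tr, y_tr)
    tree_mse.append(mean_squared_error(y_te, tree.predict(X_te)))
    forest_mse.append(mean_squared_error(y_te, forest.predict(X_te)))

print(f"tree MSE: {np.mean(tree_mse):.2f}  forest MSE: {np.mean(forest_mse):.2f}")
```

Averaging the forest's many decorrelated trees reduces the variance that makes a single tree unstable, which is why its mean test MSE comes out lower across repetitions.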
A custom R package I developed for modeling and simulating mixture distributions. MixLaw provides a robust framework to generate observations, define density functions, and compute essential statistics such as means and quantiles for complex mixtures of various underlying distributions. This tool simplifies the handling of heterogeneous data structures in statistical modeling.
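MixLaw is an R package; as a concept sketch only, the same ideas (sampling, density evaluation, and moments for a mixture) can be illustrated in Python with an assumed two-component Gaussian mixture:

```python
# Concept sketch of mixture-distribution utilities: draw observations,
# evaluate the mixture density, and compute the mixture mean.
import numpy as np
from scipy.stats import norm

weights = np.array([0.3, 0.7])                    # mixing proportions
comps = [norm(loc=-2, scale=1), norm(loc=3, scale=0.5)]

def rmix(n, seed=0):
    """Draw n observations: pick a component per draw, then sample from it."""
    rng = np.random.default_rng(seed)
    idx = rng.choice(len(weights), size=n, p=weights)
    draws = np.column_stack([c.rvs(size=n, random_state=rng) for c in comps])
    return draws[np.arange(n), idx]

def dmix(x):
    """Mixture density: weighted sum of the component densities."""
    return sum(w * c.pdf(x) for w, c in zip(weights, comps))

# The mixture mean is the weighted sum of component means: 0.3*(-2) + 0.7*3.
mixture_mean = sum(w * c.mean() for w, c in zip(weights, comps))
print(mixture_mean)
```

Quantiles of a mixture have no closed form in general, so a package like this would typically invert the mixture CDF numerically; the sampling and density pieces above are the building blocks.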