Freddy Alfonso Boulton

Data Scientist & Software Engineer

About Me

I’m a senior software engineer at the Alteryx Innovation Labs, where I am a primary contributor to EvalML, a Python package for automated machine learning (AutoML). I believe open source tools are needed to make breakthroughs in data science.

Previously, I was a software engineer at Aptiv, where I built tools to train deep learning models of driver behavior for autonomous vehicles. Prior to Aptiv, I worked as a data scientist at Nielsen, leveraging thousands of hours of television viewing data to build machine learning models to calculate television ratings.

Interests

  • Open Source data science software
  • Machine Learning
  • Analyzing Data at Scale
  • Probabilistic Programming
  • Trivia

Education

  • M.S. Statistics, 2016

    The University of Chicago

  • B.A. Statistics, 2016

    The University of Chicago

Experience

Senior Software Engineer

Alteryx Innovation Labs

Jun 2020 – Present Boston, MA

Technical Leadership: Led the planning and implementation of several key features of EvalML, a Python package for automated machine learning (AutoML). These include parallel computation of machine learning pipelines, time series modelling, and explaining predictions of black-box models with SHAP.

Building an Open Source Community: Actively review proposed changes and answer questions from the open source users of EvalML. To date, this amounts to reviewing 500 pull requests in the year since I joined, as well as filing 150 issues for bugs and improvements. Primary creator and maintainer of EvalML’s conda package, which is the primary way open source users can install our package on Windows.

DevOps: Sped up EvalML’s CI testing by a factor of 5 (50 minutes to 10 minutes) by speeding up the 100 slowest tests that contributed to 98% of the total runtime and by reconfiguring our GitHub Actions testing pipeline to run tests in parallel. Also created a per-commit build and test of our conda package to identify regressions before new versions of EvalML are released.

Software Engineer

Aptiv Autonomous Mobility

Aug 2018 – Jun 2020 Boston, MA

Data Engineering: Wrote a software library for efficiently training deep learning models on 60 hours (200 GB) of log data collected from autonomous vehicles. Engineered a data pipeline for parsing data from vehicle logs, preprocessing it, and storing it in a MongoDB database. Use of my software was instrumental in releasing CoverNet, a novel deep learning algorithm for predicting trajectories of vehicles with a 40% improvement over the state of the art.

Deep Learning Research: Researched convolutional neural network architectures for predicting future trajectories of vehicles from log data collected from our fleet of autonomous vehicles.

MapManager: Developed a Python package for efficiently manipulating map data without consuming all the available RAM. With this package, it is possible to train deep learning models that require map data on datasets that don’t fit in memory.

LaneTarget Estimator: Developed a Random Forest model for predicting the lane a vehicle will take in an intersection. Showed 11% improvement in performance over the existing method.

Emerging Technologist - Data Science Track

Nielsen

Jul 2016 – Jul 2018 Chicago, IL

Television Station Clusters: Used the t-SNE and K-Means algorithms to create television station clusters for use in Nielsen ratings calculations. Use of these clusters improved ratings accuracy by 8%. Created an interactive dashboard in Shiny to visualize clusters and present them to clients.

Household Demographic Prediction: Helped develop a Recurrent Neural Network to predict household demographics based on cable set-top box data. Researched how to correct model predictions to match census estimates with mixed integer programming. Productionalized model training code to scale to millions of homes.

DVD Sales Prediction: Analyzed DVD sales data and developed a random forest model to predict future sales within 15% relative error. Designed an interface with R Shiny to allow stakeholders to make predictions.

Recent Publications

How to troubleshoot memory problems in Python

I provide a case study of how to debug memory problems in Python using open source tools. Covered in the Real Python Podcast!

Motion Prediction using Trajectory Sets and Self-Driving Domain Knowledge

We extend CoverNet with novel loss functions to better capture geometric relationships in the trajectory set.

CoverNet: Multimodal Behavior Prediction using Trajectory Sets

We present CoverNet, a new method for multimodal, probabilistic trajectory prediction for urban driving.

Skills & Interests

Python

  • Analyzing and visualizing data with the Python scientific stack (pandas, numpy, matplotlib)
  • Building machine learning models with PyTorch and scikit-learn
  • Using object-oriented principles to write software.

Functional Programming

  • Learning functional programming in Scala through online courses
  • Goal is to apply functional programming principles to build reliable and scalable data science applications.

Spark

  • Parallelizing Python code with the RDD API
  • Parallelizing data analysis with the DataFrames API

Online Courses

Scala And Functional Programming for Beginners

See certificate

Deep Learning NanoDegree

See certificate

Recent Posts

Takeaways from That Will Never Work

What I learned reading the memoir by Marc Randolph, Netflix’s first CEO.

Why the Order of Boolean Expressions Matters in Python

if True or False could be slower than if False or True!

How to troubleshoot memory problems in Python

I provide a case study of how to debug memory problems in Python using open source tools.