Freddy Alfonso Boulton

Software Engineer and Data Scientist

About Me

I’m a software engineer specializing in the intersection of open source software and artificial intelligence. I’m currently at Hugging Face, where I spend most of my time developing Gradio, an open-source Python package for building AI-powered web applications with little code.

Previously, I worked at the Alteryx Innovation Labs, where I was a lead engineer on EvalML, a Python package for automated machine learning (AutoML). I got my start in software at Motional, developing tools to train deep learning models of driver behavior for autonomous vehicles. Out of college, I worked as a data scientist at Nielsen, using thousands of hours of television viewing data to build machine learning models that calculate television ratings.

Interests

  • Open Source ML software
  • Web Development
  • Machine Learning
  • Trivia

Education

  • M.S. Statistics, 2016

    The University of Chicago

  • B.A. Statistics, 2016

    The University of Chicago

Experience

Software Engineer

Hugging Face 🤗

Jul 2022 – Present Remote

Technical Leadership: Played a key role in Gradio’s growth from 30k to 900k monthly active developers by contributing to all parts of Gradio’s stack: backend server, frontend, client libraries, and continuous integration. I co-lead the development of Gradio’s custom components, a full-stack toolchain and API for developers to create their own Gradio components and include them in any Gradio application.

Senior Software Engineer

Alteryx Innovation Labs

Jun 2020 – Jul 2022 Boston, MA

Technical Leadership: Led the planning and implementation of several key features of EvalML, a Python package for automated machine learning (AutoML). These include parallel computation of machine learning pipelines, time series modelling, and explaining predictions of black-box models with SHAP.

Building an Open Source Community: Actively review proposed changes and answer questions from the open source users of EvalML. To date, this amounts to reviewing 500 pull requests in the year since I joined, as well as filing 150 issues for bugs and improvements. Primary creator and maintainer of EvalML’s conda package, which is the primary way open source users can install our package on Windows.

DevOps: Sped up EvalML’s CI testing by a factor of 5 (50 minutes to 10 minutes) by optimizing the 100 slowest tests, which accounted for 98% of the total runtime, and by reconfiguring our GitHub Actions testing pipeline to run tests in parallel. Also created a per-commit build and test of our conda package to identify regressions before new versions of EvalML are released.

Software Engineer

Aptiv Autonomous Mobility

Aug 2018 – Jun 2020 Boston, MA

Data Engineering: Wrote a software library for efficiently training deep learning models on 60 hours (200 GB) of log data collected from autonomous vehicles. Engineered a data pipeline for parsing data from vehicle logs, preprocessing it, and storing it in a MongoDB database. Use of my software was instrumental in releasing CoverNet, a novel deep learning algorithm for predicting trajectories of vehicles with a 40% improvement over the state of the art.

Deep Learning Research: Researched convolutional neural network architectures for predicting future trajectories of vehicles from log data collected from our fleet of autonomous vehicles.

MapManager: Developed a Python package for efficiently manipulating map data without consuming all the available RAM. With this package, it is possible to train deep learning models that require map data on datasets that don’t fit in memory.

LaneTarget Estimator: Developed a Random Forest model for predicting the lane a vehicle will take in an intersection. Showed an 11% improvement in performance over the existing method.

Emerging Technologist - Data Science Track

Nielsen

Jul 2016 – Jul 2018 Chicago, IL

Television Station Clusters: Used the t-SNE and K-Means algorithms to create television station clusters for use in Nielsen ratings calculations. Use of these clusters improves ratings accuracy by 8%. Created an interactive dashboard in Shiny to visualize clusters and present them to clients.

Household Demographic Prediction: Helped develop a Recurrent Neural Network to predict household demographics based on cable set-top box data. Researched how to correct model predictions to match census estimates with mixed integer programming. Productionalized model training code to scale to millions of homes.

DVD Sales Prediction: Analyzed DVD sales data and developed a random forest model to predict future sales within 15% relative error. Designed an interface with R Shiny to allow stakeholders to make predictions.

Recent Publications

How to troubleshoot memory problems in Python

I provide a case study of how to debug memory problems in Python using open source tools. Covered in the Real Python Podcast!

Motion Prediction using Trajectory Sets and Self-Driving Domain Knowledge

We extend CoverNet with novel loss functions to better capture geometric relationships in the trajectory set.

CoverNet: Multimodal Behavior Prediction using Trajectory Sets

We present CoverNet, a new method for multimodal, probabilistic trajectory prediction for urban driving.

Skills & Interests

Python

  • Analyzing and visualizing data with the Python scientific stack (pandas, numpy, matplotlib)
  • Building machine learning models with PyTorch and scikit-learn
  • Using object-oriented principles to write software

Functional Programming

  • Learning functional programming in Scala through online courses
  • Goal is to apply functional programming principles to build reliable and scalable data science applications

Spark

  • Parallelizing Python code with the RDD API
  • Parallelizing data analysis with the DataFrames API

Online Courses

Scala And Functional Programming for Beginners

See certificate

Deep Learning NanoDegree

See certificate

Recent Posts

Takeaways from That Will Never Work

What I learned reading the memoir by Marc Randolph, Netflix’s first CEO.

Why the Order of Boolean Expressions Matters in Python

if True or False could be slower than if False or True!

How to troubleshoot memory problems in Python

I provide a case study of how to debug memory problems in Python using open source tools.
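
The boolean-ordering post listed above turns on the fact that Python's `or` evaluates its operands left to right and stops at the first truthy one. The following is a minimal sketch of that behavior, not the benchmark from the post itself; the helper names (`cheap_true`, `costly_false`, `evaluated`) are hypothetical and exist only for illustration.

```python
# Record which operands actually get evaluated in an `or` expression.
calls = []


def cheap_true():
    # Hypothetical stand-in for an inexpensive check that succeeds.
    calls.append("cheap_true")
    return True


def costly_false():
    # Hypothetical stand-in for an expensive check that fails.
    calls.append("costly_false")
    return False


def evaluated(expr):
    """Run a zero-argument callable and return the operands it evaluated."""
    calls.clear()
    expr()
    return list(calls)


# Truthy operand first: `or` short-circuits and never calls costly_false.
truthy_first = evaluated(lambda: cheap_true() or costly_false())

# Falsy operand first: both operands have to be evaluated.
falsy_first = evaluated(lambda: costly_false() or cheap_true())

print(truthy_first)
print(falsy_first)
```

Counting calls rather than timing them keeps the sketch deterministic; how much the extra evaluation costs in practice depends on how expensive each operand is.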