I’m a software engineer specializing in the intersection of open source software and artificial intelligence. I’m currently at Hugging Face, where I spend most of my time developing Gradio, an open-source Python package for building AI-powered web applications with little code.
Previously, I worked at the Alteryx Innovation Labs, where I was a lead engineer on EvalML, a Python package for automated machine learning (AutoML). I got my start in software at Motional, developing tools to train deep learning models of driver behavior for autonomous vehicles. Right out of college, I worked as a data scientist at Nielsen, leveraging thousands of hours of television viewing data to build machine learning models that calculate television ratings.
M.S. Statistics, 2016
The University of Chicago
B.A. Statistics, 2016
The University of Chicago
Technical Leadership: Led the planning and implementation of several key features of EvalML, a Python package for automated machine learning (AutoML), including parallel computation of machine learning pipelines, time series modeling, and explaining predictions of black-box models with SHAP.
Building an Open Source Community: Actively review proposed changes and answer questions from EvalML’s open source users. To date, this amounts to reviewing 500 pull requests in the year since I joined, as well as filing 150 issues for bugs and improvements. Primary creator and maintainer of EvalML’s conda package, the primary way open source users install the package on Windows.
DevOps: Cut EvalML’s CI testing time by a factor of 5 (50 minutes to 10 minutes) by optimizing the 100 slowest tests, which accounted for 98% of total runtime, and by reconfiguring our GitHub Actions pipeline to run tests in parallel. Also created a per-commit build and test of our conda package to catch regressions before new versions of EvalML are released.
Data Engineering: Wrote a software library for efficiently training deep learning models on 60 hours (200 GB) of log data collected from autonomous vehicles. Engineered a data pipeline for parsing vehicle logs, preprocessing the data, and storing it in a MongoDB database. My software was instrumental in releasing CoverNet, a novel deep learning algorithm for predicting vehicle trajectories with a 40% improvement over the state of the art.
Deep Learning Research: Researched convolutional neural network architectures for predicting future trajectories of vehicles from log data collected from our fleet of autonomous vehicles.
MapManager: Developed a Python package for efficiently manipulating map data without consuming all available RAM. With this package, it is possible to train deep learning models that require map data on datasets that don’t fit in memory.
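MapManager itself is internal, so its API isn’t shown here; one common way to get the same effect, keeping large arrays on disk and paging in only the slices a training batch needs, is a NumPy memory map (the file name and shapes below are purely illustrative):

```python
import os
import tempfile
import numpy as np

# Hypothetical on-disk store of rasterized map tiles (names/shapes are illustrative).
path = os.path.join(tempfile.mkdtemp(), "map_tiles.npy")

# Create a memory-mapped .npy file; the OS pages data in and out on demand,
# so the full 1000-tile array never has to fit in RAM at once.
tiles = np.lib.format.open_memmap(path, mode="w+", dtype=np.float32, shape=(1000, 64, 64))
tiles[0] = 1.0   # writes go straight to the file
tiles.flush()

# Reopen read-only; slicing loads only the requested tiles into memory.
readonly = np.load(path, mmap_mode="r")
batch = np.asarray(readonly[:8])  # materialize a small training batch
```

This is only a sketch of the out-of-core idea, not the MapManager design.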
LaneTarget Estimator: Developed a Random Forest model for predicting the lane a vehicle will take through an intersection. Showed an 11% performance improvement over the existing method.
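The actual features and data behind LaneTarget are internal; a minimal sketch of a multiclass Random Forest lane classifier with scikit-learn, using made-up features and labels, might look like:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
# Hypothetical features: heading, speed, lateral offset, distance to intersection.
X = rng.normal(size=(500, 4))
# Hypothetical labels: 0 = left lane, 1 = straight, 2 = right lane.
y = rng.integers(0, 3, size=500)

model = RandomForestClassifier(n_estimators=100, random_state=0)
model.fit(X, y)

# Per-lane probabilities for the next 5 vehicles approaching the intersection.
probs = model.predict_proba(X[:5])
```

A probabilistic output like `predict_proba` is what makes this kind of model useful downstream, since a planner can reason about all candidate lanes rather than a single hard prediction.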
Television Station Clusters: Used t-SNE and the K-Means algorithm to cluster television stations for use in Nielsen ratings calculations. These clusters improved ratings accuracy by 8%. Created an interactive dashboard in Shiny to visualize the clusters and present them to clients.
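The pipeline above, embedding stations with t-SNE and then clustering the embedding with K-Means, can be sketched in scikit-learn; the station features and cluster count here are assumptions for illustration:

```python
import numpy as np
from sklearn.manifold import TSNE
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Hypothetical station features, e.g. viewing-share profiles across dayparts.
stations = rng.normal(size=(120, 20))

# Embed the stations into 2-D with t-SNE, then run K-Means on the embedding.
embedding = TSNE(n_components=2, perplexity=30, random_state=42).fit_transform(stations)
labels = KMeans(n_clusters=6, n_init=10, random_state=42).fit_predict(embedding)
```

Clustering in the low-dimensional embedding (rather than the raw feature space) also makes the result easy to plot, which is what a Shiny dashboard of the clusters would show.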
Household Demographic Prediction: Helped develop a recurrent neural network to predict household demographics from cable set-top box data. Researched correcting model predictions to match census estimates using mixed integer programming. Productionized the model training code to scale to millions of homes.
DVD Sales Prediction: Analyzed DVD sales data and developed a random forest model that predicts future sales within 15% relative error. Designed an R Shiny interface to allow stakeholders to make their own predictions.