Practical Exploratory Data Analysis for Machine Learning
Room 324
Presenter: Ananda Ribeiro
Modality: Workshop
Abstract
This hands-on workshop shows how to use exploratory data analysis (EDA) to inform machine learning model development. Participants will learn how to systematically explore a dataset, identify patterns, and extract insights that guide model design and feature engineering. The workshop was motivated by a professional project in which a simple clustering model achieved strong results thanks to relevant features derived from exploratory analysis. The session begins with a brief overview of EDA: what it is, why the data should be explored before a model is built, commonly used techniques, and the types of visualizations and statistical summaries that can be used. It then connects EDA-generated insights to machine learning, focusing on feature engineering while also showing how those insights help with data preprocessing and validating assumptions. A guided notebook exercise using Python and pandas follows, in which participants will do their own data exploration, create new features, and evaluate their impact on the model. The workshop is intended for OMSCS students, especially those with limited experience in data analytics. Participants should be familiar with Python and pandas and should bring a laptop. By the end of the workshop, participants will have a practical workflow for conducting EDA that can be reused across projects, and they will know how to apply EDA results to build more effective machine learning models.
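As a rough illustration of the kind of workflow the abstract describes, the sketch below summarizes a dataset, checks for missing values, derives one new feature, and compares how the raw and derived features relate to a target. It is not the workshop's notebook: the toy data, the column names (`total_spend`, `n_orders`, `churned`), and the use of a simple correlation as a stand-in for model evaluation are all assumptions made for this example.

```python
import pandas as pd

# Hypothetical toy data: columns and values are invented for illustration only.
df = pd.DataFrame({
    "total_spend": [120.0, 80.0, 300.0, 45.0, 210.0, None],
    "n_orders": [4, 2, 10, 1, 7, 3],
    "churned": [0, 1, 0, 1, 0, 1],
})

# 1. Statistical summary and missing-value check.
summary = df.describe()
missing = df.isna().sum()

# 2. Preprocessing informed by the check: fill the missing spend with the median.
df["total_spend"] = df["total_spend"].fillna(df["total_spend"].median())

# 3. Feature engineering: average spend per order.
df["spend_per_order"] = df["total_spend"] / df["n_orders"]

# 4. Compare how the raw and derived features correlate with the target.
#    Correlation is only a quick proxy here, not a full model evaluation.
correlations = df.corr()["churned"].drop("churned")

print(summary)
print(missing)
print(correlations.round(2))
```

In a real project, the final step would typically be replaced by training the model with and without the new feature and comparing a validation metric.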