This course explores the foundations of big data and data analytics in computing technology and statistics. It examines the underlying technical challenges and the statistical assumptions used to understand relationships in a variety of applied fields, with a focus on fraud detection and communication monitoring. It also engages with the social implications of the increased knowledge, surveillance, and behavioral prediction that big data makes possible, and with the ethical tradeoffs these capabilities raise. Although the course includes an analytics project, no prior technical experience is required.
Upon completing the course, you should have the following capabilities:
- Familiarity and exposure
  - Demonstrate familiarity with the hardware trends underlying the rise of big data
  - Demonstrate familiarity with the software trends underlying the rise of big data
  - List specific links between big data technologies and our security as a society
- Reasoning about computing technology and about models
  - Articulate a strategy for defining, and algorithmically finding, a specific type of wrongdoing
  - Identify problems that technology is likely to solve in the future
  - Identify problems that technology is unlikely to solve in the future
- Technical execution of code
  - Use the R programming language to perform statistical analysis
  - Use Python to find the most common words in a book
  - Use Python to query an email database
  - Use Python to algorithmically identify emails of interest
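As an illustration of the word-frequency objective, here is a minimal Python sketch of counting the most common words in a text. It assumes a deliberately simple tokenization (lowercase letters and apostrophes only) and relies on the standard library's `collections.Counter`; the function name `most_common_words` is hypothetical and not taken from the course materials.

```python
import re
from collections import Counter
from typing import List, Tuple


def most_common_words(text: str, n: int = 10) -> List[Tuple[str, int]]:
    """Return the n most frequent words in `text`, ignoring case."""
    # Lowercase the text, then treat each run of letters/apostrophes as a word.
    words = re.findall(r"[a-z']+", text.lower())
    # Counter.most_common orders by count; ties keep first-seen order.
    return Counter(words).most_common(n)


if __name__ == "__main__":
    sample = "It was the best of times, it was the worst of times."
    print(most_common_words(sample, 3))
```

For a whole book, the same function can be applied to the file's contents (for example, the result of `open(path, encoding="utf-8").read()`); a real analysis would likely also drop common stop words such as "the" and "of".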
Spring 2021 syllabus (PDF)
Note: Sample syllabi are provided for informational purposes only. For the most up-to-date information, consult the official course documentation.
Before Taking This Class...
Suggested Background Knowledge
This course requires no specific background knowledge. Many students with no background in programming or statistics have found the class approachable and learned a lot. At the same time, students with modest to considerable computing skills, including upper-level PhD students in economics and undergraduate and master's students in a range of subjects at Georgia Tech, have also benefited from the wide-ranging survey of data analytic methods, the open-ended project work, and the discussions of how data analytics fits into problems of security and society.
Technical Requirements and Software
We do some exercises in R and some in Python. For Python, we host Jupyter notebooks for students; for R, students have run their own R/RStudio installations. The course project is to find “wrongdoing” among the approximately 250,000 Enron emails.
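The email-querying and flagging tasks can be sketched in the same spirit. The example below assumes a hypothetical SQLite table `emails(id, sender, subject, body)` and a hypothetical keyword watch-list; the helper names `load_emails`, `emails_from`, and `flag_emails` are illustrative only. It shows a parameterized query and a naive keyword-count flag, not the course's actual dataset schema or detection method; simple keyword matching is only a starting point for defining wrongdoing.

```python
import sqlite3
from typing import Iterable, List, Tuple

# Hypothetical watch-list of phrases; a real project would define its own.
KEYWORDS: Tuple[str, ...] = ("shred", "off the books", "don't tell")


def load_emails(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str, str]]) -> None:
    """Create a hypothetical `emails` table and insert (sender, subject, body) rows."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS emails ("
        "id INTEGER PRIMARY KEY, sender TEXT, subject TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO emails (sender, subject, body) VALUES (?, ?, ?)", rows
    )
    conn.commit()


def emails_from(conn: sqlite3.Connection, sender: str) -> List[Tuple[int, str]]:
    """Return (id, subject) for every email from `sender`, via a parameterized query."""
    return conn.execute(
        "SELECT id, subject FROM emails WHERE sender = ? ORDER BY id", (sender,)
    ).fetchall()


def flag_emails(
    conn: sqlite3.Connection,
    keywords: Iterable[str] = KEYWORDS,
    min_hits: int = 1,
) -> List[Tuple[int, int]]:
    """Return (id, hit count) for emails with at least `min_hits` keyword occurrences."""
    keywords = [k.lower() for k in keywords]
    flagged = []
    for email_id, subject, body in conn.execute("SELECT id, subject, body FROM emails"):
        text = f"{subject} {body}".lower()
        # Count non-overlapping occurrences of each keyword phrase.
        hits = sum(text.count(k) for k in keywords)
        if hits >= min_hits:
            flagged.append((email_id, hits))
    # Highest hit count first; ties broken by id.
    flagged.sort(key=lambda pair: (-pair[1], pair[0]))
    return flagged


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    load_emails(conn, [
        ("alice@example.com", "Lunch", "See you at noon."),
        ("bob@example.com", "Records", "Please shred the Q3 files and keep it off the books."),
        ("alice@example.com", "Re: Records", "Shred everything? Don't tell legal."),
    ])
    print(emails_from(conn, "alice@example.com"))
    print(flag_emails(conn))
```

The `?` placeholders let `sqlite3` bind values safely instead of splicing them into the SQL string. Ranking by hit count is one simple way to prioritize which emails a human should read first.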
All Georgia Tech students are expected to uphold the Georgia Tech Academic Honor Code. This course may impose additional academic integrity stipulations; consult the official course documentation for more information.