Data Engineering
Spring 2023
Schedule: Tue/Thu 1:00pm-2:30pm
Location: TBD
Instructor: Sundong, Kim (sundong@gist.ac.kr)
Office: GIST AI Graduate School (S7) Room 204
Office Hours: Thu 2:30pm-3:30pm or by appointment
TAs: TBD
Introduction
Machine learning systems are both complex and unique. They are complex because they consist of many different components and involve many different stakeholders, and they are unique because they are data dependent, with data varying wildly from one use case to the next. In this course, you will learn how to conduct data engineering and a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.
During the course, we will consider each design decision (such as how to process and create training data, which features to use, how often to retrain models, and what to monitor) in the context of how it can help your system as a whole achieve its objectives. The iterative framework will be explained through real case studies.
Overall, this course will give you insight into how to tackle scenarios such as:
- Engineering data and choosing the right metrics to solve a business problem
- Automating the process for continually developing, evaluating, deploying, and updating models
- Developing responsible ML systems
Textbook & References
Grading
- Homework and tests: 50%, Project: 50%
Misc
- Basic knowledge of data structures and algorithms is recommended.
- Basic knowledge of machine learning is recommended.
- This course pairs well with the project-based AI course, and taking it before joining an industry internship will be beneficial.
Schedule
Below is the tentative syllabus for the course.
Date | Topic | Materials | Homework |
---|---|---|---|
02-28 | Introduction; Understanding machine learning production | ||
03-02 | Understanding machine learning production (cont'd) | ||
03-07 | Data engineering and fundamentals | ||
03-09 | Data engineering and fundamentals | ||
03-14 | Training data and feature engineering | ||
03-16 | Training data and feature engineering | ||
03-21 | Model selection, development and training | ||
03-23 | Model selection, development and training | ||
03-28 | Offline model evaluation | ||
03-30 | Offline model evaluation | ||
04-04 | Deployment | ||
04-06 | Deployment | ||
04-11 | Integrating into business: Hearing from industry experts | ||
04-13 | Integrating into business: Hearing from industry experts | ||
04-18 | Midterm | ||
04-20 | No Lecture (Midterm Period) | ||
04-25 | Final project announcement | ||
04-27 | Final project announcement | ||
05-02 | Diagnosis of system failures | ||
05-04 | Diagnosis of system failures | ||
05-09 | Data distribution shifts and monitoring | ||
05-11 | Data distribution shifts and monitoring | ||
05-16 | Infrastructure and platform | ||
05-18 | Infrastructure and platform | ||
05-23 | Beyond accuracy: Fairness, security, governance | ||
05-25 | Beyond accuracy: Fairness, security, governance | ||
05-30 | Integrating into business: Hearing from industry experts II | ||
06-01 | Integrating into business: Hearing from industry experts II | ||
06-06 | No Lecture (National Holiday) | ||
06-08 | Project demo day | ||
06-14 | Final Exam | ||
06-16 | No Lecture (Final Exam Period) | ||