Data Engineering

Spring 2023

Schedule: Tue/Thu 1:00pm-2:30pm
Location: TBD

Instructor: Sundong Kim (sundong@gist.ac.kr)
Office: GIST AI Graduate School (S7) Room 204
Office Hour: Thu 2:30pm-3:30pm or by appointment

TAs: TBD

Introduction

Machine learning systems are both complex and unique. They are complex because they consist of many different components and involve many different stakeholders, and they are unique because they are data dependent, with data varying wildly from one use case to the next. In this course, you will learn how to conduct data engineering and how to take a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

Throughout the course, we will examine each design decision, such as how to process and create training data, which features to use, how often to retrain models, and what to monitor, in terms of how it helps the system as a whole achieve its objectives. This iterative framework will be illustrated through real-world case studies.

Overall, this course will give you insight into tackling scenarios such as:

  • Engineering data and choosing the right metrics to solve a business problem
  • Automating the process for continually developing, evaluating, deploying, and updating models
  • Developing responsible ML systems

Textbook & References

Grading

  • Homework and tests: 50%; Project: 50%

Misc

  • Basic knowledge of data structures and algorithms is recommended.
  • Basic knowledge of machine learning is recommended.
  • Taking this course alongside a project-based AI course offers some synergy, and taking it before joining an industry internship will be beneficial.

Schedule

Below is the tentative schedule for the course.

Date Topic Materials Homework
02-28 Introduction; Understanding machine learning production
03-02 Understanding machine learning production (cont'd)
03-07 Data engineering and fundamentals
03-09 Data engineering and fundamentals
03-14 Training data and feature engineering
03-16 Training data and feature engineering
03-21 Model selection, development and training
03-23 Model selection, development and training
03-28 Offline model evaluation
03-30 Offline model evaluation
04-04 Deployment
04-06 Deployment
04-11 Integrating into business: Listening to industry experts
04-13 Integrating into business: Listening to industry experts
04-18 Midterm
04-20 No Lecture (Midterm Period)
04-25 Final project announcement
04-27 Final project announcement
05-02 Diagnosis of system failures
05-04 Diagnosis of system failures
05-09 Data distribution shifts and monitoring
05-11 Data distribution shifts and monitoring
05-16 Infrastructure and platform
05-18 Infrastructure and platform
05-23 Beyond accuracy: Fairness, security, governance
05-25 Beyond accuracy: Fairness, security, governance
05-30 Integrating into business: Listening to industry experts II
06-01 Integrating into business: Listening to industry experts II
06-06 No Lecture (National Holiday)
06-08 Project demo day
06-14 Final Exam
06-16 No Lecture (Final Exam Period)