Class Logistics

Data Engineering Course - 2023 Spring (AI5308/AI4005)

Course Overview

Machine learning systems are both complex and unique. They are complex because they consist of many different components and involve many different stakeholders, and they are unique because they’re data dependent, with data varying wildly from one use case to the next. In this course, you’ll learn how to conduct data engineering and take a holistic approach to designing ML systems that are reliable, scalable, maintainable, and adaptive to changing environments and business requirements.

During the lectures, we will consider each design decision, such as how to process and create training data, which features to use, how often to retrain models, and what to monitor, in the context of how it can help your system as a whole achieve its objectives. The iterative framework behind these decisions will be explained through real case studies.

Overall, this course will give you insight into how to tackle scenarios such as:

  • Engineering data and choosing the right metrics to solve a business problem
  • Automating the process for continually developing, evaluating, deploying, and updating models
  • Developing responsible ML systems


There are no official course prerequisites. However, the final project requires building machine learning applications, so basic knowledge of machine learning and some fluency in programming are recommended. This is not a course about learning fancy algorithms; rather, you’ll learn how to apply your machine learning knowledge in a systematic way. Web programming skills are a plus, but not required. This course pairs well with project-based AI courses such as AI4028. It is also beneficial to take this course before starting an industry internship.



  • How difficult is the course? The materials are not difficult to understand, but the final projects are fairly involved. We wouldn’t recommend taking the course unless you’re ready to build things and learn from hands-on experience!

  • Is attendance mandatory? We won’t be taking attendance, but we expect to see you often in class. We love talking to students to understand how you are doing, make sure you get the most out of the class, and get your feedback to improve the materials. We will have one or two pop quizzes, though.

  • What is the format of the class? The class combines lectures, tutorials, and discussions. We will often invite industry experts to give tutorials.

  • I don’t have a team for the final project. Can I still enroll? Yes. Most students don’t have a team yet when they join the course. We’ll have activities to help you find project partners.

  • Can I work in groups for the assignments? Yes, in groups of up to four people.

  • How mature is the course? This is the first time the course is being offered. Most of the materials come from Chip Huyen’s CS329S course. I still have a long way to go in teaching this material. We’re trying our best to ensure the quality of the lectures, but they might not be as polished as those of other courses. Your feedback will be greatly appreciated.

  • Do I need to know Python for the course? Since Python has become the most popular language for machine learning, we expect most tutorials to be in Python. Python fluency isn’t required, but it will make your life much easier during the course.

  • I have a question about the class. What is the best way to reach the course staff? Please post your question on the course forum so that other students can benefit from it. If you have a personal matter or an emergency, please message me and our TAs on Discord.