Homework 1 ✍🏻
Data Engineering Course - 2024 Spring (AI5308/AI4005)
The first homework consists of a paper critique and a design problem with a programming component. It is due on Sunday, Mar 24, at 11:59 PM.
- Deliverables:
- Submit a PDF of your homework, listing all your code, to the Gradescope assignment titled “HW1”. You may typeset your homework in LaTeX or Word, or submit neatly handwritten and scanned solutions. Please start each question on a new page. Include any graphs within the relevant sections. Each solution should be self-contained on its own page.
- Please list the names of students who helped you or those whom you helped with the homework. Note that exchanging code is not allowed.
- To facilitate grading, when you submit your homework you must indicate which pages correspond to each question. See this video for a Gradescope tutorial.
- Contents:
- Paper Critique:
- Read the Paper: 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com and write a critique following these guidelines.
- (Optional) Watch this relevant video by Lucas Bernardi to help your understanding.
- Design Problem & Programming:
- Topic: Design an ML system to predict the top 20 home run hitters in the MLB 2024 season. The system should predict results similar to the table below (from MLB 2022). For crawling and data preprocessing, refer to this tutorial. Note that the 2024 MLB season is ongoing and will last until Sep 29, 2024, so full-season results are not yet available and you should devise a prediction scheme that accounts for this.
- It is strongly recommended to use Google Colab. Include your thoughts alongside your code, using Markdown for formatting. You will submit both the shareable link and a PDF of your notebook after running all your code. (See how to export a Colab notebook to PDF)
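Because the season is still in progress, one possible starting point is to project each hitter's full-season total from partial data. The sketch below is only an illustration under stated assumptions, not the expected solution: the `Hitter` fields, the `prior_weight_games` shrinkage parameter, and the helper names are all hypothetical. It blends a player's current home run rate with a prior-season rate and extrapolates over the remaining games of a 162-game regular season.

```python
from dataclasses import dataclass

SEASON_GAMES = 162  # length of an MLB regular season


@dataclass
class Hitter:
    # All fields are hypothetical inputs you would build from crawled data.
    name: str
    hr_so_far: int            # home runs hit so far this season
    games_played: int         # games played so far this season
    prior_hr_per_game: float  # home runs per game in an earlier season


def project_hr(h: Hitter, prior_weight_games: float = 50.0) -> float:
    """Project a full-season home run total for one hitter.

    The prior-season rate is treated as if it were `prior_weight_games`
    extra games of evidence, so early-season hot streaks are damped.
    """
    blended_rate = (
        h.hr_so_far + h.prior_hr_per_game * prior_weight_games
    ) / (h.games_played + prior_weight_games)
    remaining_games = max(SEASON_GAMES - h.games_played, 0)
    return h.hr_so_far + blended_rate * remaining_games


def top_n(hitters: list[Hitter], n: int = 20) -> list[tuple[str, float]]:
    """Return the n hitters with the highest projected totals."""
    ranked = sorted(hitters, key=project_hr, reverse=True)
    return [(h.name, round(project_hr(h), 1)) for h in ranked[:n]]


if __name__ == "__main__":
    # Made-up players, for illustration only.
    sample = [
        Hitter("Player A", hr_so_far=10, games_played=40, prior_hr_per_game=0.20),
        Hitter("Player B", hr_so_far=5, games_played=40, prior_hr_per_game=0.10),
    ]
    for name, projected in top_n(sample, n=2):
        print(name, projected)
```

A simple baseline like this is mainly useful as a reference point: a learned model should be compared against it, and the choice of `prior_weight_games` is itself something you would need to justify or tune.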
- Grading Criteria:
- Paper Critique (5%):
- Check Plus (5%) - The critique is very well written and very insightful.
- Check (3-4%) - Adequate. Most critiques are expected to fall into this category.
- Check Minus (2%) - The critique lacks depth. Summaries may be vague, strengths/weaknesses trivial, questions superficial, and discussions shallow.
- No submission (0%)
- Design Problem & Programming (10%):
- Check Plus (10%) - The solution is exceptionally well-developed and coded.
- Check (6-8%) - Adequate. Most submissions are expected to fall into this category.
- Check Minus (2-4%) - The solution lacks depth.
- No submission (0%)
- Late submissions will be graded according to the late policy.