Paper Critiques 📝 | Sundong Kim

Back to the Data Engineering Course - 2023 Spring (AI5308/AI4005)

A paper critique is a piece of academic writing that summarizes and critically evaluates a concept or work. Put simply, it is a summary combined with a critical analysis of a specific work. Its goal is to evaluate the impact of the given work or concept on its field.

Three critiques

In each class, you’ll read textbook chapters or papers, and sometimes we will ask you to write a short critique of them. Writing critiques helps you practice writing. At the same time, it gives you the opportunity to critically evaluate (even well-written) papers in the field. Of course, no paper is perfect, and there is always room for improvement!

Here are the topics for you to write a critique about. Each topic comes with relevant papers.

Critique 1 (Topic: Using machine learning the right way)
- Paper: 150 Successful Machine Learning Models: 6 Lessons Learned at Booking.com
- Due: Mar 27, 23:59 (KST)
Critique 2 (Topic: Applying state-of-the-art models in a target domain)
- Paper: e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce
- Also watch the two videos accompanying the paper: Wonyoung’s talk, and Byeongjo and Shengzhe’s talk
- Due: Apr 24, 23:59 (KST)
Critique 3 (Topic: Responsible AI - lessons learned from developing DALL-E and GPT-3 at OpenAI)
- Article: DALL·E 2 Preview - Risks and Limitations
- Article: Lessons learned on language model safety and misuse
- Due: May 22, 23:59 (KST)

Submit your critiques through this form. Late submissions are not accepted. After the deadline, you will be able to see other students’ critiques, and there will be an in-class discussion about them.

Grading

Critiques make up 15% of your grade. There are three critiques, each worth about 5% of your grade. Critiques will be evaluated by the course staff. You will receive one of the following grades:

Check Plus (5 points) - The critique is very well written. Strength/weakness items are very insightful.
Check (3 points) - The critique is acceptable. Most critiques are likely to fall into this category.
Check Minus (1 point) - The critique is weak. For example, the summary is very vague, the strength/weakness items are trivial and not insightful at all, the questions are trivial, and the discussion is shallow.
No submission / late submission (0 points)
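As a quick check on the arithmetic above, here is a minimal sketch that totals the critique component of the grade. It assumes each grade's points map directly to percentage points of the final grade (so three Check Plus grades give the full 15%); the helper name `critique_score` and the grade keys are hypothetical, chosen only for illustration.

```python
# Hypothetical helper: assumes each critique's points count directly as
# percentage points of the final grade (max 5 per critique, 15 in total).
GRADE_POINTS = {
    "check_plus": 5,
    "check": 3,
    "check_minus": 1,
    "no_submission": 0,  # late submissions also receive 0 points
}


def critique_score(grades: list[str]) -> int:
    """Sum the points earned on the three critiques (out of 15)."""
    if len(grades) != 3:
        raise ValueError("expected exactly three critique grades")
    return sum(GRADE_POINTS[g] for g in grades)


print(critique_score(["check_plus", "check", "check"]))  # 11
```

Under this assumption, a student with one Check Plus and two Checks earns 11 of the 15 points available for critiques.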

Critique Writing Guidelines

Here is an Overleaf template to kick-start your writing; make a copy of the project and fill in your own text.

Paper Name - write down the name of the paper
Rating: X out of 5 - rate the paper (we’ll drop low-rated papers from future offerings)

Provide a summary in your own words (one paragraph)

Briefly describe the results from your own perspective
Do not simply repeat the paper’s wording; try to explain things in your own words, in a less formal way
Try to summarize the main concept and key contributions
Add a one-line comment on whether the contributions are significant and whether the work effectively delivers its key messages

Strength & Weakness

Next, identify several strengths and weaknesses. One suggestion is to find three strengths (positive aspects) and one weakness; as a junior researcher, you may find weaknesses hard to spot!
Strengths might relate to 1) compelling storytelling in the problem statement, 2) a well-summarized related work section, 3) great ideas for user studies, 4) new methods of data analysis, 5) inspiring discussion, or 6) promising research directions.
Weaknesses might relate to 1) an unclear contribution to the community, 2) flaws in study design, 3) lack of generalizability, 4) flaws in evaluation, 5) missing important steps, 6) missing important aspects (in different phases), 7) lack of justification, 8) weak ecological validity, etc.
There are many parts of a paper that you can critique. Recall that no paper is perfect, so there are always weaknesses to be found. Likewise, you can find strengths in any paper. Some examples:
- (strength) “I really like the experimental setup. The authors deployed the system in the wild and recruited around 1000 users! This large-scale evaluation clearly shows the ecological validity of the proposed system.”
- (weakness) “I wish the authors had compared the system with other approaches. Currently, they only show relative improvements over naive solutions.”
- (weakness) “In Section 3, the authors proposed using approach X, but this choice was not well justified. The authors could have used approach Y instead. I wonder what the results would have been with approach Y.”
- (strength) “The discussion is very inspiring. Initially, I thought the authors were addressing a very narrow, specific problem. In the discussion, however, they explored diverse ways in which the findings could be applied! After reading the discussion, I was convinced that this research area is very important and that there should be further studies in this direction.”

Questions:

Parts that you had a hard time understanding
Something you want to know more about
Something that is not clear in the paper

Discussion

Has reading the paper changed your opinion of or perspective on this topic? If so, explain why (previously vs. now).
Could you use the knowledge learned from this paper in your research project? (For example, “The current evaluation framework is closely related to my project. I’m thinking of adopting the following measures: speed of input and…”)
Did the future research directions or the discussion inspire you? If so, state what they are.
Is there anything you want to discuss in class? (For example, “The authors raised ethical concerns about approach Y, but I’m not convinced. I would like to hear what others think about this aspect.”)

Example Critique:

This is an example from my class in Spring 2023. It’s a bit longer than I expected, but it delivers the key aspects of a critique well, and it received a “check plus”.

Title: e-CLIP: Large-Scale Vision-Language Representation Learning in E-commerce

Rating: 4/5

Summary: This paper proposes a contrastive learning framework that effectively aligns language and visual models using unlabeled raw product text and images in the e-commerce domain. Inspired by prior research such as CLIP and ALIGN, the authors evaluate the performance of pre-trained models as backbones for diverse downstream tasks within the e-commerce sector. Their proposed method adeptly overcomes the inherent issues of data quality and scale in e-commerce, demonstrating the applicability of large-scale multi-modal pre-training approaches in this domain. They convincingly validate the efficacy of their e-CLIP framework through both offline experiments and, notably, online experiments as well.

Strength: (1) The research is particularly commendable for its thorough evaluation of the e-CLIP model in both online and offline environments, demonstrating its practical applicability in the e-commerce domain. Using the NAVER Shopping dataset, the authors conducted extensive experiments, comparing e-CLIP against state-of-the-art models in various tasks and integrating it with downstream applications. These efforts led to improvements across multiple applications and enhanced training efficiency, highlighting the model’s practical value and effectiveness. (2) The authors significantly contributed by analyzing e-CLIP’s components to gain a deeper understanding of its inner workings and improve performance. They systematically removed or modified various components and evaluated their impact. This led to identifying critical components for achieving high performance and gaining valuable insights. Their thorough research, including the discovery of the benefits of using a multi-modal transformer encoder with contrastive loss and larger batch sizes, serves as a major strength of the paper.

Weakness: (1) This study does not provide a detailed analysis of the computing resources required for training and deploying e-CLIP. While the authors mention that they have optimized the training speed and memory usage of computational resources, they do not offer an in-depth analysis of these optimizations. This point serves as a weakness of the paper, leaving room for a more comprehensive exploration of the resource requirements and optimization strategies. (2) This study does not provide a comprehensive analysis of e-CLIP’s interpretability. Although the authors discuss some aspects of model interpretability, such as the analysis of model components, they do not offer a detailed examination of how e-CLIP performs predictions or how its representations can be interpreted. This point serves as a weakness of the paper, suggesting that further research could explore the interpretability aspect of the e-CLIP model more thoroughly. (3) While the authors demonstrate the effectiveness of e-CLIP in real-world applications within an industrial setting, they do not provide a detailed analysis of the ethical implications of their approach. For instance, they do not discuss potential biases in the data or model, nor do they address the impact their approach might have on user privacy protection. This point serves as a weakness of the paper, indicating a need for further exploration of the ethical considerations surrounding e-CLIP’s deployment.

Questions: (1) I am curious to learn more about Figure 2, which illustrates the system architecture of NAVER Shopping. While some of the technologies employed, such as Hadoop and Spark, are mentioned, I am interested in learning more about the technologies used in this sequence and the underlying mechanisms of the system. (2) The performance metrics used for their models, such as Top-1 accuracy, Top-5 accuracy, NMI, ARI, and F1 score, are well-established. However, I am curious to know the business impact resulting from the model performance improvement and which metrics hold particular importance from a business perspective. This point serves as a query for the paper, seeking to understand the practical implications of the model’s performance in a commercial context.

Discussion: Upon reading this paper, it is evident that the introduction of e-CLIP has led to performance improvements across various metrics. However, from a business standpoint, it is not always feasible to deploy a model with even slight performance improvements in a production environment immediately. This is because adopting new technology and modifying models both incur costs. So, what would be an acceptable threshold for performance improvement to warrant the adoption of a new technology? From what point can we say that the benefits of changing the technology outweigh the costs? Which perspective should we consider to better assess this matter? This point serves as a discussion topic for the paper, contemplating the criteria for adopting new technologies in a practical context.