Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting
Abstract
Making analogies is fundamental to cognition. Proportional analogies, which consist of four terms, are often used to assess linguistic and cognitive abilities. For instance, completing analogies like "Oxygen is to Gas as <blank> is to <blank>" requires identifying the semantic relationship (e.g., "type of") between the first pair of terms ("Oxygen" and "Gas") and finding a second pair that shares the same relationship (e.g., "Aluminum" and "Metal"). In this work, we introduce a 15K Multiple-Choice Question Answering (MCQA) dataset for proportional analogy completion and evaluate the performance of contemporary Large Language Models (LLMs) in various knowledge-enhanced prompt settings. Specifically, we augment prompts with three types of knowledge: exemplar, structured, and targeted. Our results show that despite extensive training data, solving proportional analogies remains challenging for current LLMs, with the best model achieving an accuracy of 55%. Notably, we find that providing targeted knowledge can better assist models in completing proportional analogies compared to providing exemplars or collections of structured knowledge.
Community
We introduce a 15,000-item proportional analogy dataset to evaluate the performance of nine large language models (LLMs) and investigate the impact of various knowledge-enhanced prompting techniques, revealing that targeted knowledge yields the largest accuracy gains.
- Dataset Contribution: A novel dataset of 15,000 multiple-choice questions covering 238 relation types for proportional analogies is presented, significantly expanding the scope of previous datasets.
- Evaluation: Nine LLMs were tested under zero-shot, few-shot, structured knowledge, and targeted knowledge prompting (see the sketch after this list), with GPT-3.5-Turbo achieving the highest accuracy (55.25%) using targeted knowledge.
- Key Findings: Targeted knowledge prompting improves LLM performance more effectively than exemplar or structured knowledge, highlighting challenges in leveraging structured knowledge and the value of task-specific knowledge integration.
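To make the four prompting settings concrete, here is a minimal Python sketch of how a prompt for one MCQA item might be assembled under each setting. The question wording, answer choices, knowledge strings, and the `build_prompt` helper are all illustrative assumptions, not the paper's actual templates or data.

```python
# A minimal sketch (not the paper's exact templates) of the four prompt
# settings for one proportional-analogy MCQA item. All wording below is
# an illustrative assumption.

QUESTION = "Oxygen is to Gas as <blank> is to <blank>"
CHOICES = ["Aluminum : Metal", "Aluminum : Plane", "Metal : Aluminum"]

def build_prompt(setting: str) -> str:
    """Assemble a prompt for one proportional-analogy question."""
    lines = ["Complete the analogy by choosing the best option."]

    if setting == "few_shot":       # exemplar knowledge: a solved example
        lines.append('Example: Puppy is to Dog as Kitten is to Cat '
                     '(relation: "young of").')
    elif setting == "structured":   # structured knowledge: KG-style triples
        lines.append("Facts: (Oxygen, type_of, Gas), (Aluminum, type_of, Metal)")
    elif setting == "targeted":     # targeted knowledge: the relevant relation
        lines.append('Hint: the first pair is linked by the relation "type of".')
    # "zero_shot" adds no extra knowledge

    lines.append(f"Question: {QUESTION}")
    lines += [f"({i}) {c}" for i, c in enumerate(CHOICES, 1)]
    lines.append("Answer:")
    return "\n".join(lines)

print(build_prompt("targeted"))
```

The only difference between settings is the extra knowledge line injected before the question, which is what lets the paper isolate the effect of each knowledge type on accuracy.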
This is an automated message from the Librarian Bot. The following papers similar to this paper were recommended by the Semantic Scholar API:
- Learning by Analogy: Enhancing Few-Shot Prompting for Math Word Problem Solving with Computational Graph-Based Retrieval (2024)
- Teaching-Inspired Integrated Prompting Framework: A Novel Approach for Enhancing Reasoning in Large Language Models (2024)
- ZEBRA: Zero-Shot Example-Based Retrieval Augmentation for Commonsense Question Answering (2024)
- Disentangling Memory and Reasoning Ability in Large Language Models (2024)
- Concept-Reversed Winograd Schema Challenge: Evaluating and Improving Robust Reasoning in Large Language Models via Abstraction (2024)
- LINKED: Eliciting, Filtering and Integrating Knowledge in Large Language Model for Commonsense Reasoning (2024)
- SimRAG: Self-Improving Retrieval-Augmented Generation for Adapting Large Language Models to Specialized Domains (2024)