arxiv:2412.00869

Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting

Published on Dec 1 · Submitted by amanchadha on Dec 3
Abstract

Making analogies is fundamental to cognition. Proportional analogies, which consist of four terms, are often used to assess linguistic and cognitive abilities. For instance, completing analogies like "Oxygen is to Gas as <blank> is to <blank>" requires identifying the semantic relationship (e.g., "type of") between the first pair of terms ("Oxygen" and "Gas") and finding a second pair that shares the same relationship (e.g., "Aluminum" and "Metal"). In this work, we introduce a 15K Multiple-Choice Question Answering (MCQA) dataset for proportional analogy completion and evaluate the performance of contemporary Large Language Models (LLMs) in various knowledge-enhanced prompt settings. Specifically, we augment prompts with three types of knowledge: exemplar, structured, and targeted. Our results show that despite extensive training data, solving proportional analogies remains challenging for current LLMs, with the best model achieving an accuracy of 55%. Notably, we find that providing targeted knowledge can better assist models in completing proportional analogies compared to providing exemplars or collections of structured knowledge.
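To make the task format concrete, here is a minimal Python sketch (not the authors' code) of how a proportional-analogy MCQA item could be represented and rendered as a zero-shot prompt. The field names, prompt template, and distractor choices are illustrative assumptions.

```python
# Minimal sketch of a proportional-analogy MCQA item and a zero-shot prompt.
# Field names and the template wording are assumptions, not the paper's format.
from dataclasses import dataclass

@dataclass
class AnalogyItem:
    a: str              # first term of the source pair, e.g. "Oxygen"
    b: str              # second term of the source pair, e.g. "Gas"
    choices: list[str]  # candidate completion pairs
    answer: int         # index of the correct choice

def zero_shot_prompt(item: AnalogyItem) -> str:
    options = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(item.choices))
    return (
        f'"{item.a}" is to "{item.b}" as <blank> is to <blank>.\n'
        f"Choose the pair that shares the same relationship:\n{options}"
    )

item = AnalogyItem(
    a="Oxygen", b="Gas",
    choices=["Aluminum : Metal", "Dog : Bone", "Run : Fast", "Blue : Sky"],
    answer=0,  # "Aluminum : Metal" shares the "type of" relation
)
print(zero_shot_prompt(item))
```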

Community


We introduce a 15,000-item proportional analogy dataset, evaluate nine large language models (LLMs) on it, and investigate the impact of various knowledge-enhanced prompting techniques, finding that targeted knowledge significantly improves accuracy.

  • Dataset Contribution: A novel dataset of 15,000 multiple-choice questions covering 238 relation types for proportional analogies is presented, significantly expanding the scope of previous datasets.
  • Evaluation: Nine LLMs were tested under different prompting methods, including zero-shot, few-shot, structured knowledge prompting, and targeted knowledge prompting (sketched below), with GPT-3.5-Turbo achieving the highest accuracy (55.25%) using targeted knowledge.
  • Key Findings: Targeted knowledge prompting improves LLM performance more effectively than exemplars or structured knowledge, highlighting both the difficulty LLMs have in leveraging collections of structured knowledge and the value of task-specific knowledge integration.
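As a rough illustration of how the prompt settings compared above might differ, the sketch below builds a prompt for each setting. The templates, knowledge strings, and the query_llm call are hypothetical placeholders, not the paper's actual prompts.

```python
# Illustrative sketch of the prompt settings: zero-shot, few-shot (exemplar),
# structured-knowledge, and targeted-knowledge prompting. All templates and
# knowledge strings are assumptions; the paper's exact prompts may differ.

def build_prompt(question: str, setting: str,
                 exemplars: list[str] | None = None,
                 knowledge: str | None = None) -> str:
    parts = []
    if setting == "few_shot" and exemplars:
        # Exemplar knowledge: solved analogies prepended to the question.
        parts.append("Examples:\n" + "\n".join(exemplars))
    elif setting == "structured" and knowledge:
        # Structured knowledge: e.g. (head, relation, tail) triples
        # serialized from a knowledge source.
        parts.append("Relevant facts:\n" + knowledge)
    elif setting == "targeted" and knowledge:
        # Targeted knowledge: only the relation linking the source pair.
        parts.append("Hint: " + knowledge)
    # setting == "zero_shot" adds nothing before the question.
    parts.append(question)
    return "\n\n".join(parts)

prompt = build_prompt(
    question='"Oxygen" is to "Gas" as <blank> is to <blank>. Pick the best pair.',
    setting="targeted",
    knowledge='The relationship between "Oxygen" and "Gas" is "type of".',
)
# answer = query_llm(prompt)  # hypothetical call to any chat-completion API
```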

