arxiv:2412.00869

Exploring the Abilities of Large Language Models to Solve Proportional Analogies via Knowledge-Enhanced Prompting

Published on Dec 1 · Submitted by amanchadha on Dec 3
Abstract

Making analogies is fundamental to cognition. Proportional analogies, which consist of four terms, are often used to assess linguistic and cognitive abilities. For instance, completing analogies like "Oxygen is to Gas as <blank> is to <blank>" requires identifying the semantic relationship (e.g., "type of") between the first pair of terms ("Oxygen" and "Gas") and finding a second pair that shares the same relationship (e.g., "Aluminum" and "Metal"). In this work, we introduce a 15K Multiple-Choice Question Answering (MCQA) dataset for proportional analogy completion and evaluate the performance of contemporary Large Language Models (LLMs) in various knowledge-enhanced prompt settings. Specifically, we augment prompts with three types of knowledge: exemplar, structured, and targeted. Our results show that despite extensive training data, solving proportional analogies remains challenging for current LLMs, with the best model achieving an accuracy of 55%. Notably, we find that providing targeted knowledge can better assist models in completing proportional analogies compared to providing exemplars or collections of structured knowledge.
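To make the task format concrete, here is a minimal Python sketch (not the authors' code) of how a proportional-analogy MCQA item could be represented and rendered as a zero-shot prompt. The field names, prompt template, and distractor choices are illustrative assumptions.

```python
# Minimal sketch of a proportional-analogy MCQA item and a zero-shot prompt.
# Field names and the template wording are assumptions, not the paper's format.
from dataclasses import dataclass

@dataclass
class AnalogyItem:
    a: str              # first term of the source pair, e.g. "Oxygen"
    b: str              # second term of the source pair, e.g. "Gas"
    choices: list[str]  # candidate completion pairs
    answer: int         # index of the correct choice

def zero_shot_prompt(item: AnalogyItem) -> str:
    options = "\n".join(f"{i + 1}. {c}" for i, c in enumerate(item.choices))
    return (
        f'"{item.a}" is to "{item.b}" as <blank> is to <blank>.\n'
        f"Choose the pair that shares the same relationship:\n{options}"
    )

item = AnalogyItem(
    a="Oxygen", b="Gas",
    choices=["Aluminum : Metal", "Dog : Bone", "Run : Fast", "Blue : Sky"],
    answer=0,  # "Aluminum : Metal" shares the "type of" relation
)
print(zero_shot_prompt(item))
```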

Community


We introduce a 15,000-item proportional analogy dataset, evaluate nine large language models (LLMs) on it, and investigate the impact of various knowledge-enhanced prompting techniques, finding that targeted knowledge significantly improves accuracy.

  • Dataset Contribution: A novel dataset of 15,000 multiple-choice questions covering 238 relation types for proportional analogies is presented, significantly expanding the scope of previous datasets.
  • Evaluation: Nine LLMs were tested under different prompting methods, including zero-shot, few-shot, structured knowledge prompting, and targeted knowledge prompting (sketched below), with GPT-3.5-Turbo achieving the highest accuracy (55.25%) using targeted knowledge.
  • Key Findings: Targeted knowledge prompting improves LLM performance more effectively than exemplars or structured knowledge, highlighting both the difficulty LLMs have in leveraging collections of structured knowledge and the value of task-specific knowledge integration.
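As a rough illustration of how the prompt settings compared above might differ, the sketch below builds a prompt for each setting. The templates, knowledge strings, and the query_llm call are hypothetical placeholders, not the paper's actual prompts.

```python
# Illustrative sketch of the prompt settings: zero-shot, few-shot (exemplar),
# structured-knowledge, and targeted-knowledge prompting. All templates and
# knowledge strings are assumptions; the paper's exact prompts may differ.

def build_prompt(question: str, setting: str,
                 exemplars: list[str] | None = None,
                 knowledge: str | None = None) -> str:
    parts = []
    if setting == "few_shot" and exemplars:
        # Exemplar knowledge: solved analogies prepended to the question.
        parts.append("Examples:\n" + "\n".join(exemplars))
    elif setting == "structured" and knowledge:
        # Structured knowledge: e.g. (head, relation, tail) triples
        # serialized from a knowledge source.
        parts.append("Relevant facts:\n" + knowledge)
    elif setting == "targeted" and knowledge:
        # Targeted knowledge: only the relation linking the source pair.
        parts.append("Hint: " + knowledge)
    # setting == "zero_shot" adds nothing before the question.
    parts.append(question)
    return "\n\n".join(parts)

prompt = build_prompt(
    question='"Oxygen" is to "Gas" as <blank> is to <blank>. Pick the best pair.',
    setting="targeted",
    knowledge='The relationship between "Oxygen" and "Gas" is "type of".',
)
# answer = query_llm(prompt)  # hypothetical call to any chat-completion API
```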

