DPO Kernels: A Semantically-Aware, Kernel-Enhanced, and Divergence-Rich Paradigm for Direct Preference Optimization
Abstract
The rapid rise of large language models (LLMs) has unlocked many applications but also underscores the challenge of aligning them with diverse values and preferences. Direct Preference Optimization (DPO) is central to alignment but constrained by fixed divergences and limited feature transformations. We propose DPO-Kernels, which integrates kernel methods to address these issues through four key contributions: (i) Kernelized Representations with polynomial, RBF, Mahalanobis, and spectral kernels for richer transformations, plus a hybrid loss combining embedding-based and probability-based objectives; (ii) Divergence Alternatives (Jensen-Shannon, Hellinger, Rényi, Bhattacharyya, Wasserstein, and f-divergences) for greater stability; (iii) Data-Driven Selection metrics that automatically choose the best kernel-divergence pair; and (iv) a Hierarchical Mixture of Kernels for both local precision and global modeling. Evaluations on 12 datasets demonstrate state-of-the-art performance in factuality, safety, reasoning, and instruction following. Grounded in Heavy-Tailed Self-Regularization, DPO-Kernels maintains robust generalization for LLMs, offering a comprehensive resource for further alignment research.
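For intuition, the four kernel families named in the abstract have standard closed forms. The NumPy sketch below shows how each could score the similarity of two response embeddings; the function names and hyperparameters are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def polynomial_kernel(x, y, degree=2, c=1.0):
    # (x . y + c)^d: captures low-order feature interactions.
    return (x @ y + c) ** degree

def rbf_kernel(x, y, gamma=0.5):
    # exp(-gamma * ||x - y||^2): emphasizes local similarity.
    return np.exp(-gamma * np.sum((x - y) ** 2))

def mahalanobis_kernel(x, y, cov_inv):
    # An RBF-style kernel under a learned metric S^{-1} instead of the identity.
    d = x - y
    return np.exp(-(d @ cov_inv @ d))

def spectral_kernel(x, y, weights, freqs):
    # A simple stationary spectral-mixture form: sum_k w_k * cos(f_k . (x - y)).
    d = x - y
    return float(np.sum(weights * np.cos(freqs @ d)))

# Quick usage check on random embeddings.
x, y = np.random.randn(16), np.random.randn(16)
print(polynomial_kernel(x, y), rbf_kernel(x, y), mahalanobis_kernel(x, y, np.eye(16)))
```

In the paper's setting, kernels like these would score chosen versus rejected responses inside the preference objective; the exact wiring into the hybrid loss is specific to the paper.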
Community
- The paper introduces DPO-Kernels, an enhanced framework for Direct Preference Optimization, integrating kernelized representations and alternative divergence measures to achieve robust and scalable alignment across tasks.
- Kernelized Representations and Hybrid Loss: Introduces polynomial, RBF, spectral, and Mahalanobis kernels, combined with a hybrid loss, for richer feature transformations and improved model alignment (see the kernel sketch under the abstract).
- Divergence and Selection Innovations: Incorporates diverse divergence measures (e.g., Jensen-Shannon, Wasserstein) and proposes data-driven metrics to select the optimal kernel-divergence pair dynamically (a divergence sketch follows this list).
- Hierarchical Kernel Mixture and Evaluation: Proposes a Hierarchical Mixture of Kernels (HMK) that balances local and global dependencies, achieving state-of-the-art results on 12 diverse datasets (an HMK sketch also follows this list).
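As a quick reference for the divergence alternatives listed above, here is a minimal NumPy sketch over discrete distributions. This is illustrative, not the paper's code; Wasserstein is omitted because it requires optimal transport (scipy.stats.wasserstein_distance covers the 1-D case).

```python
import numpy as np

def js_divergence(p, q, eps=1e-12):
    # Jensen-Shannon: symmetrized, smoothed KL to the midpoint distribution.
    p, q = p + eps, q + eps
    m = 0.5 * (p + q)
    kl = lambda a, b: np.sum(a * np.log(a / b))
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger(p, q):
    # Bounded in [0, 1]; less sensitive to extreme probability ratios than KL.
    return np.sqrt(0.5 * np.sum((np.sqrt(p) - np.sqrt(q)) ** 2))

def renyi_divergence(p, q, alpha=0.5, eps=1e-12):
    # Rényi family; alpha -> 1 recovers KL.
    return np.log(np.sum((p + eps) ** alpha * (q + eps) ** (1 - alpha))) / (alpha - 1)

def bhattacharyya(p, q, eps=1e-12):
    # Negative log of the Bhattacharyya coefficient (distribution overlap).
    return -np.log(np.sum(np.sqrt(p * q)) + eps)
```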
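And a hedged sketch of what a two-level Hierarchical Mixture of Kernels could look like: a convex combination of a "local" kernel group (e.g., RBF, Mahalanobis) and a "global" group (e.g., polynomial, spectral). In the paper the group weights and the balance term would presumably be learned; here they are fixed constants, and all names are hypothetical.

```python
import numpy as np

def hmk_similarity(x, y, local_kernels, global_kernels,
                   local_w, global_w, tau=0.5):
    # tau balances the local group (fine-grained similarity) against the
    # global group (broad structure); each group is itself a weighted mixture.
    local = sum(w * k(x, y) for w, k in zip(local_w, local_kernels))
    glob = sum(w * k(x, y) for w, k in zip(global_w, global_kernels))
    return tau * local + (1.0 - tau) * glob

# Example with simple stand-in kernels (illustrative only).
rbf = lambda x, y: np.exp(-0.5 * np.sum((x - y) ** 2))
poly = lambda x, y: (x @ y + 1.0) ** 2
x, y = np.random.randn(8), np.random.randn(8)
score = hmk_similarity(x, y, [rbf], [poly], [1.0], [1.0], tau=0.7)
```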
This is an automated message from the Librarian Bot. The following similar papers were recommended by the Semantic Scholar API:
- Direct Preference Optimization Using Sparse Feature-Level Constraints (2024)
- Pre-train, Align, and Disentangle: Empowering Sequential Recommendation with Large Language Models (2024)
- Break the ID-Language Barrier: An Adaption Framework for Sequential Recommendation (2024)
- ULMRec: User-centric Large Language Model for Sequential Recommendation (2024)
- Bridging Relevance and Reasoning: Rationale Distillation in Retrieval-Augmented Generation (2024)
- Graph-Sequential Alignment and Uniformity: Toward Enhanced Recommendation Systems (2024)
- Molar: Multimodal LLMs with Collaborative Filtering Alignment for Enhanced Sequential Recommendation (2024)