Models: 324

Active filters: rlhf

PKU-Alignment/beaver-7b-v2.0-cost

Reinforcement Learning • Updated Apr 20, 2024 • 19

PKU-Alignment/beaver-7b-v3.0

Reinforcement Learning • Updated May 9, 2024 • 103

PKU-Alignment/beaver-7b-v3.0-reward

Reinforcement Learning • Updated Apr 20, 2024 • 14

PKU-Alignment/beaver-7b-v3.0-cost

Reinforcement Learning • Updated Apr 20, 2024 • 22

PKU-Alignment/beaver-7b-unified-reward

Reinforcement Learning • Updated Apr 20, 2024 • 3.06k

PKU-Alignment/beaver-7b-unified-cost

Reinforcement Learning • Updated Apr 20, 2024 • 3.14k • 1

Aditya685/UpshotLlama-3-8B

Text Generation • Updated Apr 20, 2024 • 10

bartowski/OrpoLlama-3-8B-GGUF

Text Generation • Updated Apr 20, 2024 • 443 • 4

QuantFactory/NeuralDaredevil-7B-GGUF

Text Generation • Updated May 24, 2024 • 220

LoneStriker/OrpoLlama-3-8B-GGUF

Updated Apr 21, 2024 • 13 • 1

LoneStriker/OrpoLlama-3-8B-3.0bpw-h6-exl2

Text Generation • Updated Apr 21, 2024 • 6

LoneStriker/OrpoLlama-3-8B-4.0bpw-h6-exl2

Text Generation • Updated Apr 21, 2024 • 7

LoneStriker/OrpoLlama-3-8B-5.0bpw-h6-exl2

Text Generation • Updated Apr 21, 2024 • 7

LoneStriker/OrpoLlama-3-8B-6.0bpw-h6-exl2

Text Generation • Updated Apr 21, 2024 • 5

LoneStriker/OrpoLlama-3-8B-8.0bpw-h8-exl2

Text Generation • Updated Apr 21, 2024 • 5

jalaganapathy/jalaModelRepo

Text Generation • Updated Apr 21, 2024 • 4

mlx-community/OrpoLlama-3-8B-4bit

Text Generation • Updated Apr 21, 2024 • 5

mlx-community/OrpoLlama-3-8B-8bit

Text Generation • Updated Apr 21, 2024 • 6

bartowski/OrpoLlama-3-8B-exl2

Text Generation • Updated Apr 21, 2024 • 6 • 1

hus960/OrpoLlama-3-8B-Q4_K_M-GGUF

Updated Apr 23, 2024 • 15

DavidAU/AlphaMonarch-7B-Q6_K-GGUF

Updated Apr 24, 2024 • 4

QuantFactory/OrpoLlama-3-8B-GGUF

Text Generation • Updated Apr 24, 2024 • 301

dfurman/Llama-3-8B-Orpo-v0.1

Text Generation • Updated Sep 17, 2024 • 2.03k • 1

dfurman/Llama-3-70B-Orpo-v0.1

Text Generation • Updated Sep 6, 2024 • 80 • 2

newsletter/CapybaraHermes-2.5-Mistral-7B-Q6_K-GGUF

Updated Aug 17, 2024 • 6 • 1

mradermacher/archangel_sft-kto_llama30b-GGUF

Updated May 31, 2024 • 275 • 1

mradermacher/archangel_sft-kto_llama30b-i1-GGUF

Updated Aug 2, 2024 • 524

line-corporation/sacpo

Reinforcement Learning • Updated Jun 21, 2024 • 16 • 5

nvidia/Llama3-70B-PPO-Chat

Updated Jun 14, 2024 • 6

line-corporation/p-sacpo

Reinforcement Learning • Updated Jun 21, 2024 • 14 • 3