
Collections


Collections including paper arxiv:2104.09864

Finished Reading

Self-Play Preference Optimization for Language Model Alignment

Paper • 2405.00675 • Published May 1 • 22
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Paper • 2205.14135 • Published May 27, 2022 • 9
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
FlashAttention-2: Faster Attention with Better Parallelism and Work Partitioning

Paper • 2307.08691 • Published Jul 17, 2023 • 7

LLM Fundamental papers

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 11
GQA: Training Generalized Multi-Query Transformer Models from Multi-Head Checkpoints

Paper • 2305.13245 • Published May 22, 2023 • 5
Llama 2: Open Foundation and Fine-Tuned Chat Models

Paper • 2307.09288 • Published Jul 18, 2023 • 239

Language model papers

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 9
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
LoRA: Low-Rank Adaptation of Large Language Models

Paper • 2106.09685 • Published Jun 17, 2021 • 29
FlashAttention: Fast and Memory-Efficient Exact Attention with IO-Awareness

Paper • 2205.14135 • Published May 27, 2022 • 9

Papers-Fundamentals

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 9
Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
Direct Nash Optimization: Teaching Language Models to Self-Improve with General Preferences

Paper • 2404.03715 • Published Apr 4 • 59
Zero-Shot Tokenizer Transfer

Paper • 2405.07883 • Published May 13 • 4

positional encoding Language models

RoFormer: Enhanced Transformer with Rotary Position Embedding

Paper • 2104.09864 • Published Apr 20, 2021 • 9
Self-Attention with Relative Position Representations

Paper • 1803.02155 • Published Mar 6, 2018

Introduction to Generative AI 2024

https://www.youtube.com/@HungyiLeeNTU

Re3: Generating Longer Stories With Recursive Reprompting and Revision

Paper • 2210.06774 • Published Oct 13, 2022 • 2
Constitutional AI: Harmlessness from AI Feedback

Paper • 2212.08073 • Published Dec 15, 2022 • 2
AnyTool: Self-Reflective, Hierarchical Agents for Large-Scale API Calls

Paper • 2402.04253 • Published Feb 6
Encouraging Divergent Thinking in Large Language Models through Multi-Agent Debate

Paper • 2305.19118 • Published May 30, 2023

Foundation AI Papers

Curated List of Must-Reads on LLM reasoning at Temus AI team

Language Agent Tree Search Unifies Reasoning Acting and Planning in Language Models

Paper • 2310.04406 • Published Oct 6, 2023 • 8
Chain-of-Thought Reasoning Without Prompting

Paper • 2402.10200 • Published Feb 15 • 94
ICDPO: Effectively Borrowing Alignment Capability of Others via In-context Direct Preference Optimization

Paper • 2402.09320 • Published Feb 14 • 6
Self-Discover: Large Language Models Self-Compose Reasoning Structures

Paper • 2402.03620 • Published Feb 6 • 109

Large Language Model (LLM) and NLP related papers.


LoRA+: Efficient Low Rank Adaptation of Large Models

Paper • 2402.12354 • Published Feb 19 • 6
The FinBen: An Holistic Financial Benchmark for Large Language Models

Paper • 2402.12659 • Published Feb 20 • 16
TofuEval: Evaluating Hallucinations of LLMs on Topic-Focused Dialogue Summarization

Paper • 2402.13249 • Published Feb 20 • 10
TrustLLM: Trustworthiness in Large Language Models

Paper • 2401.05561 • Published Jan 10 • 63

Transformer Arch

Check out: https://bbycroft.net/llm and http://nlp.seas.harvard.edu/2018/04/03/attention.html

Attention Is All You Need

Paper • 1706.03762 • Published Jun 12, 2017 • 41
ImageNet Large Scale Visual Recognition Challenge

Paper • 1409.0575 • Published Sep 1, 2014 • 8
Sequence to Sequence Learning with Neural Networks

Paper • 1409.3215 • Published Sep 10, 2014 • 3
Language Models are Few-Shot Learners

Paper • 2005.14165 • Published May 28, 2020 • 11

Embedding Papers

Improving Text Embeddings with Large Language Models

Paper • 2401.00368 • Published Dec 31, 2023 • 79
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding

Paper • 1810.04805 • Published Oct 11, 2018 • 14
Metadata Might Make Language Models Better

Paper • 2211.10086 • Published Nov 18, 2022 • 4
DecoderLens: Layerwise Interpretation of Encoder-Decoder Transformers

Paper • 2310.03686 • Published Oct 5, 2023 • 3
