Andreas Wagner PRO

awagner-mainz

https://orcid.org/0000-0003-1835-1653

AI & ML interests

Legal History, Legal Theory, Digital Humanities @anwagnerdreas@hcommons.social

Recent Activity

upvoted a collection about 2 months ago

Historic Language Modeling

Collection

This collection contains models, datasets and spaces related to historic language models • 37 items • Updated Oct 31, 2023 • 4

liked a model 2 months ago

elenanereiss/German-legal-NER-flert

Token Classification • Updated Jul 21 • 3 • 2

commented on a paper 3 months ago

Should We Still Pretrain Encoders with Masked Language Modeling?

Paper • 2507.00994 • Published Jul 1 • 78

liked a model 5 months ago

numind/NuExtract-2-8B-experimental

Text Generation • 8B • Updated Apr 15 • 44 • 25

liked a dataset 11 months ago

PleIAs/common_corpus

Viewer • Updated Jun 10 • 470M • 24.3k • 310

upvoted 2 papers 12 months ago

AutoTrain: No-code training for state-of-the-art models

Paper • 2410.15735 • Published Oct 21, 2024 • 59

PositionID: LLMs can Control Lengths, Copy and Paste with Explicit Positional Awareness

Paper • 2410.07035 • Published Oct 9, 2024 • 17

liked a Space 12 months ago

Argilla For Llamore

✍

liked a Space about 1 year ago

NuExtract

👀

liked a model about 1 year ago

numind/NuExtract

Text Generation • 4B • Updated Oct 17, 2024 • 980 • 227

upvoted an article about 1 year ago

Fine-tune Llama 3.1 Ultra-Efficiently with Unsloth

Article • Jul 29, 2024 • 361

reacted to m-ric's post with ❤️ about 1 year ago

Post

1737

𝗛𝗼𝘄 𝗱𝗼𝗲𝘀 𝗯𝗲𝗮𝗺 𝘀𝗲𝗮𝗿𝗰𝗵 𝗱𝗲𝗰𝗼𝗱𝗶𝗻𝗴 𝘄𝗼𝗿𝗸? ➡️ 𝙉𝙚𝙬 𝙫𝙞𝙨𝙪𝙖𝙡𝙞𝙯𝙖𝙩𝙞𝙤𝙣 𝙩𝙤𝙤𝙡! 👀

In decoder-type LLMs like GPT-4 or Mistral-Large, the output is generated one token (roughly a word part) at a time. That's one reason they get nicknamed "stochastic parrots": the "thinking" process only happens one step at a time, so it can seem really myopic.

𝐒𝐨 𝐡𝐨𝐰 𝐢𝐬 𝐭𝐡𝐞 𝐧𝐞𝐱𝐭 𝐭𝐨𝐤𝐞𝐧 𝐬𝐞𝐥𝐞𝐜𝐭𝐞𝐝?

📊 Given an input sentence like "𝘞𝘩𝘢𝘵 𝘪𝘴 𝘵𝘩𝘦 7𝘵𝘩 𝘍𝘪𝘣𝘰𝘯𝘢𝘤𝘤𝘪 𝘯𝘶𝘮𝘣𝘦𝘳? 𝘛𝘩𝘦 7𝘵𝘩 𝘍𝘪𝘣𝘰𝘯𝘢𝘤𝘤𝘪 𝘯𝘶𝘮𝘣𝘦𝘳", the decoder LLM generates, for each token in its vocabulary, a score that represents this token's probability of coming next.
For instance, "𝙞𝙨" gets a score of 0.56 and "𝙘𝙖𝙣" gets a score of 0.35.
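
As a rough illustration of where such scores come from: a model typically emits one raw score (logit) per vocabulary token, and a softmax turns those into probabilities. The sketch below assumes a made-up three-token vocabulary and made-up logit values; nothing here comes from a real model.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    peak = max(logits.values())  # subtract the max for numerical stability
    exps = {tok: math.exp(value - peak) for tok, value in logits.items()}
    total = sum(exps.values())
    return {tok: e / total for tok, e in exps.items()}

# Hypothetical logits for a tiny vocabulary (illustrative values only).
logits = {"is": 2.0, "can": 1.5, "was": 0.0}
probs = softmax(logits)

# The most probable next token under these made-up scores.
best = max(probs, key=probs.get)
```

With these assumed logits, `best` is the token with the largest logit, since softmax preserves the ordering of the scores.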

🤑 𝐆𝐫𝐞𝐞𝐝𝐲 𝐝𝐞𝐜𝐨𝐝𝐢𝐧𝐠 is the naive option where you simply take the most probable next token at each step. But this creates paths that maximize very short-term rewards, and thus may overlook better paths for the long term (like that time you played FIFA all evening and arrived unprepared for your school exam the next day).
In our example, the highest-scoring next token might be "𝙞𝙨", but this will strongly bias the LLM towards giving a hasty response. Conversely, starting with "𝙘𝙖𝙣" could have been completed with "𝘣𝘦 𝘰𝘣𝘵𝘢𝘪𝘯𝘦𝘥 𝘧𝘳𝘰𝘮 𝘤𝘰𝘮𝘱𝘶𝘵𝘪𝘯𝘨 𝘱𝘳𝘦𝘷𝘪𝘰𝘶𝘴 𝘍𝘪𝘣𝘰𝘯𝘢𝘤𝘤𝘪 𝘯𝘶𝘮𝘣𝘦𝘳𝘴 𝘧𝘪𝘳𝘴𝘵", which steers the LLM towards correct reasoning!

🗺️ 𝐁𝐞𝐚𝐦 𝐬𝐞𝐚𝐫𝐜𝐡 improves on greedy decoding by keeping several paths, called beams, at each step instead of one. This allows the generation to explore a much larger space, and thus find better completions. In our example, both the "𝙞𝙨" and the "𝙘𝙖𝙣" completion could be tested. ✅
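
To make the difference concrete, here is a minimal sketch of beam search over a hand-written toy "model". Everything in it is an assumption for illustration: `TOY_MODEL`, its continuation probabilities, the `equals` token and the `<eos>` marker are invented, and only the 0.56 and 0.35 scores echo the example above. Running it with one beam is equivalent to greedy decoding.

```python
import math

EOS = "<eos>"  # hypothetical end-of-sequence marker

# Hypothetical next-token probabilities, keyed by the tokens generated so far.
TOY_MODEL = {
    (): {"is": 0.56, "can": 0.35, "equals": 0.09},
    ("is",): {"13": 0.4, "8": 0.3, "21": 0.3},
    ("can",): {"be": 0.9, "not": 0.1},
}

def next_probs(prefix):
    """Return the toy distribution for a prefix; unknown prefixes just end."""
    return TOY_MODEL.get(tuple(prefix), {EOS: 1.0})

def beam_search(num_beams, max_steps=5):
    """Keep the num_beams highest-scoring partial sequences at each step."""
    beams = [((), 0.0)]  # (tokens, cumulative log-probability)
    for _ in range(max_steps):
        candidates = []
        for tokens, score in beams:
            if tokens and tokens[-1] == EOS:
                candidates.append((tokens, score))  # finished beam, carry over
                continue
            for tok, p in next_probs(tokens).items():
                candidates.append((tokens + (tok,), score + math.log(p)))
        # Prune: keep only the best num_beams candidates.
        beams = sorted(candidates, key=lambda c: c[1], reverse=True)[:num_beams]
        if all(tokens[-1] == EOS for tokens, _ in beams):
            break
    return beams[0]

greedy_tokens, greedy_logp = beam_search(num_beams=1)  # greedy decoding
beam_tokens, beam_logp = beam_search(num_beams=2)
```

Under these made-up numbers, the single-beam run commits to "is" and ends with a lower overall probability than the two-beam run, which keeps "can" alive long enough to discover the stronger "can be" continuation. This sketch omits refinements such as length normalization that real implementations often apply.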

👉 I've created a tool to let you visualize it. Thank you @joaogante for your great help!
𝙏𝙧𝙮 𝙞𝙩 𝙝𝙚𝙧𝙚: m-ric/beam_search_visualizer

liked a model over 1 year ago

Sahajtomar/NER_legal_de

Token Classification • Updated May 18, 2021 • 6 • 3

liked a model about 3 years ago

bigscience/bloom

Text Generation • 176B • Updated Jul 28, 2023 • 3.3k • 4.94k