Sébastien De Greef committed
Commit 255d69d
1 Parent(s): db1f0f8
chore: Add online tokenizer playground link to tokenizers.qmd
- src/llms/tokenizers.qmd +1 -0
src/llms/tokenizers.qmd
CHANGED
@@ -5,6 +5,7 @@ title: Tokenizers
 Tokenization is a fundamental step in natural language processing (NLP) that involves breaking down text into smaller components, such as words, phrases, or symbols. These smaller components are called tokens. Tokenizers, the tools that perform tokenization, play a crucial role in preparing text for various NLP tasks like machine translation, sentiment analysis, and text summarization. This article provides an exhaustive overview of tokenizers, exploring their types, how they function, their importance, and the challenges they present.
 
 [Excellent video of Andrej Karpathy about Tokenizers](https://www.youtube.com/watch?v=zduSFxRajkE)
+[Online Tokenizer Playground](https://gpt-tokenizer.dev/)
 
 ## What is Tokenization?
 