Sébastien De Greef committed on
Commit
255d69d
1 Parent(s): db1f0f8

chore: Add online tokenizer playground link to tokenizers.qmd

Files changed (1)
  1. src/llms/tokenizers.qmd +1 -0
src/llms/tokenizers.qmd CHANGED
@@ -5,6 +5,7 @@ title: Tokenizers
  Tokenization is a fundamental step in natural language processing (NLP) that involves breaking down text into smaller components, such as words, phrases, or symbols. These smaller components are called tokens. Tokenizers, the tools that perform tokenization, play a crucial role in preparing text for various NLP tasks like machine translation, sentiment analysis, and text summarization. This article provides an exhaustive overview of tokenizers, exploring their types, how they function, their importance, and the challenges they present.
 
  [Excellent video of Andrej Karpathy about Tokenizers](https://www.youtube.com/watch?v=zduSFxRajkE)
+ [Online Tokenizer Playground](https://gpt-tokenizer.dev/)
 
  ## What is Tokenization?
 