Spaces: Runtime error

im committed · Commit 30bd82e · 1 Parent(s): 0a8544d
refine introdution

app.py CHANGED
@@ -24,39 +24,46 @@ def get_embeddings(text):
st.title("Transformers: Tokenisers and Embeddings")

preface_image, preface_text, = st.columns(2)
-# preface_image.image("https://static.streamlit.io/examples/dice.jpg")
-# preface_image.image("""https://assets.digitalocean.com/articles/alligator/boo.svg""")
preface_text.write("""\
-
-immense interest. While numerous insightful tutorials are available, the evolution of transformer architectures over
-the last few years has led to significant simplifications. These advancements have made it increasingly
-straightforward to understand their inner workings. In this series of articles, I aim to provide a direct, clear explanation of
-how and why modern transformers function, unburdened by the historical complexities associated with their inception.*
""")

divider()

st.write("""\
-
-
-
-
-
-
-
-
Understanding these foundational concepts is crucial to comprehending the overall structure and function of the
Transformer model. They are the building blocks from which the rest of the model is constructed, and their roles
within the architecture are essential to the model's ability to process and generate language. In my view,
-a comprehensive and simple explanation may give a reader a significant advantage in using LLMs.
-
-
-
-
-
-
-Note: *HuggingFace provides an exceptional [tutorial on Transformer models](https://huggingface.co/docs/transformers/index).
-That tutorial is particularly beneficial for readers willing to dive into advanced topics.*
""")

with st.expander("Copernicus Museum in Warsaw"):

@@ -72,10 +79,15 @@ with st.expander("Copernicus Museum in Warsaw"):
""")
st.image("https://i.pinimg.com/originals/04/11/2c/04112c791a859d07a01001ac4f436e59.jpg")

divider()


-st.header("Tokenisers

st.write("""\
Tokenisation is the initial step in the data preprocessing pipeline for natural language processing (NLP)

@@ -713,7 +725,7 @@ with st.expander("References:"):

# *********************************************
divider()
-st.header("Dimensionality Reduction

st.write("""\
As was mentioned above, embedding vectors are learned in such a way that words with similar meanings

st.title("Transformers: Tokenisers and Embeddings")

preface_image, preface_text, = st.columns(2)
preface_text.write("""\
+"*I think I can safely say that nobody understands quantum mechanics.*" R. Feynman
""")

divider()

st.write("""\
+Did you know that the leading AI models powering speech recognition, language translation,
+and even your email auto-responses owe their capabilities to a single, revolutionary concept: the Transformer
+architecture?
+
+Artificial Intelligence (AI) has seen remarkable progress in the last decade, and a significant part of that is due
+to advancements in Natural Language Processing (NLP). NLP, a subset of AI, involves the interaction between computers
+and human language, making it possible for AI to understand, interpret, and generate human language in a valuable
+way. Within this realm of NLP, a game-changer has emerged: the Transformer model. With its innovative architecture
+and remarkable performance, the Transformer model has revolutionised how machines understand and generate human
+language.
+
+However, the complexity of Transformer models can be daunting, making them seem inaccessible to those without
+extensive technical expertise. This creates a barrier to understanding, utilising, and improving upon these powerful
+tools.
+
+That's why I'm embarking on this series of articles, breaking down the key components of Transformer models into
+digestible, easy-to-understand concepts. I have chosen to dedicate the first article in this series solely to
+Tokenisers and Embeddings. The article has the following structure:
+
+- [Tokenisers](#tokenisers)
+- [Embeddings](#embeddings)
+- [Vector Databases](#vector-databases)
+- [Dimensionality Reduction](#dimensionality-reduction)
+
Understanding these foundational concepts is crucial to comprehending the overall structure and function of the
Transformer model. They are the building blocks from which the rest of the model is constructed, and their roles
within the architecture are essential to the model's ability to process and generate language. In my view,
+a comprehensive and simple explanation may give a reader a significant advantage in using LLMs.
+
+Are you ready to take a deep dive into the world of Transformers? I promise that by the end of this series,
+you'll have a clearer understanding of how these complex models work and how they contribute to the remarkable
+capabilities of modern AI.
+
""")

with st.expander("Copernicus Museum in Warsaw"):

""")
st.image("https://i.pinimg.com/originals/04/11/2c/04112c791a859d07a01001ac4f436e59.jpg")

+st.write("""\
+Note: *HuggingFace provides an exceptional [tutorial on Transformer models](https://huggingface.co/docs/transformers/index).
+That tutorial is particularly beneficial for readers willing to dive into advanced topics.*
+""")
+
divider()


+st.header("Tokenisers")

st.write("""\
Tokenisation is the initial step in the data preprocessing pipeline for natural language processing (NLP)
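
The Tokenisers hunk ends on the sentence above. As a quick companion to it, here is a minimal, self-contained sketch of what tokenisation looks like in code, using the Hugging Face transformers library that the note above links to; the AutoTokenizer calls are standard, but the checkpoint name and the snippet itself are illustrative assumptions rather than what app.py actually runs.

# Hedged sketch: split raw text into subword tokens and map them to integer ids.
# Assumption: "bert-base-uncased" is only an example checkpoint; app.py may use a different tokeniser.
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "Transformers: Tokenisers and Embeddings"
tokens = tokenizer.tokenize(text)   # list of subword strings
ids = tokenizer.encode(text)        # list of vocabulary ids, with special tokens added

print(tokens)
print(ids)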

# *********************************************
divider()
+st.header("Dimensionality Reduction")

st.write("""\
As was mentioned above, embedding vectors are learned in such a way that words with similar meanings
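
The final hunk adds the Dimensionality Reduction header. For readers who want to see the idea concretely, the sketch below projects a handful of word embeddings down to two dimensions with PCA so that words with similar meanings can be compared visually; the model checkpoint, the word list, and the choice of PCA are assumptions made for illustration, not necessarily what app.py's get_embeddings() does.

# Hedged sketch: reduce high-dimensional embedding vectors to 2-D for plotting.
# Assumptions: bert-base-uncased as the embedding model and PCA as the reduction
# method; app.py may use a different model or technique.
import numpy as np
import torch
from sklearn.decomposition import PCA
from transformers import AutoModel, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

words = ["king", "queen", "apple", "banana"]
vectors = []
with torch.no_grad():
    for word in words:
        encoded = tokenizer(word, return_tensors="pt")
        output = model(**encoded)
        # Mean-pool the token embeddings into a single vector per word.
        vectors.append(output.last_hidden_state.mean(dim=1).squeeze(0).numpy())

embeddings = np.stack(vectors)                           # shape: (4, hidden_size)
coords = PCA(n_components=2).fit_transform(embeddings)   # shape: (4, 2)

for word, (x, y) in zip(words, coords):
    print(f"{word}: ({x:.2f}, {y:.2f})")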