Update README.md
README.md
CHANGED
@@ -18,12 +18,10 @@ Embeddings is the engine that delivers semantic search. Data is transformed into
|
|
18 |
An embeddings index generated by txtai is a fully encapsulated index format. It doesn't require a database server.
|
19 |
|
20 |
This index is built from the [Wikipedia February 2024 dataset](https://huggingface.co/datasets/burgerbee/wikipedia-sv-20240220).
|
21 |
-
Only the first two paragraphs of each article are included. The Wikipedia index works well as a fact-based context source for retrieval-augmented generation (RAG).
|
22 |
-
|
23 |
-
It also uses [Wikipedia Page Views](https://dumps.wikimedia.org/other/pageviews/readme.html) data to add a `percentile` field. The `percentile` field can be used
|
24 |
to only match commonly visited pages.
|
25 |
|
26 |
-
txtai must be [installed](https://neuml.github.io/txtai/install/) to use this model.
|
27 |
|
28 |
## Example
|
29 |
|
@@ -44,7 +42,7 @@ for x in embeddings.search("SELECT id, text, score, percentile FROM txtai WHERE
|
|
44 |
print(json.dumps(x, indent=2))
|
45 |
```
|
46 |
|
47 |
-
#
|
48 |
|
49 |
https://dumps.wikimedia.org/svwiki/20240220/dumpstatus.json
|
50 |
|
|
|
18 |
An embeddings index generated by txtai is a fully encapsulated index format. It doesn't require a database server.
|
19 |
|
20 |
This index is built from the [Wikipedia February 2024 dataset](https://huggingface.co/datasets/burgerbee/wikipedia-sv-20240220).
|
21 |
+
Only the first two paragraphs of each article are included. The Wikipedia index works well as a fact-based context source for retrieval-augmented generation (RAG). It also uses [Wikipedia Page Views](https://dumps.wikimedia.org/other/pageviews/readme.html) data to add a `percentile` field. The `percentile` field can be used
|
|
|
|
|
22 |
to only match commonly visited pages.
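
The `percentile` filter described above can be sketched as a small query builder. This is a minimal sketch rather than the README's own example: `build_query` and `min_percentile` are hypothetical names, the `similar()` clause is written in txtai's SQL syntax, and `percentile` is assumed to be stored as a fraction between 0 and 1.

```python
def build_query(term: str, min_percentile: float = 0.99) -> str:
    """Build a txtai SQL query that keeps only commonly visited pages.

    Hypothetical helper: assumes `percentile` is a fraction in [0, 1].
    """
    if not 0.0 <= min_percentile <= 1.0:
        raise ValueError("min_percentile must be between 0 and 1")
    if "'" in term:
        # Quote escaping is left out of this sketch, so such terms are rejected.
        raise ValueError("term must not contain single quotes")
    return (
        "SELECT id, text, score, percentile FROM txtai "
        f"WHERE similar('{term}') AND percentile >= {min_percentile}"
    )


# Build a query for an example search term and show it.
query = build_query("Stockholm")
print(query)
```

The resulting string can be passed to `embeddings.search(...)` on a loaded index, in the same way as the query in the README's example.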
|
23 |
|
24 |
+
txtai must be [installed](https://neuml.github.io/txtai/install/) (for example, with pip) to use this model.
|
25 |
|
26 |
## Example
|
27 |
|
|
|
42 |
print(json.dumps(x, indent=2))
|
43 |
```
|
44 |
|
45 |
+
# Data source
|
46 |
|
47 |
https://dumps.wikimedia.org/svwiki/20240220/dumpstatus.json
|
48 |
|