title: Arabic Wiki
emoji: 📈
colorFrom: purple
colorTo: purple
sdk: gradio
sdk_version: 3.44.4
app_file: app.py
pinned: false
license: apache-2.0

Check out the configuration reference at https://huggingface.co/docs/hub/spaces-config-reference

Installation

pip install -r requirements.txt

Pre-processing

wget https://dumps.wikimedia.org/arwiki/latest/arwiki-latest-pages-articles-multistream.xml.bz2
wikiextractor -o output --json arwiki-latest-pages-articles-multistream.xml.bz2
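
The extracted dump can then be read back for indexing. The following is a minimal sketch, not the preprocessing code from this repository. It assumes wikiextractor's usual `--json` layout: files named `wiki_NN` under subfolders of `output`, with one JSON object per line carrying `title` and `text` fields. The helper name `iter_articles` is hypothetical.

```python
import json
from pathlib import Path
from typing import Dict, Iterator


def iter_articles(root: str = "output") -> Iterator[Dict[str, str]]:
    """Yield one dict per article found under the wikiextractor output folder.

    Assumes the JSON-lines layout written by `wikiextractor --json`
    (e.g. output/AA/wiki_00); adjust the glob if your layout differs.
    """
    for path in sorted(Path(root).rglob("wiki_*")):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as handle:
            for line in handle:
                line = line.strip()
                if not line:
                    continue  # tolerate blank lines between records
                article = json.loads(line)
                # Skip entries with no body text, such as redirects.
                if article.get("text", "").strip():
                    yield article
```

Calling `iter_articles("output")` streams the articles one at a time, so the full dump never has to fit in memory.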