Polish Question Answering
Collection of models and datasets for Polish Question Answering.
Sentence Similarity • Note: SilverRetriever is a state-of-the-art neural passage retriever trained on the PolQA and MAUPQA datasets.
ipipan/silver-retriever-base-v1
Sentence Similarity • Note: SilverRetriever is a state-of-the-art neural passage retriever trained on the PolQA and MAUPQA datasets.
ipipan/polqa
Note: PolQA is the first Polish dataset for open-domain question answering. It consists of 7,000 questions, 87,525 manually labeled evidence passages, and a corpus of over 7 million candidate passages. The dataset can be used to train both a passage retriever and an abstractive reader.
ipipan/maupqa
Note: MAUPQA is a collection of 14 datasets for Polish document retrieval. Most of the datasets are either machine-generated or machine-translated from English. Across all datasets, it contains over 1M questions, 1M positive question-passage pairs, and 7M hard-negative question-passage pairs.
clarin-pl/poquad
Note: PoQuAD is the Polish equivalent of SQuAD. It consists of more than 70,000 question-passage pairs, with both extractive and abstractive answers.
allegro/polish-question-passage-pairs
Note: Over 10,000 manually annotated question-passage pairs. While the questions are taken from the PolQA dataset, the passages are often unique. In particular, the dataset consists mostly of hard negatives (8k pairs).
allegro/klej-dyk
Note: The "Czy wiesz?" (Eng. "Did you know?") dataset consists of almost 5k question-passage pairs obtained from the "Czy wiesz..." section of Polish Wikipedia. Each question is written by a Wikipedia collaborator and is answered with a link to a relevant Wikipedia article.
piotr-rybak/allegro-faq
Note: Allegro FAQ is one of the PolEval 2022 test sets. It consists of 900 frequently asked questions and 921 help articles about Allegro.com, a large Polish e-commerce platform. Each question-passage pair is manually checked and edited where necessary.
piotr-rybak/legal-questions
Note: Legal Questions is one of the PolEval 2022 test sets. It consists of 718 questions and approximately 26,000 passages extracted from over 1,000 acts of law.
Polish Information Retrieval Benchmark (PIRB)
Note: The benchmark for Polish Information Retrieval, consisting of 41 datasets.
sdadas/mmlw-retrieval-roberta-base
Sentence Similarity • Note: Neural text encoder for Polish. See more models here: https://huggingface.co/sdadas?search_models=mmlw
sdadas/gpt-exams
Note: The dataset contains 8,131 multi-domain question-answer pairs. It was created semi-automatically using the gpt-3.5-turbo-0613 model available in the OpenAI API.
apohllo/plt5-base-poquad
Text2Text Generation • Note: This is a plT5-base model trained on the PoQuAD dataset. It was trained in a single experiment run, so don't expect state-of-the-art results.
sdadas/polish-reranker-large-ranknet
Text Classification • Note: Cross-encoder for Polish. See more models here: https://huggingface.co/sdadas?search_models=reranker
amu-cai/PES-2018-2022
Note: This dataset contains 297 Polish Board Certification Examinations from 2018-2022 in the form of multiple-choice questions.
OrlikB/KartonBERT-USE-base-v1
Sentence Similarity • Note: This universal sentence encoder model aims to be proficient in tasks involving sentence / document similarity.
sdadas/polish-reranker-roberta-v2
Text Classification • Note: This is an improved version of the reranker, based on sdadas/polish-roberta-large-v2 and trained with RankNet loss on a large dataset of text pairs.
sdadas/stella-pl-retrieval
Sentence Similarity • Note: This is a text encoder based on stella_en_1.5B_v5 and further fine-tuned for Polish information retrieval tasks.
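
Several rerankers in this collection are described as trained with RankNet loss on question-passage pairs. As a rough illustration only, the sketch below computes the textbook pairwise RankNet loss, log(1 + exp(-(s_pos - s_neg))), for a relevant passage that should outrank a hard negative. The function names and the toy scores are hypothetical, and this is not taken from the training code of any model listed here.

```python
import math


def ranknet_pair_loss(score_pos: float, score_neg: float) -> float:
    """Textbook RankNet loss for one (positive, negative) passage pair.

    Assumes the positive passage should always rank above the negative,
    so the loss reduces to log(1 + exp(-(score_pos - score_neg))).
    """
    diff = score_pos - score_neg
    # Branch on the sign so that exp() never receives a large positive argument.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def ranknet_loss(pos_scores: list[float], neg_scores: list[float]) -> float:
    """Mean pairwise loss over every positive/negative combination for one question."""
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    return sum(ranknet_pair_loss(p, n) for p, n in pairs) / len(pairs)


# Hypothetical reranker scores for one question: one relevant passage
# and two hard negatives.
loss = ranknet_loss(pos_scores=[2.0], neg_scores=[0.5, 1.5])
print(f"mean pairwise loss: {loss:.4f}")
```

The loss shrinks as the relevant passage is scored further above the negatives and equals log 2 when the two scores tie, which is why hard negatives (passages that look relevant but are not) contribute most of the training signal.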