Polish Question Answering

piotr-rybak's Collections

updated Oct 17, 2024

Collection of models and datasets for Polish Question Answering.

ipipan/silver-retriever-base-v1.1

Sentence Similarity • 0.1B • Updated Oct 26, 2024 • 976 • 9

Note SilverRetriever is a state-of-the-art neural passage retriever trained on the PolQA and MAUPQA datasets.
ipipan/silver-retriever-base-v1

Sentence Similarity • 0.1B • Updated May 24, 2024 • 808 • 12

Note SilverRetriever is a state-of-the-art neural passage retriever trained on the PolQA and MAUPQA datasets.
ipipan/polqa

Updated May 24, 2024 • 140 • 11

Note PolQA is the first Polish dataset for open-domain question answering. It consists of 7,000 questions, 87,525 manually labeled evidence passages, and a corpus of over 7 million candidate passages. The dataset can be used to train both a passage retriever and an abstractive reader.
ipipan/maupqa

Updated May 24, 2024 • 198 • 5

Note MAUPQA is a collection of 14 datasets for Polish document retrieval. Most of the datasets are either machine-generated or machine-translated from English. Across all datasets, it consists of over 1M questions, 1M positive, and 7M hard-negative question-passage pairs.
clarin-pl/poquad

Viewer • Updated Jul 4, 2023 • 52k • 141 • 6

Note PoQuAD is a Polish equivalent of SQuAD. It consists of more than 70,000 question-passage pairs, as well as extractive and abstractive answers.
allegro/polish-question-passage-pairs

Viewer • Updated Sep 23, 2021 • 10.4k • 18 • 4

Note Over 10,000 manually annotated question-passage pairs. While the questions are taken from the PolQA dataset, the passages are often unique. In particular, the dataset consists mostly of hard negatives (8k pairs).
allegro/klej-dyk

Viewer • Updated Oct 26, 2022 • 5.18k • 206 • 1

Note The "Czy wiesz?" (Eng. "Did you know?") dataset consists of almost 5k question-passage pairs obtained from the "Czy wiesz..." section of Polish Wikipedia. Each question is written by a Wikipedia collaborator and is answered with a link to a relevant Wikipedia article.
piotr-rybak/allegro-faq

Viewer • Updated Aug 19, 2023 • 1.88k • 25

Note Allegro FAQ is one of the PolEval 2022 test sets. It consists of 900 frequently asked questions and 921 help articles about Allegro.com, a large Polish e-commerce platform. Each question-passage pair is manually checked and edited where necessary.
piotr-rybak/legal-questions

Updated Dec 14, 2023 • 14

Note Legal Questions is one of the PolEval 2022 test sets. It consists of 718 questions and approximately 26,000 passages extracted from over 1,000 acts of law.
Polish Information Retrieval Benchmark (PIRB)

Display evaluation results in a leaderboard

Note The benchmark for Polish Information Retrieval, consisting of 41 datasets.
sdadas/mmlw-retrieval-roberta-base

Sentence Similarity • 0.1B • Updated Oct 29, 2024 • 3.53k

Note Neural text encoder for Polish. See more models here: https://huggingface.co/sdadas?search_models=mmlw
sdadas/gpt-exams

Viewer • Updated Sep 9, 2023 • 8.13k • 15 • 3

Note The dataset contains 8131 multi-domain question-answer pairs. It was created semi-automatically using the gpt-3.5-turbo-0613 model available in the OpenAI API.
apohllo/plt5-base-poquad

Text Generation • 0.3B • Updated Nov 28, 2023 • 2 • 1

Note This is a plT5-base model trained on the PoQuAD dataset. This model was trained as the result of a single experiment run, so don't expect state-of-the-art results.
sdadas/polish-reranker-large-ranknet

Text Ranking • 0.4B • Updated Apr 2 • 1.96k • 2

Note Cross-encoder for Polish. See more models here: https://huggingface.co/sdadas?search_models=reranker
amu-cai/PES-2018-2022

Viewer • Updated Jul 3, 2024 • 35.6k • 33 • 4

Note This dataset consists of 297 Polish Board Certification Examinations from the years 2018-2022 in the form of multiple-choice questions.
sdadas/polish-reranker-roberta-v2

Text Ranking • 0.4B • Updated Apr 2 • 4.75k • 2

Note This is an improved version of the reranker based on sdadas/polish-roberta-large-v2, trained with RankNet loss on a large dataset of text pairs.
sdadas/stella-pl-retrieval

Sentence Similarity • 2B • Updated 1 day ago • 20.8k • 12

Note This is a text encoder based on stella_en_1.5B_v5 and further fine-tuned for Polish information retrieval tasks.
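
The note on sdadas/polish-reranker-roberta-v2 mentions training with RankNet loss. As a rough illustration only, the sketch below shows the standard pairwise RankNet objective, which penalizes a reranker when a relevant passage does not score above a non-relevant one. It is not the author's training code: the function names `ranknet_loss` and `mean_ranknet_loss` are hypothetical, the scores are made-up numbers, and the scale factor is assumed to be 1.

```python
import math


def ranknet_loss(score_pos: float, score_neg: float) -> float:
    """Pairwise RankNet loss, log(1 + exp(-(score_pos - score_neg))).

    score_pos is the model score of a relevant passage and score_neg the
    score of a non-relevant (e.g. hard-negative) passage for the same question.
    Both names are illustrative assumptions, not part of any library API.
    """
    diff = score_pos - score_neg
    # Branch on the sign of diff so exp() never receives a large positive value.
    if diff >= 0:
        return math.log1p(math.exp(-diff))
    return -diff + math.log1p(math.exp(diff))


def mean_ranknet_loss(pos_scores, neg_scores) -> float:
    """Average the pairwise loss over every (relevant, non-relevant) pair."""
    pairs = [(p, n) for p in pos_scores for n in neg_scores]
    if not pairs:
        raise ValueError("need at least one relevant and one non-relevant score")
    return sum(ranknet_loss(p, n) for p, n in pairs) / len(pairs)


if __name__ == "__main__":
    # Made-up scores for one question: one relevant passage, two hard negatives.
    print(mean_ranknet_loss([2.0], [0.5, -1.0]))
```

The loss approaches zero as the relevant passage's score pulls ahead of the negative's and grows roughly linearly when the order is inverted, which is why hard negatives such as those described for MAUPQA are useful training signal.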
