calamanCy: Tagalog NLP pipelines in spaCy

Paper: arxiv.org/abs/2311.07171

| Feature | Description |
| --- | --- |
| Name | tl_calamancy_md |
| Version | 0.1.0 |
| spaCy | >=3.5.0,<4.0.0 |
| Default Pipeline | tok2vec, tagger, morphologizer, parser, ner |
| Components | tok2vec, tagger, morphologizer, parser, ner |
| Vectors | -1 keys, 50000 unique vectors (200 dimensions) |
| Sources | TLUnified dataset (Jan Christian Blaise Cruz and Charibeth Cheng); UD_Tagalog-TRG (Stephanie Samson, Daniel Zeman, and Mary Ann C. Tan); UD_Tagalog-Ugnayan (Angelina Aquino) |
| License | MIT |
| Author | Lester James V. Miranda |
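
The spaCy requirement listed above (`>=3.5.0,<4.0.0`) can be checked before installing the pipeline. The following is a minimal sketch using only the standard library; the helper names `parse_version` and `is_compatible_spacy` are hypothetical and are not part of spaCy or calamanCy. It compares only the numeric major, minor, and patch parts, so pre-release suffixes such as `rc1` are ignored.

```python
import re

# Range copied from the spaCy requirement above: >=3.5.0,<4.0.0
MIN_VERSION = (3, 5, 0)
MAX_VERSION_EXCLUSIVE = (4, 0, 0)


def parse_version(version: str) -> tuple:
    """Turn a string like '3.7.2' into (3, 7, 2); missing parts count as 0."""
    numbers = []
    for part in version.split(".")[:3]:
        match = re.match(r"\d+", part)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 3:
        numbers.append(0)
    return tuple(numbers)


def is_compatible_spacy(version: str) -> bool:
    """Return True if `version` falls inside >=3.5.0,<4.0.0."""
    return MIN_VERSION <= parse_version(version) < MAX_VERSION_EXCLUSIVE


if __name__ == "__main__":
    for candidate in ("3.4.4", "3.5.0", "3.7.2", "4.0.0"):
        print(candidate, is_compatible_spacy(candidate))
```

In practice the version string would come from the installed package (for example `spacy.__version__`); it is passed in as a plain string here so the sketch runs without spaCy installed.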

Label Scheme

View label scheme (120 labels for 4 components)
Component Labels
tagger ADJ, ADJ_PART, ADP, ADV, ADV_PART, AUX, CCONJ, DET, DET_ADP, DET_PART, INTJ, NOUN, NOUN_PART, NUM, NUM_PART, PART, PRON, PRON_PART, PROPN, PUNCT, SCONJ, VERB, VERB_PART
morphologizer Aspect=Perf|Mood=Ind|POS=VERB|Voice=Act, Case=Nom|POS=ADP, POS=NOUN, POS=PUNCT, Aspect=Perf|Mood=Ind|POS=VERB|Voice=Pass, Case=Gen|POS=ADP, Case=Gen|Number=Sing|POS=PRON|Person=1|PronType=Prs, Aspect=Imp|Mood=Ind|POS=VERB|Voice=Act, POS=ADV|PronType=Dem, Foreign=Yes|POS=NOUN, Degree=Pos|POS=ADJ, Case=Nom|Number=Sing|POS=PRON|Person=3|PronType=Prs, Case=Nom|Deixis=Med|Number=Sing|POS=PRON|PronType=Dem, Gender=Masc|POS=PROPN, Case=Gen|Number=Sing|POS=PRON|Person=3|PronType=Prs, Degree=Pos|Link=Yes|POS=ADJ, POS=ADP, Case=Dat|POS=ADP, POS=VERB|Polarity=Pos, Aspect=Hab|POS=VERB, POS=SCONJ, Case=Nom|Number=Sing|POS=PRON|Person=1|PronType=Prs, Aspect=Prosp|Mood=Ind|POS=VERB|Voice=Act, POS=ADV, POS=PART|Polarity=Neg, Aspect=Imp|Mood=Ind|POS=VERB|Voice=Pass, Aspect=Perf|Mood=Ind|POS=VERB|Voice=Lfoc, POS=PROPN, Case=Nom|Deixis=Prox|Number=Sing|POS=PRON|PronType=Dem, Gender=Masc|POS=NOUN, Gender=Fem|POS=NOUN, Degree=Pos|Gender=Fem|POS=ADJ, Gender=Fem|POS=PROPN, Case=Nom|Clusivity=In|Number=Dual|POS=PRON|Person=1|PronType=Prs, Number=Plur|POS=DET|PronType=Ind, Case=Nom|Number=Plur|POS=PRON|Person=3|PronType=Prs, POS=PRON|PronType=Prs|Reflex=Yes, Gender=Masc|POS=DET|PronType=Emp, Case=Nom|POS=PRON|PronType=Int, Link=Yes|POS=NOUN, POS=PART|PartType=Int, POS=INTJ|Polarity=Pos, Link=Yes|POS=PART|PartType=Int, POS=VERB|Polarity=Neg, Degree=Pos|POS=ADJ|PronType=Int, Case=Gen|Number=Plur|POS=PRON|Person=3|PronType=Prs, Aspect=Perf|Mood=Ind|POS=VERB|PronType=Int|Voice=Act, Case=Nom|Number=Sing|POS=PRON|Person=2|PronType=Prs, Aspect=Perf|Mood=Ind|POS=VERB|PronType=Int|Voice=Pass, Aspect=Perf|Mood=Ind|POS=VERB|Voice=Ifoc, POS=ADV|PronType=Int, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Act, POS=PART|PartType=Nfh, Deixis=Remt|POS=ADV|PronType=Dem, Aspect=Imp|Mood=Pot|POS=VERB|Voice=Act, Link=Yes|POS=VERB|Polarity=Pos, Link=Yes|POS=VERB|Polarity=Neg, POS=PART|PartType=Des, Mood=Imp|POS=AUX|Polarity=Neg, Case=Nom|Link=Yes|Number=Plur|POS=PRON|Person=2|PronType=Prs, Case=Nom|Link=Yes|Number=Sing|POS=PRON|Person=3|PronType=Prs, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Pass, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Lfoc, Aspect=Prog|Mood=Ind|POS=VERB|Voice=Bfoc, POS=DET|PronType=Tot, Case=Dat|Link=Yes|Number=Sing|POS=PRON|Person=3|PronType=Prs, Link=Yes|POS=PRON|PronType=Prs|Reflex=Yes, Mood=Imp|POS=VERB|Voice=Act, Case=Dat|Number=Sing|POS=PRON|Person=3|PronType=Prs, Mood=Imp|POS=VERB|Voice=Lfoc, Case=Gen|Number=Sing|POS=PRON|Person=2|PronType=Prs, Mood=Imp|POS=VERB|Voice=Pass, Case=Gen|Clusivity=In|Number=Plur|POS=PRON|Person=1|PronType=Prs, Aspect=Hab|POS=VERB|Voice=Pass, Gender=Masc|Link=Yes|POS=PROPN, Case=Gen|Link=Yes|Number=Sing|POS=PRON|Person=3|PronType=Prs, Case=Gen|Link=Yes|Number=Sing|POS=PRON|Person=1|PronType=Prs, POS=ADJ, POS=PART, POS=PRON, POS=VERB, POS=INTJ, POS=CCONJ, POS=NUM, POS=DET
parser ROOT, advmod, case, dep, nmod, nsubj, obj, obl, punct
ner LOC, ORG, PER
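
Each morphologizer label above packs several `Key=Value` features, including the part of speech, into one string separated by `|`. As a minimal sketch of how such a label could be unpacked for downstream filtering or counting, the snippet below splits a label into a dictionary and separates out the `POS` entry. The function names `parse_morph_label` and `split_pos` are hypothetical helpers for illustration, not part of spaCy or calamanCy, and the sketch assumes every feature in a label has the `Key=Value` form seen in the list above.

```python
def parse_morph_label(label: str) -> dict:
    """Split 'Key=Value|Key=Value' into a {key: value} dictionary."""
    features = {}
    for pair in label.split("|"):
        key, _, value = pair.partition("=")
        features[key] = value
    return features


def split_pos(label: str) -> tuple:
    """Return (pos, remaining_features) for one morphologizer label."""
    features = parse_morph_label(label)
    pos = features.pop("POS", None)  # None if the label carries no POS key
    return pos, features


if __name__ == "__main__":
    # Labels copied from the morphologizer row of the label scheme.
    for label in (
        "Aspect=Perf|Mood=Ind|POS=VERB|Voice=Act",
        "Case=Nom|POS=ADP",
        "POS=NOUN",
    ):
        print(split_pos(label))
```

Keeping `POS` separate from the remaining features makes it straightforward to group labels by part of speech, for instance to list every aspect and voice combination that the scheme defines for `VERB`.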

Citation

@inproceedings{miranda-2023-calamancy,
    title = "calaman{C}y: A {T}agalog Natural Language Processing Toolkit",
    author = "Miranda, Lester James",
    booktitle = "Proceedings of the 3rd Workshop for Natural Language Processing Open Source Software (NLP-OSS 2023)",
    month = dec,
    year = "2023",
    address = "Singapore, Singapore",
    publisher = "Empirical Methods in Natural Language Processing",
    url = "https://aclanthology.org/2023.nlposs-1.1",
    pages = "1--7",
}