This model has not been trained on any Cantonese material.
It is simply a base model whose embeddings and tokenizer were patched with Cantonese characters. The original model is gpt2-tiny-chinese.
I used this repo to identify the missing Cantonese characters.
My forked and modified version
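The repository mentioned above is not reproduced here. As a minimal, hypothetical sketch of the idea, assume you have some Cantonese text and the tokenizer's vocabulary as a set of strings (for example set(tokenizer.get_vocab())). You can then collect every CJK character that the vocabulary lacks. The helper names and the choice of Unicode ranges (basic CJK Unified Ideographs plus Extensions A and B) are assumptions made for illustration.

```python
def _is_cjk(ch: str) -> bool:
    """True if ch lies in a CJK Unified Ideographs block (basic, Extension A or B)."""
    cp = ord(ch)
    return (
        0x4E00 <= cp <= 0x9FFF        # CJK Unified Ideographs
        or 0x3400 <= cp <= 0x4DBF     # Extension A
        or 0x20000 <= cp <= 0x2A6DF   # Extension B (e.g. the Cantonese 𨋢)
    )


def find_missing_chars(corpus: str, vocab: set[str]) -> list[str]:
    """Return the distinct CJK characters of corpus absent from vocab, in first-seen order."""
    missing: list[str] = []
    seen: set[str] = set()
    for ch in corpus:
        # Skip punctuation/Latin text, characters the tokenizer already knows, and repeats.
        if not _is_cjk(ch) or ch in vocab or ch in seen:
            continue
        seen.add(ch)
        missing.append(ch)
    return missing


print(find_missing_chars("我食咗飯未？", {"我", "食", "飯"}))  # ['咗', '未']
```

The resulting list is exactly what you would pass to tokenizer.add_tokens in the next step.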
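After identifying the missing characters, modifying the tokenizer and embeddings is straightforward with the high-level Hugging Face Transformers API.
Download a tokenizer and a model from the Hugging Face Hub (for example with AutoTokenizer.from_pretrained and AutoModelForCausalLM.from_pretrained). Then: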
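tokenizer.add_tokens(["your", "new", "tokens"])  # pass a list; each string becomes one new token
model.resize_token_embeddings(len(tokenizer))  # add embedding rows for the new token ids
tokenizer.push_to_hub("your-model-name")
model.push_to_hub("your-model-name")  # the resized embeddings live in the model weights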
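To make explicit what the first two calls do, here is a toy pure-Python model of them. It is an illustration under simplifying assumptions, not the Transformers implementation: adding tokens gives each unseen token the next free id, and resizing keeps the existing embedding rows and appends one row per new id (or truncates if the new size is smaller). The random initialisation with standard deviation 0.02 is an arbitrary choice here; how Transformers initialises new rows depends on the model and library version. Either way, the new rows carry no learned meaning until the model is trained, which is why this model knows no Cantonese yet.

```python
import random


def add_tokens(vocab: dict[str, int], new_tokens: list[str]) -> int:
    """Assign each unseen token the next free id (assumes ids 0..n-1); return how many were added."""
    added = 0
    for tok in new_tokens:
        if tok not in vocab:
            vocab[tok] = len(vocab)
            added += 1
    return added


def resize_embeddings(matrix: list[list[float]], new_size: int, seed: int = 0) -> list[list[float]]:
    """Keep the first new_size rows and append fresh rows until there are new_size rows.

    Assumes matrix is non-empty so the embedding dimension can be read from it.
    """
    rng = random.Random(seed)
    dim = len(matrix[0])
    rows = [row[:] for row in matrix[:new_size]]  # existing rows are copied unchanged
    while len(rows) < new_size:
        # New rows are small random vectors: untrained until the model is fine-tuned.
        rows.append([rng.gauss(0.0, 0.02) for _ in range(dim)])
    return rows


vocab = {"我": 0, "食": 1, "飯": 2}
emb = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]
n_added = add_tokens(vocab, ["咗", "嘅", "我"])  # "我" is already known, so 2 are added
emb = resize_embeddings(emb, len(vocab))         # 5 rows, the first 3 unchanged
```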