Model Card for Ae-calem-mistral-7b-v0.2

Chatbots can serve several purposes. Answer questions: chatbots can be programmed with a large knowledge base to answer users' questions on a variety of topics, providing facts, data, explanations, and definitions. Complete tasks: chatbots can be integrated with other systems and APIs to carry out actions for users. Make recommendations: based on a user's preferences and past interactions, chatbots can suggest products, services, and content that may be relevant and useful. Provide customer service: chatbots can handle many simple customer service interactions, such as answering questions, handling complaints, and processing returns, which lets human agents focus on more complex issues. Generate conversational responses: using NLP and machine learning, chatbots can understand natural language and generate conversational responses, creating fluent interactions.

Model Details

Model Description

Model type: Mistral
Language(s) (NLP): Vietnamese
Finetuned from model: Viet-Mistral/Vistral-7B-Chat

Purpose

This model improves on the previous version. It ships a new tokenizer_config.json that registers <|im_start|> and <|im_end|> as additional special tokens.

Training Data

Our dataset was built from our university's student handbook. It covers majors, university regulations, and other information about the university.
hcmue_qa

Instruction Format

To leverage instruction fine-tuning, each message in your prompt should be wrapped in <|im_start|> and <|im_end|> tokens. Only the very first instruction should begin with the beginning-of-sentence (BOS) token id; subsequent instructions should not. The assistant's generation ends with the end-of-sentence (EOS) token id.

For example:

role = "user"
prompt = "hi"
chatml = f"<|im_start|>{role}\n{prompt}<|im_end|>\n"

Here is the dataset after adding this format.
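As a minimal sketch of the format described above, the helper below wraps a list of chat turns in <|im_start|> and <|im_end|> markers and can optionally open an assistant turn. The function name `to_chatml` and the message dictionaries are illustrative and not taken from the model's code. BOS/EOS handling is assumed to be left to the tokenizer.

```python
def to_chatml(messages, add_generation_prompt=False):
    # Each turn becomes: <|im_start|>{role} newline {content}<|im_end|> newline
    parts = []
    for message in messages:
        parts.append(
            f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>\n"
        )
    # Optionally open an assistant turn so the model continues from there
    if add_generation_prompt:
        parts.append("<|im_start|>assistant\n")
    return "".join(parts)


conversation = [
    {"role": "user", "content": "hi"},
]
prompt = to_chatml(conversation, add_generation_prompt=True)
print(prompt)
```

With `add_generation_prompt=False` the output for a single user turn is the same string as the `chatml` example above.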

Training Procedure

from peft import LoraConfig

# LoRA configuration
peft_config = LoraConfig(
    r=8,
    lora_alpha=16,
    target_modules=[
        "q_proj",
        "k_proj",
        "v_proj",
        "o_proj",
        "gate_proj",
        "up_proj",
        "down_proj",
        "lm_head",
    ],
    bias="none",
    lora_dropout=0.05,  # Conventional
    task_type="CAUSAL_LM",
)
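To get a feel for what this configuration adds, the sketch below computes the LoRA scaling factor (lora_alpha / r, which is PEFT's default scaling) and the number of trainable parameters a rank-r adapter adds to one linear layer, r * (in_features + out_features). The layer shapes are assumptions based on the published Mistral-7B architecture (hidden size 4096, MLP size 14336, 1024-dimensional key/value projections, 32 layers), so the fine-tuned checkpoint may differ. lm_head is left out because its size depends on the vocabulary.

```python
R = 8
LORA_ALPHA = 16


def lora_params(in_features, out_features, r=R):
    # LoRA adds two matrices: A (r x in_features) and B (out_features x r)
    return r * (in_features + out_features)


# Factor applied to the low-rank update B @ A
scaling = LORA_ALPHA / R

# Assumed (in_features, out_features) per module; not read from the checkpoint
ASSUMED_SHAPES = {
    "q_proj": (4096, 4096),
    "k_proj": (4096, 1024),
    "v_proj": (4096, 1024),
    "o_proj": (4096, 4096),
    "gate_proj": (4096, 14336),
    "up_proj": (4096, 14336),
    "down_proj": (14336, 4096),
}
NUM_LAYERS = 32  # assumed number of transformer blocks

per_layer = sum(lora_params(i, o) for i, o in ASSUMED_SHAPES.values())
total = per_layer * NUM_LAYERS

print(f"scaling = {scaling}")
print(f"adapter parameters per layer = {per_layer:,}")
print(f"adapter parameters, all layers (excluding lm_head) = {total:,}")
```

Under these assumptions the adapters add roughly 21 million trainable parameters, a small fraction of the 7B base model.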

# Updated chat template in tokenizer_config.json (excerpt)
{
  "chat_template": "{% for message in messages %}{{'<|im_start|>' + message['role'] + '\n' + message['content'] + '<|im_end|>' + '\n'}}{% endfor %}{% if add_generation_prompt %}{{ '<|im_start|>assistant\n' }}{% endif %}"
}
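Because the template closes every turn with <|im_end|>, the assistant's reply can be recovered from decoded output by taking the text after the last <|im_start|>assistant marker and cutting at the next <|im_end|>. The following is a minimal sketch. The helper name `extract_reply` and the sample string are hypothetical, and it assumes special tokens are kept when decoding.

```python
ASSISTANT_TAG = "<|im_start|>assistant\n"
END_TAG = "<|im_end|>"


def extract_reply(decoded_text):
    # Keep only the text after the last assistant marker
    _, found, tail = decoded_text.rpartition(ASSISTANT_TAG)
    if not found:
        return ""  # no assistant turn present
    # Cut at the end-of-turn marker, if the model emitted one
    reply, _, _ = tail.partition(END_TAG)
    return reply.strip()


sample = (
    "<|im_start|>user\nhi<|im_end|>\n"
    "<|im_start|>assistant\nHello!<|im_end|>\n"
)
print(extract_reply(sample))
```

If generation stops before <|im_end|> is produced, for example at a token limit, the helper returns the partial reply.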

Run

free colab run

Training report

report

Contact

nguyndantdm6@gmail.com

Tamnemtf/Ae-calem-mistral-7b-v0.2