---
library_name: ctranslate2
license: mit
base_model:
- microsoft/phi-4
base_model_relation: quantized
tags:
- ctranslate2
- phi-4
- chat
---

CTranslate2 conversion of Phi-4

# Example Usage
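The examples below load the tokenizer from the converted model directory, which works when the tokenizer files were copied during conversion. For reference, here is a minimal conversion sketch using CTranslate2's `TransformersConverter` (the output directory name and the copied file list are illustrative; the `ct2-transformers-converter` command-line tool is equivalent):

```python
from ctranslate2.converters import TransformersConverter

# Copy the tokenizer files next to the converted weights so that
# AutoTokenizer.from_pretrained() can load from the same directory.
converter = TransformersConverter(
    "microsoft/phi-4",
    copy_files=["tokenizer.json", "tokenizer_config.json"],
)
converter.convert("phi-4-ct2", quantization="int8")
```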
## Non-Streaming Example

```python
import ctranslate2
from transformers import AutoTokenizer


def generate_response(prompt, system_message, model_path):
    generator = ctranslate2.Generator(
        model_path,
        device="cuda",
        compute_type="int8"
    )
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    # Phi-4 chat format
    formatted_prompt = f"""<|im_start|>system<|im_sep|>{system_message}<|im_end|>
<|im_start|>user<|im_sep|>{prompt}<|im_end|>
<|im_start|>assistant<|im_sep|>"""

    tokens = tokenizer.convert_ids_to_tokens(tokenizer.encode(formatted_prompt))

    results = generator.generate_batch(
        [tokens],
        max_length=1024,
        sampling_temperature=0.7,
        # Return only the completion instead of echoing the prompt tokens.
        include_prompt_in_result=False,
    )

    response = tokenizer.decode(results[0].sequences_ids[0], skip_special_tokens=True)
    return response


if __name__ == "__main__":
    model_path = "path/to/your/phi-4-ct2-model"
    system_message = "You are a helpful AI assistant."
    user_prompt = "Write a short poem about a cat."

    response = generate_response(user_prompt, system_message, model_path)
    print("\nGenerated response:")
    print(response)
```
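If the converted directory includes `tokenizer_config.json` with the original chat template, the prompt can also be built with `apply_chat_template` instead of the hand-written f-string. A sketch under that assumption:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("path/to/your/phi-4-ct2-model")

messages = [
    {"role": "system", "content": "You are a helpful AI assistant."},
    {"role": "user", "content": "Write a short poem about a cat."},
]

# tokenize=False returns the formatted prompt string; add_generation_prompt
# appends the assistant header so the model continues the assistant turn.
formatted_prompt = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
```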
## Streaming Example

```python
import ctranslate2
from transformers import AutoTokenizer


def generate_response(prompt, system_message, model_path):
    """
    Generates and streams a response from an AI assistant.

    Initializes the CTranslate2 generator and tokenizer, formats the input
    prompt, tokenizes it, and streams the generated tokens by printing them
    as they are produced.

    Parameters:
        prompt (str): The user's input prompt.
        system_message (str): The system-level instruction.
        model_path (str): Path to the CTranslate2 model directory.
    """
    generator = ctranslate2.Generator(model_path, device="cuda", compute_type="int8")
    tokenizer = AutoTokenizer.from_pretrained(model_path)

    formatted_prompt = f"""<|im_start|>system<|im_sep|>{system_message}<|im_end|>
<|im_start|>user<|im_sep|>{prompt}<|im_end|>
<|im_start|>assistant<|im_sep|>"""

    tokens = tokenizer.tokenize(formatted_prompt)

    for step in generator.generate_tokens(
        tokens, max_length=1024, sampling_temperature=0.7
    ):
        # generate_tokens yields one GenerationStepResult per step, exposing
        # the generated token and its id as step.token and step.token_id.
        token = step.token
        # Stop before printing end-of-turn or other special tokens.
        if token == tokenizer.eos_token or token in tokenizer.all_special_tokens:
            break
        print(tokenizer.decode([step.token_id]), end="", flush=True)
    print()


if __name__ == "__main__":
    model_path = "path/to/your/phi-4-ct2-model"
    system_message = "You are a helpful AI assistant."
    user_prompt = "Write a short poem about a cat."

    print("\nGenerating response:")
    generate_response(user_prompt, system_message, model_path)
```
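Depending on how the conversion mapped Phi-4's end-of-turn token, `tokenizer.eos_token` may differ from `<|im_end|>`. Recent CTranslate2 versions also accept an `end_token` argument to `generate_tokens` (e.g. `end_token="<|im_end|>"`), which stops decoding in the generator itself and makes the manual special-token check above unnecessary.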