This project presents a fine-tuned version of Microsoft's Phi-3.5 model, optimized for enhanced conversational abilities and general knowledge tasks.
Model Details
- Base model: microsoft/Phi-3.5-mini-instruct
- Fine-tuning method: PEFT (Parameter-Efficient Fine-Tuning)
- Training data: [Brief description of your dataset]
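For readers who want to set up a similar fine-tune, here is a minimal sketch of a LoRA-style PEFT configuration on the base model. The rank, alpha, and target module names are illustrative assumptions, not the exact recipe used for this model.

```python
# Minimal LoRA/PEFT sketch for the base model; rank, alpha, and target
# modules are illustrative assumptions, not the exact recipe used here.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import LoraConfig, get_peft_model

base_id = "microsoft/Phi-3.5-mini-instruct"
tokenizer = AutoTokenizer.from_pretrained(base_id)  # used later to tokenize the training data
model = AutoModelForCausalLM.from_pretrained(base_id)

lora_config = LoraConfig(
    r=16,                                   # rank of the low-rank update matrices
    lora_alpha=32,                          # scaling factor for the adapters
    target_modules=["qkv_proj", "o_proj"],  # assumed attention projections in Phi-3.5
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only the adapter weights are trainable
```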
Features
- Improved response generation for a wide range of topics
- Enhanced context understanding and coherence
- Optimized for deployment on Hugging Face Spaces
Usage
This model can be used for various natural language processing tasks, including:
- General conversation
- Question answering
- Task instructions
- Creative writing
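As a quick start, the snippet below shows one way to load the model for generation with the transformers pipeline. The repository id is a placeholder for wherever the fine-tuned weights are published; for chat-style use, applying the model's chat template will give better results.

```python
# Minimal usage sketch; "your-username/phi-3.5-finetuned" is a placeholder
# repo id for wherever the fine-tuned weights are published.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="your-username/phi-3.5-finetuned",
    device_map="auto",
)

prompt = "Explain parameter-efficient fine-tuning in two sentences."
result = generator(prompt, max_new_tokens=128, do_sample=False)
print(result[0]["generated_text"])
```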
Limitations While this fine-tuned model shows improved performance, users should be aware of potential biases and limitations inherent in language models. Always critically evaluate the model's outputs.
Feedback I welcome any feedback, suggestions, or questions about this project. Feel free to open an issue or contribute to further improvements!
Excited to share my new Gradio app featuring the impressive Llama-3.1-Storm-8B model! This app demonstrates the capabilities of Llama-3.1-Storm-8B, an 8B-parameter language model created by Ashvini Kumar Jindal, Pawan Kumar Rajpoot, and Ankur Parikh (@akjindal53244).
Key highlights of Llama-3.1-Storm-8B:
Outperforms Llama-3.1-8B-Instruct on multiple benchmarks:
- Instruction Following (IFEval): +3.93%
- Knowledge-driven QA (GPQA): +7.21%
- Reduced Hallucinations (TruthfulQA): +9%
- Function Calling (BFCL): +7.92%
- Achieves impressive results with only 8B parameters
- Uses innovative techniques like self-curation and model merging
Kudos to the creators for pushing the boundaries of smaller language models! This work makes advanced AI more accessible and efficient. #AI #NLP #MachineLearning #GradioApp #Llama3
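For anyone curious how such a demo is wired up, a bare-bones Gradio chat interface might look roughly like the sketch below. The generation settings are my own assumptions, and the actual Space may differ (quantization, streaming, a system prompt, etc.).

```python
# Bare-bones Gradio chat demo around Llama-3.1-Storm-8B; generation
# settings here are illustrative, not the actual Space's configuration.
import gradio as gr
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="akjindal53244/Llama-3.1-Storm-8B",
    device_map="auto",
)

def respond(message, history):
    # This sketch ignores the chat history; a real app would fold it into
    # the prompt via the model's chat template.
    output = generator(message, max_new_tokens=256, do_sample=True, temperature=0.7)
    # The pipeline returns the prompt plus the completion; strip the prompt.
    return output[0]["generated_text"][len(message):]

gr.ChatInterface(respond).launch()
```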
This model essentially explores having different experts (MoE) for the image-encoder part of a vision-language model. How? 🧐 The authors concatenate the vision encoders' output tokens together and apply "pre-alignment": essentially fine-tuning the experts with a frozen text encoder.
Then they freeze both the experts and the decoder and train only the projection layer; finally, they unfreeze everything for supervised fine-tuning ✨
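A toy illustration of that staged freezing schedule, with generic torch modules standing in for the experts, the projection layer, and the decoder (names and sizes are not from the paper's code):

```python
# Toy illustration of the staged freezing schedule; module names and sizes
# are stand-ins, not the paper's actual code.
import torch.nn as nn

def set_trainable(module: nn.Module, trainable: bool) -> None:
    for p in module.parameters():
        p.requires_grad = trainable

experts = nn.ModuleList([nn.Linear(1024, 1024), nn.Linear(1024, 1024)])  # vision experts
projection = nn.Linear(2 * 1024, 4096)                       # visual tokens -> LLM dim
decoder = nn.TransformerDecoderLayer(d_model=4096, nhead=8)  # stand-in for the text decoder

# Stage: freeze the experts and the decoder, train only the projection layer.
set_trainable(experts, False)
set_trainable(decoder, False)
set_trainable(projection, True)

# Final stage: unfreeze everything for supervised fine-tuning.
for module in (experts, projection, decoder):
    set_trainable(module, True)
```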
In the paper, they explore different fusion strategies and vision encoders, extending the basic CLIP encoder, and find that simply concatenating the visual tokens works well. The rest of the architecture is quite similar to LLaVA.
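And a minimal, self-contained sketch of the concatenation-based fusion itself, with two linear layers standing in for the vision experts and purely illustrative dimensions:

```python
# Minimal sketch of concatenation-based fusion: two linear layers stand in
# for the vision experts; all dimensions are illustrative assumptions.
import torch
import torch.nn as nn

batch, n_tokens = 2, 576
expert_a = nn.Linear(768, 1024)        # stand-in for e.g. a CLIP-style encoder
expert_b = nn.Linear(768, 1024)        # stand-in for a second vision expert
projector = nn.Linear(2 * 1024, 4096)  # concatenated channels -> LLM hidden size

patches = torch.randn(batch, n_tokens, 768)  # toy patch features shared by both experts
tokens = torch.cat([expert_a(patches), expert_b(patches)], dim=-1)  # (B, T, 2048)
visual_embeds = projector(tokens)            # (B, T, 4096), fed to the language model
print(visual_embeds.shape)
```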