video-p2p-library

AI & ML interests

None defined yet.

Recent Activity


AtAndDev posted an update 1 day ago
Gemma 3 seems to be really good at human preference. Just waiting for ppl to see it.
ehristoforu posted an update 18 days ago
Introducing our first standalone model – FluentlyLM Prinum

Introducing the first standalone model from Project Fluently LM! We worked on it for several months, used different approaches and eventually found the optimal one.

General characteristics:
- Model type: Causal language model (QwenForCausalLM, LM Transformer)
- Number of parameters: 32.5B
- Number of parameters (non-embedding): 31.0B
- Number of layers: 64
- Context: 131,072 tokens
- Language(s) (NLP): English, French, Spanish, Russian, Chinese, Japanese, Persian (officially supported)
- License: MIT

Creation strategy:
The basis of the strategy is shown in Pic. 2.
We used Axolotl & Unsloth for SFT fine-tuning with PEFT LoRA (rank=64, alpha=64), and MergeKit for SLERP and TIES merges.
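
For illustration, a minimal sketch of what a PEFT LoRA setup with those hyperparameters could look like; the base checkpoint and target modules below are assumptions for the example, not the authors' exact recipe:

```python
# Illustrative sketch only: approximates the described PEFT LoRA setup (rank=64, alpha=64).
# The base model and target_modules are assumptions, not the authors' confirmed configuration.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-32B")  # assumed Qwen-family base

lora_config = LoraConfig(
    r=64,            # LoRA rank, as stated in the post
    lora_alpha=64,   # LoRA alpha, as stated in the post
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj"],  # typical attention projections (assumption)
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(base, lora_config)
model.print_trainable_parameters()  # only the LoRA adapters are trainable
```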

Evaluation:
πŸ† 12th place in the Open LLM Leaderboard ( open-llm-leaderboard/open_llm_leaderboard) (21.02.2025)

Detailed results and comparisons are presented in Pic. 3.

Links:
- Model: fluently-lm/FluentlyLM-Prinum
- GGUF version: mradermacher/FluentlyLM-Prinum-GGUF
- Demo on ZeroGPU: ehristoforu/FluentlyLM-Prinum-demo
AtAndDev posted an update 26 days ago
@nroggendorff is that you sama?
ameerazam08 posted an update about 1 month ago
AtAndDev posted an update about 1 month ago
everywhere i go i see his face
AtAndDev posted an update about 2 months ago
Deepseek gang on fire fr fr
AtAndDev posted an update about 2 months ago
R1 is out! And with a lot of other R1-related models...
ehristoforu posted an update 3 months ago
βœ’οΈ Ultraset - all-in-one dataset for SFT training in Alpaca format.
fluently-sets/ultraset

❓ Ultraset is a comprehensive dataset for training Large Language Models (LLMs) with the SFT (Supervised Fine-Tuning) method. The dataset consists of over 785 thousand entries in eight languages: English, Russian, French, Italian, Spanish, German, Chinese, and Korean.

🀯 Ultraset solves the problem faced by users when selecting an appropriate dataset for LLM training. It combines various types of data required to enhance the model's skills in areas such as text writing and editing, mathematics, coding, biology, medicine, finance, and multilingualism.

πŸ€— For effective use of the dataset, it is recommended to utilize only the "instruction," "input," and "output" columns and train the model for 1-3 epochs. The dataset does not include DPO or Instruct data, making it suitable for training various types of LLM models.

❇️ Ultraset is an excellent tool to improve your language model's skills in diverse knowledge areas.
akhaliq posted an update 3 months ago
Google drops Gemini 2.0 Flash Thinking

A new experimental model that unlocks stronger reasoning capabilities and shows its thoughts. The model plans (with its thoughts visible), can solve complex problems at Flash speeds, and more.

now available in anychat, try it out: akhaliq/anychat
AtAndDev posted an update 3 months ago
@s3nh Hey man check your discord! Got some news.
akhaliq posted an update 4 months ago
QwQ-32B-Preview is now available in anychat

A reasoning model that is competitive with OpenAI o1-mini and o1-preview

try it out: akhaliq/anychat
akhaliq posted an update 4 months ago
New model drop in anychat

allenai/Llama-3.1-Tulu-3-8B is now available

try it here: akhaliq/anychat
akhaliq posted an update 4 months ago
anychat

supports ChatGPT, Gemini, Perplexity, Claude, Meta Llama, and Grok, all in one app

try it out there: akhaliq/anychat
JoseRFJunior posted an update 7 months ago
JoseRFJunior/TransNAR
https://github.com/JoseRFJuniorLLMs/TransNAR
https://arxiv.org/html/2406.09308v1
TransNAR hybrid architecture. Similar to Alayrac et al., we interleave existing Transformer layers with gated cross-attention layers that enable information to flow from the NAR to the Transformer. Queries are generated from tokens, while keys and values are obtained from the nodes and edges of the graph. The node and edge embeddings are obtained by running the NAR on the graph version of the reasoning task to be solved. When experimenting with pre-trained Transformers, we initially close the cross-attention gate in order to fully preserve the language model's internal knowledge at the beginning of training.
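
A minimal PyTorch sketch of such a gated cross-attention block, following the Flamingo-style tanh gating of Alayrac et al.; the module, names, and dimensions here are illustrative assumptions, not the TransNAR implementation:

```python
# Illustrative sketch of a gated cross-attention layer as described above:
# queries come from language-model tokens, keys/values from NAR node/edge embeddings,
# and a tanh gate initialised at zero keeps the pre-trained LM unchanged at the start of training.
import torch
import torch.nn as nn

class GatedCrossAttention(nn.Module):
    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.gate = nn.Parameter(torch.zeros(1))  # gate starts closed

    def forward(self, tokens: torch.Tensor, graph_emb: torch.Tensor) -> torch.Tensor:
        # tokens: (batch, seq_len, d_model) from the Transformer stream
        # graph_emb: (batch, num_nodes_and_edges, d_model) from the NAR
        attended, _ = self.attn(query=tokens, key=graph_emb, value=graph_emb)
        return tokens + torch.tanh(self.gate) * attended  # residual, scaled by the learned gate

# Usage sketch with made-up sizes
layer = GatedCrossAttention(d_model=512, n_heads=8)
tokens = torch.randn(2, 16, 512)
graph_emb = torch.randn(2, 32, 512)
out = layer(tokens, graph_emb)  # same shape as tokens: (2, 16, 512)
```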