DevQuasar


AI & ML interests

Open-Source LLMs, Local AI Projects: https://pypi.org/project/llm-predictive-router/


DevQuasar's activity

csabakecskemeti published a Space about 4 hours ago: DevQuasar/Mi50
csabakecskemeti updated a Space about 5 hours ago: DevQuasar/Mi50
csabakecskemeti posted an update about 18 hours ago
csabakecskemeti posted an update 4 days ago
I've run the Open LLM Leaderboard evaluations plus HellaSwag on deepseek-ai/DeepSeek-R1-Distill-Llama-8B and compared them to meta-llama/Llama-3.1-8B-Instruct, and at first glance R1 does not beat Llama overall.

If anyone wants to double-check, the results are posted here:
https://github.com/csabakecskemeti/lm_eval_results

Did I make a mistake, or is (at least this distilled version) not as good as / better than the competition?

I'll run the same on the Qwen 7B distilled version too.
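
For anyone who wants to reproduce one of the runs, something along these lines with the lm-evaluation-harness Python API should work (a minimal sketch; the task list, dtype, and batch size here are illustrative, not my exact setup):

```python
# Minimal sketch (illustrative settings): scoring HellaSwag on the
# distilled R1 model with lm-evaluation-harness.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=deepseek-ai/DeepSeek-R1-Distill-Llama-8B,dtype=bfloat16",
    tasks=["hellaswag"],
    batch_size=8,
)
print(results["results"]["hellaswag"])  # accuracy / normalized accuracy
```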
csabakecskemeti posted an update 10 days ago
csabakecskemeti posted an update 26 days ago
csabakecskemeti posted an update 27 days ago
csabakecskemeti posted an update 29 days ago
csabakecskemeti posted an update about 1 month ago
csabakecskemeti posted an update about 1 month ago
I've built a small utility to split safetensors files, file by file.
The need came up when I tried to convert the new DeepSeek V3 model from FP8 to BF16.
The only Ada-architecture GPU I have is an RTX 4080, and its 16GB of VRAM just wasn't enough for the conversion.

BTW: I'll upload the BF16 version here:
DevQuasar/deepseek-ai.DeepSeek-V3-Base-bf16
(it will take a while, days with my upload speed)
If anyone has access to the resources to test it, I'd appreciate feedback on whether it works or not.

The tool is available here:
https://github.com/csabakecskemeti/ai_utils/blob/main/safetensor_splitter.py
It splits every file into n pieces along layer boundaries where possible, and creates a new "model.safetensors.index.json" file.
I've tested it with Llama 3.1 8B at multiple split sizes, and validated the output with an inference pipeline.
Use --help for usage.
Please note the current version expects the model to already be sharded across multiple files, with a "model.safetensors.index.json" layer-to-safetensors mapping file.
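
The core idea, as a minimal sketch (not the actual tool: here tensors are chunked by count, while the real splitter groups by layer):

```python
# Minimal sketch of the idea: split one safetensors shard into n pieces
# and rebuild the weight_map for the index file. Chunking is by tensor
# count here; the actual tool splits along layer boundaries.
import json
from safetensors import safe_open
from safetensors.torch import save_file

def split_shard(path: str, n_pieces: int, prefix: str = "model") -> None:
    with safe_open(path, framework="pt") as f:
        tensors = {k: f.get_tensor(k) for k in f.keys()}
    keys = list(tensors)
    chunk = max(1, len(keys) // n_pieces)
    weight_map = {}
    for i in range(0, len(keys), chunk):
        piece_keys = keys[i:i + chunk]
        out_name = f"{prefix}-{i // chunk:05d}.safetensors"
        save_file({k: tensors[k] for k in piece_keys}, out_name)
        weight_map.update({k: out_name for k in piece_keys})
    # a new index so loaders can find each tensor in its new shard
    with open("model.safetensors.index.json", "w") as idx:
        json.dump({"weight_map": weight_map}, idx, indent=2)
```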
csabakecskemeti posted an update about 1 month ago
csabakecskemeti posted an update about 2 months ago
csabakecskemeti posted an update 2 months ago
csabakecskemeti posted an update 2 months ago
I have this small utility: no_more_typo.
It runs in the background and can call an LLM to update the text on the clipboard; I think it's ideal for fixing typos and grammar.
I have just added the option to use custom prompt templates to perform different tasks.

Details, code and executable:
https://github.com/csabakecskemeti/no_more_typo

https://devquasar.com/no-more-typo/
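
The gist of it, as a minimal Python sketch (the real tool's implementation, backend, and trigger handling may differ; pyperclip and the OpenAI client are stand-ins here):

```python
# Minimal Python sketch of the idea only: read the clipboard, ask an LLM
# to fix typos, and write the result back. no_more_typo itself may use a
# different language, model backend, and hotkey handling.
import pyperclip
from openai import OpenAI

PROMPT = ("Fix any typos and grammar in the following text. "
          "Return only the corrected text:\n\n")

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def fix_clipboard() -> None:
    text = pyperclip.paste()
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[{"role": "user", "content": PROMPT + text}],
    )
    pyperclip.copy(resp.choices[0].message.content)

if __name__ == "__main__":
    fix_clipboard()
```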
csabakecskemeti posted an update 2 months ago
Repurposed my older AI workstation into a homelab server; it has received 2x V100 + 1x P40.
I can reach a huge 210k-token context size with MegaBeam-Mistral-7B-512k-GGUF at ~70+ tok/s, or run Llama-3.1-Nemotron-70B-Instruct-HF-GGUF with 50k context at ~10 tok/s (V100s only: 40k ctx and 15 tok/s).
It's also able to LoRA fine-tune with performance similar to an RTX 3090.
It moved to the garage so there are no complaints from the family about the noise. Will move to a rack soon :D
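
For anyone curious how such a long-context run looks in practice, a hypothetical sketch with llama-cpp-python (the GGUF filename and prompt are illustrative):

```python
# Hypothetical sketch: loading a long-context GGUF with llama-cpp-python
# and offloading all layers to the GPUs. Filename is illustrative.
from llama_cpp import Llama

llm = Llama(
    model_path="MegaBeam-Mistral-7B-512k.Q8_0.gguf",  # illustrative path
    n_ctx=210_000,    # the context size that fit on 2x V100 + 1x P40
    n_gpu_layers=-1,  # offload every layer to GPU
)
out = llm("Summarize the following log:", max_tokens=64)
print(out["choices"][0]["text"])
```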