Dattu Sharma

imdatta0

https://datta0.github.io/

AI & ML interests

Everything ML. Specifically Deep Learning.

imdatta0's activity

commented on 2 papers 5 days ago

Stuffed Mamba: State Collapse and State Capacity of RNN-Based Long-Context Modeling

Paper • 2410.07145 • Published Oct 9 • 2

Round and Round We Go! What makes Rotary Positional Encodings useful?

Paper • 2410.06205 • Published Oct 8

commented on 4 papers 26 days ago

TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices

Paper • 2410.00531 • Published Oct 1 • 28

Aria: An Open Multimodal Native Mixture-of-Experts Model

Paper • 2410.05993 • Published Oct 8 • 107

Differential Transformer

Paper • 2410.05258 • Published Oct 7 • 165

TPI-LLM: Serving 70B-scale LLMs Efficiently on Low-resource Edge Devices

Paper • 2410.00531 • Published Oct 1 • 28

commented on a paper about 2 months ago

Qwen2.5-Coder Technical Report

Paper • 2409.12186 • Published Sep 18 • 130

commented on 4 papers 2 months ago

The Mamba in the Llama: Distilling and Accelerating Hybrid Models

Paper • 2408.15237 • Published Aug 27 • 36

KTO: Model Alignment as Prospect Theoretic Optimization

Paper • 2402.01306 • Published Feb 2 • 15

Planning In Natural Language Improves LLM Search For Code Generation

Paper • 2409.03733 • Published Sep 5

OLMoE: Open Mixture-of-Experts Language Models

Paper • 2409.02060 • Published Sep 3 • 77

commented on 3 papers 3 months ago

FocusLLM: Scaling LLM's Context by Parallel Decoding

Paper • 2408.11745 • Published Aug 21 • 23

Jamba-1.5: Hybrid Transformer-Mamba Models at Scale

Paper • 2408.12570 • Published Aug 22 • 29

LLM Pruning and Distillation in Practice: The Minitron Approach

Paper • 2408.11796 • Published Aug 21 • 53

New activity in imdatta0/pints 3 months ago

Librarian Bot: Add language metadata for dataset

#1 opened 3 months ago

New activity in mistralai/Mistral-7B-Instruct-v0.3 3 months ago

Add tool calling support to chat template

#68 opened 3 months ago

commented on 4 papers 5 months ago

Is DPO Superior to PPO for LLM Alignment? A Comprehensive Study

Paper • 2404.10719 • Published Apr 16 • 4

MoRA: High-Rank Updating for Parameter-Efficient Fine-Tuning

Paper • 2405.12130 • Published May 20 • 45

S3D: A Simple and Cost-Effective Self-Speculative Decoding Scheme for Low-Memory GPUs

Paper • 2405.20314 • Published May 30

Contextual Position Encoding: Learning to Count What's Important

Paper • 2405.18719 • Published May 29 • 5