Mohamed Hisham Abdelzaher

MH0386

AI & ML interests

None yet

MH0386's activity

Reacted to Symbol-LLM's post with 🚀 21 days ago
🚀 Excited to introduce a new member of the OS-Copilot family: OS-Atlas, an open-source foundation action model for GUI agents

📘 Paper: OS-ATLAS: A Foundation Action Model for Generalist GUI Agents (2410.23218)
🔗 Website: https://osatlas.github.io

😇 TL;DR: OS-Atlas offers:
1. State-of-the-Art GUI Grounding: helps GUI agents accurately locate GUI elements (see the sketch after this list).
2. Strong OOD Performance and Cross-platform Compatibility: Excels in out-of-domain agentic tasks across MacOS, Windows, Linux, Android, and Web.
3. Complete Infrastructure for GUI Data Synthesis: you can easily build your own OS agent on top of it!
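As a rough illustration of the grounding use case, here is a minimal sketch that queries an OS-Atlas checkpoint through Hugging Face transformers. The checkpoint ID, the prompt wording, and the output format are my assumptions (OS-Atlas is reportedly built on Qwen2-VL, so a Qwen2-VL-style chat interface is assumed); check the website above for the project's actual usage.

```python
# Minimal GUI-grounding sketch. Assumptions: the checkpoint ID
# "OS-Copilot/OS-Atlas-Base-7B" and a Qwen2-VL-style chat interface;
# see https://osatlas.github.io for the exact, supported usage.
import torch
from PIL import Image
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration

model_id = "OS-Copilot/OS-Atlas-Base-7B"  # assumed model ID
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

screenshot = Image.open("screenshot.png")  # any GUI screenshot
messages = [{
    "role": "user",
    "content": [
        {"type": "image"},
        {"type": "text", "text": 'Locate the "Save" button in this screenshot.'},
    ],
}]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(text=[prompt], images=[screenshot], return_tensors="pt").to(model.device)

with torch.no_grad():
    out = model.generate(**inputs, max_new_tokens=128)
new_tokens = out[:, inputs["input_ids"].shape[1]:]  # drop the echoed prompt
print(processor.batch_decode(new_tokens, skip_special_tokens=True)[0])
```

For a grounding model, the reply would typically contain element coordinates that an agent loop can translate into a click action.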

liked a Space about 1 month ago
upvoted an article 2 months ago
Introducing Community Tools on HuggingChat

updated a Space 3 months ago
upvoted an article 7 months ago
Welcome Llama 3 - Meta's new open LLM

Reacted to Jaward's post with 👍 7 months ago
Let's break down the technical details of VASA, Microsoft's mind-blowing framework for lifelike audio-driven talking faces, and its model VASA-1:

Summary of Summaries
- The paper introduces VASA, a framework for generating lifelike talking faces with appealing visual affective skills (VAS) from a single image and speech audio.
- Core innovations include a diffusion-based model for holistic generation of facial dynamics and head movements in an expressive, disentangled face latent space learned from video data.
- VASA-1 generates high-quality 512x512 videos at up to 40 FPS with low latency.
- It supports real-time generation of lifelike, emotive talking faces.
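The low-latency claim suggests motion latents are produced in small windows rather than for a whole clip at once. Below is a hedged sketch of such a chunked generation loop; the window and overlap sizes and the `generate_window`/`decode_frames` callables are illustrative assumptions, not the paper's implementation.

```python
# Chunked (sliding-window) generation sketch for low-latency streaming.
# WIN, OVERLAP, and the two callables are illustrative assumptions.
WIN = 32      # motion latents generated per window (assumed)
OVERLAP = 8   # tail latents reused as conditioning for continuity (assumed)

def stream_frames(audio_features, generate_window, decode_frames):
    """audio_features: sequence of per-frame audio features.
    generate_window(chunk, prev_tail) -> motion latents for the chunk.
    decode_frames(motion) -> iterable of rendered frames.
    Yields frames as soon as each window is generated."""
    prev_tail = None
    for start in range(0, len(audio_features), WIN):
        chunk = audio_features[start:start + WIN]
        if len(chunk) == 0:
            break
        # Condition on the tail of the previous window so consecutive
        # windows stay temporally coherent.
        motion = generate_window(chunk, prev_tail)
        yield from decode_frames(motion)
        prev_tail = motion[-OVERLAP:]
```

Because frames are emitted per window instead of after the full clip, the first frames appear after a single window's latency, which is what makes an online 40 FPS stream feasible.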

Summary of Overall Framework:
- Instead of directly generating video frames, VASA generates holistic facial dynamics and head motion in a latent space, conditioned on audio and optional control signals.
- To achieve this, the framework uses a face encoder-decoder to extract appearance and identity features and trains a Diffusion Transformer model to generate motion latent codes.
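To make the division of labor concrete, here is a structural sketch in PyTorch: appearance and identity features are extracted once from the source image, a transformer produces a motion-latent sequence from audio, and a decoder renders frames. Every module below is a toy stand-in with assumed shapes, not the released architecture; in particular, the real motion generator is a diffusion model (its training objective is sketched at the end of this post).

```python
import torch
import torch.nn as nn

class FaceEncoder(nn.Module):
    """Toy stand-in: image -> (appearance features, identity code)."""
    def __init__(self, d_app=512, d_id=128):
        super().__init__()
        self.backbone = nn.Sequential(nn.Flatten(), nn.LazyLinear(d_app + d_id))
        self.d_app = d_app

    def forward(self, img):
        h = self.backbone(img)
        return h[:, : self.d_app], h[:, self.d_app :]

class MotionGenerator(nn.Module):
    """Toy stand-in: audio features -> motion latents (pose + dynamics).
    The paper's generator is a Diffusion Transformer; this stub skips sampling."""
    def __init__(self, d_audio=768, d_motion=256):
        super().__init__()
        self.proj = nn.Linear(d_audio, d_motion)
        layer = nn.TransformerEncoderLayer(d_model=d_motion, nhead=8, batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=2)

    def forward(self, audio):                 # (B, T, d_audio)
        return self.blocks(self.proj(audio))  # (B, T, d_motion)

class FrameDecoder(nn.Module):
    """Toy stand-in: (appearance, identity, motion) -> frames."""
    def __init__(self, d_app=512, d_id=128, d_motion=256):
        super().__init__()
        self.to_frame = nn.Linear(d_app + d_id + d_motion, 3 * 8 * 8)  # tiny "frame"

    def forward(self, app, ident, motion):    # motion: (B, T, d_motion)
        B, T, _ = motion.shape
        cond = torch.cat([app, ident], dim=-1).unsqueeze(1).expand(B, T, -1)
        return self.to_frame(torch.cat([cond, motion], dim=-1)).view(B, T, 3, 8, 8)

enc, gen, dec = FaceEncoder(), MotionGenerator(), FrameDecoder()
img = torch.randn(1, 3, 64, 64)               # source image (toy resolution)
audio = torch.randn(1, 40, 768)               # ~1 s of audio features at 40 FPS
app, ident = enc(img)                         # extracted once per identity
frames = dec(app, ident, gen(audio))          # one motion latent -> one frame
print(frames.shape)                           # torch.Size([1, 40, 3, 8, 8])
```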

Technical Method Details:
Expressive and Disentangled Face Latent Space Construction:
- Based on a 3D-aided face reenactment framework
- Decomposes the face into a 3D appearance volume, an identity code, head pose, and facial dynamics latents
- Uses encoders to extract these latent factors from face images
- Applies additional losses to improve disentanglement:
  - Pairwise head pose and facial dynamics transfer loss
  - Face identity similarity loss for cross-identity pose/dynamics transfer
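The two losses can be sketched as follows. The encoder/decoder interfaces, the identity-feature network `id_net`, and the specific distances (L1, cosine) are illustrative assumptions, not necessarily the paper's exact choices.

```python
import torch.nn.functional as F

def pairwise_transfer_loss(enc, dec, img_a, img_b):
    """img_a, img_b: two frames of the SAME subject. If pose/dynamics are truly
    disentangled from appearance/identity, rebuilding frame b from a's
    appearance/identity plus b's pose/dynamics should reproduce frame b."""
    app_a, id_a, _, _ = enc(img_a)
    _, _, pose_b, dyn_b = enc(img_b)
    recon_b = dec(app_a, id_a, pose_b, dyn_b)
    return F.l1_loss(recon_b, img_b)

def identity_similarity_loss(enc, dec, id_net, img_a, img_b):
    """img_a, img_b: frames of DIFFERENT subjects. Driving subject a with
    b's pose/dynamics must preserve a's identity features."""
    app_a, id_a, _, _ = enc(img_a)
    _, _, pose_b, dyn_b = enc(img_b)
    driven = dec(app_a, id_a, pose_b, dyn_b)
    sim = F.cosine_similarity(id_net(driven), id_net(img_a), dim=-1)
    return (1 - sim).mean()
```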

Holistic Facial Dynamics Generation with Diffusion Transformer:
- Represents all facial movements (lip motion, expression, gaze, etc.) as a single latent sequence
- Applies a Diffusion Transformer model to generate the facial dynamics sequence.
- Diffusion Transformer trained with a simplified denoising score matching objective.
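A "simplified denoising" objective usually means the DDPM-style epsilon-prediction MSE. Here is a minimal sketch under that assumption, with an epsilon-predicting model conditioned on audio; the noise schedule values and the model interface are illustrative.

```python
import torch
import torch.nn.functional as F

T_STEPS = 1000
betas = torch.linspace(1e-4, 0.02, T_STEPS)        # assumed linear schedule
alphas_cumprod = torch.cumprod(1.0 - betas, dim=0)

def diffusion_loss(model, x0, audio_cond):
    """x0: clean motion-latent sequence (B, T, d).
    model(x_t, t, audio_cond) -> predicted noise, same shape as x0."""
    B = x0.shape[0]
    t = torch.randint(0, T_STEPS, (B,), device=x0.device)
    a_bar = alphas_cumprod.to(x0.device)[t].view(B, 1, 1)
    noise = torch.randn_like(x0)
    x_t = a_bar.sqrt() * x0 + (1 - a_bar).sqrt() * noise  # forward noising
    return F.mse_loss(model(x_t, t, audio_cond), noise)   # predict the added noise
```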