3 1 11

Jorge Vicente Mendoza

0xjorgev

https://0xjorgev.omg.lol

AI & ML interests

Stable Diffusion , llama, transformers, GPTs

Recent Activity

New activity 6 days ago

corenet-community/coreml-OpenELM-270M:Can't compile on Xcode

liked a model 7 days ago

corenet-community/coreml-OpenELM-270M

liked a model 24 days ago

marroyo777/Llama-3.2-1B-Instruct-IQ4_XS-GGUF

View all activity

Organizations

0xjorgev's activity

New activity in corenet-community/coreml-OpenELM-270M 6 days ago

Can't compile on Xcode

#2 opened 7 days ago by

0xjorgev

liked a model 7 days ago

corenet-community/coreml-OpenELM-270M

Updated Apr 30 • 9 • 2

liked 4 models 24 days ago

Reacted to pcuenq's post with 🚀 about 1 month ago

Post

4379

OpenELM in Core ML

Apple recently released a set of efficient LLMs in sizes varying between 270M and 3B parameters. Their quality, according to benchmarks, is similar to OLMo models of comparable size, but they required half the pre-training tokens because they use layer-wise scaling, where the number of attention heads increases in deeper layers.

I converted these models to Core ML, for use on Apple Silicon, using this script: https://gist.github.com/pcuenca/23cd08443460bc90854e2a6f0f575084. The converted models were uploaded to this community in the Hub for anyone that wants to integrate inside their apps: corenet-community/openelm-core-ml-6630c6b19268a5d878cfd194

The conversion was done with the following parameters:
- Precision: float32.
- Sequence length: fixed to 128.

With swift-transformers (https://github.com/huggingface/swift-transformers), I'm getting about 56 tok/s with the 270M on my M1 Max, and 6.5 with the largest 3B model. These speeds could be improved by converting to float16. However, there's some precision loss somewhere and generation doesn't work in float16 mode yet. I'm looking into this and will keep you posted! Or take a look at this issue if you'd like to help: https://github.com/huggingface/swift-transformers/issues/95

I'm also looking at optimizing inference using an experimental kv cache in swift-transformers. It's a bit tricky because the layers have varying number of attention heads, but I'm curious to see how much this feature can accelerate performance in this model family :)

Regarding the instruct fine-tuned models, I don't know the chat template that was used. The models use the Llama 2 tokenizer, but the Llama 2 chat template, or the default Alignment Handbook one that was used to train, are not recognized. Any ideas on this welcome!

4 replies

upvoted a collection about 1 month ago

OpenELM Pretrained Models

Collection

4 items • Updated Oct 4 • 47

New activity in apple/OpenELM-270M about 1 month ago

Is it possible to port from .mlpackage to .mlmodelc

#4 opened about 2 months ago by

0xjorgev

Reacted to MonsterMMORPG's post with 🔥 4 months ago

Post

5217

LivePortrait AI: Transform Static Photos into Talking Videos. Now supporting Video-to-Video conversion and Superior Expression Transfer at Remarkable Speed

A new tutorial is anticipated to showcase the latest changes and features in V3, including Video-to-Video capabilities and additional enhancements.

This post provides information for both Windows (local) and Cloud installations (Massed Compute, RunPod, and free Kaggle Account).

🔗 Windows Local Installation Tutorial ️⤵️
▶️ https://youtu.be/FPtpNrmuwXk

🔗 Cloud (no-GPU) Installations Tutorial for Massed Compute, RunPod and free Kaggle Account ️⤵️
▶️ https://youtu.be/wG7oPp01COg

The V3 update introduces video-to-video functionality. If you're seeking a one-click installation method for LivePortrait, an open-source zero-shot image-to-animation application on Windows, for local use, this tutorial is essential. It introduces the cutting-edge image-to-animation open-source generator Live Portrait. Simply provide a static image and a driving video to create an impressive animation in seconds. LivePortrait is incredibly fast and adept at preserving facial expressions from the input video. The results are truly astonishing.

With the V3 update adding video-to-video functionality, those interested in using LivePortrait but lacking a powerful GPU, using a Mac, or preferring cloud-based solutions will find this tutorial invaluable. It guides you through the one-click installation and usage of LivePortrait on #MassedCompute, #RunPod, and even a free #Kaggle account. After following this tutorial, you'll find running LivePortrait on cloud services as straightforward as running it locally. LivePortrait is the latest state-of-the-art static image to talking animation generator, surpassing even paid services in both speed and quality.