spaces-explorers (Spaces-explorers)

jxm

posted an update about 15 hours ago

New state-of-the-art BERT-size retrieval model: *cde-small-v2* 🥳🍾

Hi everyone! We at Cornell are releasing a new retrieval model this week. It uses the contextual document embeddings (CDE) framework, is built on the ModernBERT backbone, and achieves state-of-the-art results on the MTEB benchmark for its size (140M parameters). cde-small-v2 gets an average score of 65.6 across the 56 datasets and improves on our previous model in *every* task domain (retrieval, classification, etc.).

We made a lot of changes to make this model work. First of all, ModernBERT has a better tokenizer, which probably helped the model work well out of the box. We also followed the principles from the CDE paper and used harder clusters and better hard-negative filtering, which gave a small performance improvement. Finally, we made a few small changes that had already been shown to work for larger models: we disabled weight decay, masked out the prefix tokens during pooling, and added a residual connection from the first stage to the second stage for better gradient flow.
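Masking the prefix tokens during pooling can be illustrated with a minimal, dependency-free sketch. It assumes mean pooling and assumes the task prefix (for example a string such as `search_query: `) occupies the first `prefix_len` token positions; the function name `mean_pool_without_prefix` is hypothetical and is not taken from the cde-small-v2 code.

```python
from typing import List, Sequence


def mean_pool_without_prefix(
    token_embeddings: Sequence[Sequence[float]],
    attention_mask: Sequence[int],
    prefix_len: int,
) -> List[float]:
    """Average token vectors, ignoring padding and the first `prefix_len` tokens.

    Hypothetical helper: assumes mean pooling and that the prefix tokens
    sit at positions 0 .. prefix_len - 1.
    """
    # Keep only real (non-padding) tokens that come after the prefix.
    keep = [bool(m) and i >= prefix_len for i, m in enumerate(attention_mask)]
    kept = [vec for vec, k in zip(token_embeddings, keep) if k]
    if not kept:
        raise ValueError("no tokens left to pool after masking the prefix")
    dim = len(kept[0])
    # Component-wise mean over the remaining token vectors.
    return [sum(vec[d] for vec in kept) / len(kept) for d in range(dim)]


# Toy input: 2 prefix tokens, 2 content tokens, 1 padding token.
emb = [[9.0, 9.0], [9.0, 9.0], [1.0, 3.0], [3.0, 5.0], [0.0, 0.0]]
mask = [1, 1, 1, 1, 0]
print(mean_pool_without_prefix(emb, mask, prefix_len=2))  # [2.0, 4.0]
```

With the prefix masked, the pooled vector depends only on the content tokens, so the large prefix vectors in the toy input do not pull the embedding toward them.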

We're still looking for a compute sponsor to help us scale CDE to larger models. Since CDE is now state-of-the-art at the ~100M-parameter scale, it seems a reasonable bet that we could train a state-of-the-art large model if we had the GPUs. If you're interested in helping with this, please reach out!

Here's a link to the model: jxm/cde-small-v2
And here's a link to the paper: Contextual Document Embeddings (2410.02525)

lvwerra

authored a paper 2 days ago

Towards Best Practices for Open Datasets for LLM Training

Paper • 2501.08365 • Published 4 days ago • 38

lvwerra

authored a paper 3 months ago

SelfCodeAlign: Self-Alignment for Code Generation

Paper • 2410.24198 • Published Oct 31, 2024 • 23

alienelf

authored a paper 5 months ago

InkubaLM: A small language model for low-resource African languages

Paper • 2408.17024 • Published Aug 30, 2024 • 13

lvwerra

authored a paper 7 months ago

The FineWeb Datasets: Decanting the Web for the Finest Text Data at Scale

Paper • 2406.17557 • Published Jun 25, 2024 • 90

lvwerra

authored a paper 8 months ago

Scaling Laws and Compute-Optimal Training Beyond Fixed Training Durations

Paper • 2405.18392 • Published May 28, 2024 • 12

aryaman

authored a paper 10 months ago

ReFT: Representation Finetuning for Language Models

Paper • 2404.03592 • Published Apr 4, 2024 • 92

lvwerra

authored a paper 11 months ago

StarCoder 2 and The Stack v2: The Next Generation

Paper • 2402.19173 • Published Feb 29, 2024 • 137

razent

authored a paper 11 months ago

Distillation Contrastive Decoding: Improving LLMs Reasoning with Contrastive Decoding and Distillation

Paper • 2402.14874 • Published Feb 21, 2024 • 4

aryaman

authored 3 papers 11 months ago

Norod78

posted an update 12 months ago

I've prepared a Google Colab notebook which allows you to play with interpolating between different people using IP-Adapter SDXL Face-ID Plus.

```python
import numpy as np
import torch
from tqdm import tqdm

# Assumes image1/image2 (NumPy arrays), faceid_embeds1/2, ip_model, prompt, v2, etc. are defined earlier.
# Prepare num_of_results interpolation weights, evenly spaced between 0 and 1
t_space = torch.linspace(0, 1, num_of_results)
for t in tqdm(t_space):
    mix_factor = t.item()
    # Blend the two face images pixel-wise
    image = (image1 * (1 - mix_factor) + image2 * mix_factor).astype(np.uint8)
    # Linearly interpolate between the two face embeddings (plain float weight avoids device mismatches)
    faceid_embeds = torch.lerp(faceid_embeds1, faceid_embeds2, mix_factor)
    # Generate the interpolated result
    images = ip_model.generate(
        prompt=prompt, negative_prompt=negative_prompt, face_image=image, faceid_embeds=faceid_embeds,
        shortcut=v2, num_samples=2, scale=scale, s_scale=s_scale, guidance_scale=guidance_scale,
        width=width, height=height, num_inference_steps=steps, seed=seed)
```