Text Embedding Models
By default (for backward compatibility), when the TEXT_EMBEDDING_MODELS environment variable is not defined, transformers.js embedding models are used for embedding tasks, specifically the Xenova/gte-small model.
You can customize the embedding model by setting TEXT_EMBEDDING_MODELS in your .env.local file, where the required fields are name, chunkCharLength and endpoints.
Supported text embedding backends are: transformers.js, TEI and OpenAI. transformers.js models run locally as part of chat-ui, whereas TEI models run in a separate environment and are accessed through an API endpoint. OpenAI models are accessed through an OpenAI-compatible API.
When more than one embedding model is supplied in the .env.local file, the first is used by default; the others are used only by LLMs whose embeddingModel is set to that model's name.
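For instance, a chat model can opt into a specific embedding model through its embeddingModel field in the MODELS configuration. A minimal sketch (the chat model name is a placeholder and the rest of the model entry is omitted for brevity):
MODELS=`[
  {
    "name": "mistralai/Mistral-7B-Instruct-v0.2",
    "embeddingModel": "Xenova/gte-small",
    ...
  }
]`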
Transformers.js
The Transformers.js backend runs embeddings on the local CPU, which can be quite slow. If you use web search frequently, consider using TEI or OpenAI embeddings instead, as performance will improve significantly.
TEXT_EMBEDDING_MODELS=`[
{
"name": "Xenova/gte-small",
"displayName": "Xenova/gte-small",
"description": "locally running embedding",
"chunkCharLength": 512,
"endpoints": [
{ "type": "transformersjs" }
]
}
]`
Text Embeddings Inference (TEI)
Text Embeddings Inference (TEI) is a comprehensive toolkit designed for efficient deployment and serving of open source text embeddings models. It enables high-performance extraction for the most popular models, including FlagEmbedding, Ember, GTE, and E5.
Some recommended models at the time of writing (May 2024) are Snowflake/snowflake-arctic-embed-m and BAAI/bge-large-en-v1.5. You may run TEI locally with GPU support via Docker:
docker run --gpus all -p 8080:80 -v tei-data:/data --name tei ghcr.io/huggingface/text-embeddings-inference:1.2 --model-id YOUR/HF_MODEL
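Before wiring it into chat-ui, you can sanity-check that the TEI server responds. A quick sketch using TEI's /embed route (the example sentence is arbitrary; the port matches the Docker command above):
curl 127.0.0.1:8080/embed \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"inputs": "What is Deep Learning?"}'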
You can then hook this up to your Chat UI instance with the following configuration.
TEXT_EMBEDDING_MODELS=`[
{
"name": "YOUR/HF_MODEL",
"displayName": "YOUR/HF_MODEL",
"preQuery": "Check the model documentation for the preQuery. Not all models have one",
"prePassage": "Check the model documentation for the prePassage. Not all models have one",
"chunkCharLength": 512,
"endpoints": [{
"type": "tei",
"url": "http://127.0.0.1:8080/"
}]
}
]`
Examples for Snowflake/snowflake-arctic-embed-m and BAAI/bge-large-en-v1.5:
TEXT_EMBEDDING_MODELS=`[
{
"name": "Snowflake/snowflake-arctic-embed-m",
"displayName": "Snowflake/snowflake-arctic-embed-m",
"preQuery": "Represent this sentence for searching relevant passages: ",
"chunkCharLength": 512,
"endpoints": [{
"type": "tei",
"url": "http://127.0.0.1:8080/"
}]
},{
"name": "BAAI/bge-large-en-v1.5",
"displayName": "BAAI/bge-large-en-v1.5",
"chunkCharLength": 512,
"endpoints": [{
"type": "tei",
"url": "http://127.0.0.1:8080/"
}]
}
]`
OpenAI
It’s also possible to host your own OpenAI API-compatible embedding models. Infinity is one example. You may run it locally with Docker:
docker run -it --gpus all -v infinity-data:/app/.cache -p 7997:7997 michaelf34/infinity:latest v2 --model-id nomic-ai/nomic-embed-text-v1 --port 7997
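As with TEI, you can verify the server is reachable before configuring chat-ui. A quick sketch against the OpenAI-compatible /embeddings route (the input text is arbitrary; the port matches the Docker command above):
curl http://127.0.0.1:7997/embeddings \
  -X POST \
  -H "Content-Type: application/json" \
  -d '{"model": "nomic-ai/nomic-embed-text-v1", "input": ["What is Deep Learning?"]}'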
You can then hook this up to your Chat UI instance with the following configuration.
TEXT_EMBEDDING_MODELS=`[
{
"name": "nomic-ai/nomic-embed-text-v1",
"displayName": "nomic-ai/nomic-embed-text-v1",
"chunkCharLength": 512,
"model": {
"name": "nomic-ai/nomic-embed-text-v1"
},
"endpoints": [
{
"type": "openai",
"url": "https://127.0.0.1:7997/embeddings"
}
]
}
]`