Hugging Face x LangChain: A new partner package in LangChain
We are thrilled to announce the launch of langchain_huggingface, a partner package in LangChain jointly maintained by Hugging Face and LangChain. This new Python package is designed to bring the power of the latest Hugging Face developments into LangChain and keep it up to date.
From the community, for the community
All Hugging Face-related classes in LangChain were coded by the community, and while we thrived on this, over time, some of them became deprecated because of the lack of an insider’s perspective.
By becoming a partner package, we aim to reduce the time it takes to bring new features available in the Hugging Face ecosystem to LangChain's users.
langchain-huggingface integrates seamlessly with LangChain, providing an efficient and effective way to utilize Hugging Face models within the LangChain ecosystem. This partnership is not just about sharing technology but also about a joint commitment to maintain and continually improve this integration.
Getting Started
Getting started with langchain-huggingface is straightforward. Here's how you can install and begin using the package:
pip install langchain-huggingface
Now that the package is installed, let's take a tour of what's inside!
The LLMs
HuggingFacePipeline
Among transformers, the Pipeline is the most versatile tool in the Hugging Face toolbox. Since LangChain is designed primarily to address RAG and agent use cases, the scope of the pipeline here is reduced to the following text-centric tasks: "text-generation", "text2text-generation", "summarization", and "translation".
Models can be loaded directly with the from_model_id method:
from langchain_huggingface import HuggingFacePipeline

llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    pipeline_kwargs={
        "max_new_tokens": 100,
        "top_k": 50,
        "temperature": 0.1,
    },
)
llm.invoke("Hugging Face is")
Alternatively, you can define the pipeline yourself before passing it to the class:
from transformers import AutoModelForCausalLM, AutoTokenizer, pipeline
from langchain_huggingface import HuggingFacePipeline

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    load_in_4bit=True,
    # attn_implementation="flash_attention_2", # if you have an Ampere GPU
)
pipe = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=100, top_k=50, temperature=0.1)
llm = HuggingFacePipeline(pipeline=pipe)
llm.invoke("Hugging Face is")
When using this class, the model is downloaded to your local cache and runs on your machine's hardware; thus, you may be limited by the resources available on your computer.
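Since everything runs locally, you may also want to control which device the model is placed on. Below is a minimal sketch, assuming a machine with a CUDA GPU at index 0; the device argument follows the usual transformers pipeline convention, and you can drop it (or use device_map) for CPU or automatic placement:
from langchain_huggingface import HuggingFacePipeline

# Same model as above, but explicitly placed on GPU 0.
gpu_llm = HuggingFacePipeline.from_model_id(
    model_id="microsoft/Phi-3-mini-4k-instruct",
    task="text-generation",
    device=0,
    pipeline_kwargs={"max_new_tokens": 100},
)
gpu_llm.invoke("Hugging Face is")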
HuggingFaceEndpoint
There are also two ways to use this class. You can specify the model with the repo_id parameter. Those endpoints use the serverless API, which is particularly beneficial to people using Pro accounts or the Enterprise Hub. Still, regular users can already access a fair number of requests by connecting with their HF token in the environment where they are executing the code.
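One straightforward way to make your token available is to log in with huggingface_hub before running the code below; this is a minimal sketch, and setting the HUGGINGFACEHUB_API_TOKEN environment variable works as well:
from huggingface_hub import login

# Prompts for your Hugging Face token and caches it locally for later calls.
login()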
from langchain_huggingface import HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    repo_id="meta-llama/Meta-Llama-3-8B-Instruct",
    task="text-generation",
    max_new_tokens=100,
    do_sample=False,
)
llm.invoke("Hugging Face is")
Alternatively, you can point the class at your own deployed endpoint, such as a dedicated Inference Endpoint or a TGI instance, by passing its URL with the endpoint_url parameter:
llm = HuggingFaceEndpoint(
    endpoint_url="<endpoint_url>",
    task="text-generation",
    max_new_tokens=1024,
    do_sample=False,
)
llm.invoke("Hugging Face is")
Under the hood, this class uses the InferenceClient to serve a wide variety of use cases, from the serverless API to deployed TGI instances.
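For reference, the calls above are conceptually close to using huggingface_hub's InferenceClient directly; a minimal sketch, assuming your token is already configured and you have access to the model:
from huggingface_hub import InferenceClient

# The model argument accepts either a Hub repo id or the URL of a deployed endpoint.
client = InferenceClient(model="meta-llama/Meta-Llama-3-8B-Instruct")
client.text_generation("Hugging Face is", max_new_tokens=100)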
ChatHuggingFace
Every model has its own special tokens with which it works best. Without those tokens added to your prompt, your model will greatly underperform.
When going from a list of messages to a completion prompt, most LLM tokenizers expose an attribute called chat_template that describes the expected format. To learn more about the chat_template of different models, visit this space I made!
This class is a wrapper around the other LLMs. It takes a list of messages as input and then creates the correct completion prompt using the tokenizer.apply_chat_template method.
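To see what that method produces, here is a minimal sketch of calling apply_chat_template directly; it assumes you have access to the gated meta-llama repository, and the message content is just an example:
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")
messages = [{"role": "user", "content": "Hugging Face is"}]
# Render the messages into the model's prompt format, ending with the assistant header.
print(tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True))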
from langchain_huggingface import ChatHuggingFace, HuggingFaceEndpoint

llm = HuggingFaceEndpoint(
    endpoint_url="<endpoint_url>",
    task="text-generation",
    max_new_tokens=1024,
    do_sample=False,
)
llm_engine_hf = ChatHuggingFace(llm=llm)
llm_engine_hf.invoke("Hugging Face is")
The code above is equivalent to:
# with mistralai/Mistral-7B-Instruct-v0.2
llm.invoke("<s>[INST] Hugging Face is [/INST]")
# with meta-llama/Meta-Llama-3-8B-Instruct
llm.invoke("""<|begin_of_text|><|start_header_id|>user<|end_header_id|>Hugging Face is<|eot_id|><|start_header_id|>assistant<|end_header_id|>""")
The Embeddings
Hugging Face is filled with very powerful embedding models that you can directly leverage in your pipeline.
First, choose your model. One good resource for choosing an embedding model is the MTEB leaderboard.
HuggingFaceEmbeddings
This class uses sentence-transformers embeddings. It computes the embeddings locally, using your computer's resources.
from langchain_huggingface.embeddings import HuggingFaceEmbeddings

model_name = "mixedbread-ai/mxbai-embed-large-v1"
hf_embeddings = HuggingFaceEmbeddings(
    model_name=model_name,
)
texts = ["Hello, world!", "How are you?"]
hf_embeddings.embed_documents(texts)
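The embeddings come back as plain Python lists of floats, so you can work with them directly. Here is a minimal sketch that scores cosine similarity with numpy, reusing hf_embeddings and texts from above; the similarity computation is our own helper, not part of the package:
import numpy as np

query_embedding = np.array(hf_embeddings.embed_query("How are you doing?"))
doc_embeddings = np.array(hf_embeddings.embed_documents(texts))

# Cosine similarity between the query and each document.
scores = doc_embeddings @ query_embedding / (
    np.linalg.norm(doc_embeddings, axis=1) * np.linalg.norm(query_embedding)
)
print(scores)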
HuggingFaceEndpointEmbeddings
HuggingFaceEndpointEmbeddings is very similar to what HuggingFaceEndpoint does for the LLM, in the sense that it also uses the InferenceClient under the hood to compute the embeddings. It can be used with models on the Hub and with TEI instances, whether they are deployed locally or online.
from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

hf_embeddings = HuggingFaceEndpointEmbeddings(
    model="mixedbread-ai/mxbai-embed-large-v1",
    task="feature-extraction",
    huggingfacehub_api_token="<HF_TOKEN>",
)
texts = ["Hello, world!", "How are you?"]
hf_embeddings.embed_documents(texts)
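Since the underlying InferenceClient also accepts a URL in place of a model id, it should be possible to point the model parameter at a TEI instance you have deployed yourself; this is a sketch based on that assumption, with a placeholder localhost URL:
from langchain_huggingface.embeddings import HuggingFaceEndpointEmbeddings

# Assumed: a Text Embeddings Inference (TEI) container listening on localhost:8080.
tei_embeddings = HuggingFaceEndpointEmbeddings(model="http://localhost:8080")
tei_embeddings.embed_query("Hello, world!")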
Conclusion
We are committed to making langchain-huggingface better by the day. We will be actively monitoring feedback and issues and working to address them as quickly as possible. We will also be adding new features and functionality and expanding the package to support an even wider range of the community's use cases. We strongly encourage you to try this package and share your opinion, as it will pave the way for the package's future.