# Local LangChain with FastChat

[LangChain](https://python.langchain.com/en/latest/index.html) is a library that facilitates the development of applications by leveraging large language models (LLMs) and enabling their composition with other sources of computation or knowledge.

FastChat's OpenAI-compatible [API server](openai_api.md) enables using LangChain with open models seamlessly.

## Launch RESTful API Server

Here are the steps to launch a local OpenAI API server for LangChain.

First, launch the controller:
```bash
python3 -m fastchat.serve.controller
```
Next, launch the model worker.
LangChain uses OpenAI model names by default, so we need to assign some faux OpenAI model names to our local model.
Here, we use Vicuna as an example and register it for three endpoints: chat completion, completion, and embedding.
`--model-path` can be a local folder or a Hugging Face repo name.
See a full list of supported models [here](../README.md#supported-models).
```bash
python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-7b-v1.5
```
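Optionally, you can verify that the worker registered with the controller before continuing. A sketch assuming FastChat's `fastchat.serve.test_message` helper and its `--model-name` flag behave as in recent releases:

```bash
# Assumed helper: sends a test prompt to the named worker via the controller.
python3 -m fastchat.serve.test_message --model-name gpt-3.5-turbo
```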
Finally, launch the RESTful API server:

```bash
python3 -m fastchat.serve.openai_api_server --host localhost --port 8000
```
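Once all three processes are up, a quick sanity check is to list the models the server exposes. This uses the standard OpenAI `GET /v1/models` route, which FastChat's API server implements (the exact response shape may vary across versions):

```bash
# Should list gpt-3.5-turbo, text-davinci-003, and text-embedding-ada-002.
curl http://localhost:8000/v1/models
```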
## Set OpenAI Environment

You can set up your environment with the following commands.

Set the OpenAI base URL:

```bash
export OPENAI_API_BASE=http://localhost:8000/v1
```

Set the OpenAI API key:

```bash
export OPENAI_API_KEY=EMPTY
```
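With these variables set, any OpenAI client will talk to the local server. As a minimal sanity check, here is a sketch assuming the legacy `openai<1.0` Python client, which reads both environment variables automatically:

```python
import openai  # legacy openai<1.0 client; picks up OPENAI_API_BASE and OPENAI_API_KEY

# Route a chat completion to the local Vicuna worker via its faux model name.
completion = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": "Hello! Who are you?"}],
)
print(completion.choices[0].message.content)
```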
If you encounter the following OOM error while creating embeddings, set a smaller batch size via an environment variable.

~~~bash
openai.error.APIError: Invalid response object from API: '{"object":"error","message":"**NETWORK ERROR DUE TO HIGH TRAFFIC. PLEASE REGENERATE OR REFRESH THIS PAGE.**\\n\\n(CUDA out of memory. Tried to allocate xxx MiB (GPU 0; xxx GiB total capacity; xxx GiB already allocated; xxx MiB free; xxx GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF)","code":50002}' (HTTP response code was 400)
~~~

You can try `export FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE=1`.
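Note that this variable is read by the model worker, so it must be set in the worker's environment before the worker starts (an assumption based on the variable's `FASTCHAT_WORKER_` prefix; adjust if your setup differs):

```bash
# Restart the model worker with a smaller embedding batch size.
export FASTCHAT_WORKER_API_EMBEDDING_BATCH_SIZE=1
python3 -m fastchat.serve.model_worker --model-names "gpt-3.5-turbo,text-davinci-003,text-embedding-ada-002" --model-path lmsys/vicuna-7b-v1.5
```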
## Try local LangChain

Here is a question answering example.

Download a text file:

```bash
wget https://raw.githubusercontent.com/hwchase17/langchain/v0.0.200/docs/modules/state_of_the_union.txt
```

Run LangChain:
~~~py
from langchain.chat_models import ChatOpenAI
from langchain.document_loaders import TextLoader
from langchain.embeddings import OpenAIEmbeddings
from langchain.indexes import VectorstoreIndexCreator

# Both embeddings and chat are served by the local worker under their faux OpenAI model names.
embedding = OpenAIEmbeddings(model="text-embedding-ada-002")
loader = TextLoader("state_of_the_union.txt")

# Build a vector store index over the document using the local embedding endpoint.
index = VectorstoreIndexCreator(embedding=embedding).from_loaders([loader])
llm = ChatOpenAI(model="gpt-3.5-turbo")

questions = [
    "Who is the speaker",
    "What did the president say about Ketanji Brown Jackson",
    "What are the threats to America",
    "Who are mentioned in the speech",
    "Who is the vice president",
    "How many projects were announced",
]

for query in questions:
    print("Query:", query)
    print("Answer:", index.query(query, llm=llm))
~~~
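If the index construction fails, you can test the embedding endpoint in isolation first. A minimal sketch using the same LangChain version as above:

```python
from langchain.embeddings import OpenAIEmbeddings

# Embed a single query through the local text-embedding-ada-002 alias.
embedding = OpenAIEmbeddings(model="text-embedding-ada-002")
vector = embedding.embed_query("What did the president say?")
print(len(vector))  # dimensionality of the local model's embeddings
```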