When I query the model for text generation I get this: The model 'RWForCausalLM' is not supported for text-generation.
I am using LangChain to load falcon-40b on an H100 GPU machine, but I get the message below and nothing is generated when I pass it a context retrieved with FAISS:
The model 'RWForCausalLM' is not supported for text-generation. Supported models are
['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM',
'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM',
'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM',
'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel',
'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM',
'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM',
'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
This is how I am loading the model and providing the FAISS embeddings to it:
import pickle
import torch
from transformers import AutoTokenizer, pipeline
from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline
from langchain.chains.question_answering import load_qa_chain

def load_embeddings(store_name, path):
    with open(f"{path}/faiss_{store_name}.pkl", "rb") as f:
        VectorStore = pickle.load(f)
    return VectorStore

Embedding_store_path = "./dbfs"
# hf_embed = load_embeddings(store_name='huggingface_fm_lambdalabs_faiss', path=Embedding_store_path)
hf_embed = load_embeddings(store_name='store_template', path=Embedding_store_path)
def get_similar_docs(question, similar_doc_count):
    return hf_embed.similarity_search(question, k=similar_doc_count)
def build_qa_chain():
    torch.cuda.empty_cache()
    model_name = "tiiuae/falcon-40b"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    instruct_pipeline = pipeline(model=model_name, tokenizer=tokenizer, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto",
                                 return_full_text=True, max_new_tokens=256, top_p=0.95, top_k=50)

    # Defining our prompt content.
    # langchain will load our similar documents as {context}
    template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:
You are experienced in .. and your job is to help provide the best answer related to ....
Use only information in the following paragraphs to answer the question at the end. Explain the answer with reference to these paragraphs. If you don't know, say that you do not know.

{context}

Question: {question}

Response:
"""
    prompt = PromptTemplate(input_variables=['context', 'question'], template=template)
    hf_pipe = HuggingFacePipeline(pipeline=instruct_pipeline)
    # Set verbose=True to see the full prompt:
    return load_qa_chain(llm=hf_pipe, chain_type="stuff", prompt=prompt, verbose=True)
qa_chain = build_qa_chain()
def answer_question(question):
    similar_docs = get_similar_docs(question, similar_doc_count=1)
    result = qa_chain({"input_documents": similar_docs, "question": question})
    print("question: " + question)
    print(" ")
    print("Answer: ")
    print(result['output_text'])
    print(" ")
    print("Sources")
    print(" ")
    for d in result["input_documents"]:
        source_id = d.metadata["source"]
        print(d.page_content)
        print("Source " + source_id)
        print(" ")
answer_question("<question>?")
while True:
    query = input("\nEnter a query: ")
    if query == "exit":
        break
    # Get the answer from the chain
    answer_question(query)
I get the same error just by running the "How to Get Started with the Model" example.
+1 I also get this error
me too
@Seledorn :)
and this:
File ~/.cache/huggingface/modules/transformers_modules/falcon40b/modelling_RW.py:32, in Linear.forward(self, input)
31 def forward(self, input: torch.Tensor) -> torch.Tensor:
---> 32 ret = input @ self.weight.T
33 if self.bias is None:
34 return ret
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
same problem here!
same here with the how to get started
model: tiiuae/falcon-7b-instruct
Sorry about the delay. The The model 'RWForCausalLM' is not supported for text-generation message comes from the model not being integrated into the core part of the transformers library yet. It's just a warning, and generation should still follow afterwards. See for example https://twitter.com/camenduru/status/1662225039352283137?s=20 for a video where it is working correctly.
It will take a little bit of time to integrate the model fully into the transformers library, but hopefully in a couple of weeks this warning will go away.
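To illustrate, here is a minimal sketch of that pattern, using the smaller tiiuae/falcon-7b-instruct checkpoint as an assumption so it runs on a single GPU; the warning is printed when the pipeline is built, but generation still happens afterwards:

import torch
from transformers import AutoTokenizer, pipeline

model_name = "tiiuae/falcon-7b-instruct"  # assumed smaller checkpoint, just for a quick test
tokenizer = AutoTokenizer.from_pretrained(model_name)

# trust_remote_code=True pulls in the custom RWForCausalLM class shipped with the model repo.
# The "not supported for text-generation" line comes from the pipeline's model-type check
# and is only a warning; it does not stop generation.
generator = pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

print(generator("Explain what FAISS is in one sentence.", max_new_tokens=64)[0]["generated_text"])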
@FalconLLM Thanks, Falcon-7B is generating text, but I am unable to load Falcon-40B on a 1x Nvidia H100 GPU with 80 GB of VRAM; opening a separate issue.
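(In case it helps others hitting the same limit: a common workaround is to load the 40B weights in 8-bit via bitsandbytes, which roughly halves the memory needed for the weights compared to bfloat16. A minimal sketch, assuming bitsandbytes and accelerate are installed; not verified on this exact setup:)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_name = "tiiuae/falcon-40b"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 8-bit weights bring ~40B parameters down to roughly 40-45 GB, which should fit on one 80 GB card.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True,
)

instruct_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)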
@FalconLLM any updates on this issue?
For me it was resolved with
pip install git+https://github.com/huggingface/transformers
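(If you want to confirm the upgrade actually took effect, a quick sanity check; treating the exact version string as an assumption about what a main-branch install looks like:)

import transformers

print(transformers.__version__)  # a ".dev0" suffix indicates an install from the GitHub main branch
# With a recent enough build, rebuilding the pipeline should no longer print the
# "RWForCausalLM is not supported for text-generation" warning.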
It worked for me as well. Thanks!!
same here. thanks
For me it was resolved with
pip install git+https://github.com/huggingface/transformers
Same for me. It also sped up inference drastically for the 7b-instruct model. Thanks a lot!
Still getting this issue
It's not working for text generation. It says AttributeError: module transformers has no attribute RWForCausalLM.