When I query the model for text generation I get this: The model 'RWForCausalLM' is not supported for text-generation.
I am using LangChain to load falcon-40b on an H100 GPU machine, but I get the message below and nothing is generated when I pass it a context retrieved with FAISS:
The model 'RWForCausalLM' is not supported for text-generation. Supported models are
['BartForCausalLM', 'BertLMHeadModel', 'BertGenerationDecoder', 'BigBirdForCausalLM', 'BigBirdPegasusForCausalLM', 'BioGptForCausalLM',
'BlenderbotForCausalLM', 'BlenderbotSmallForCausalLM', 'BloomForCausalLM', 'CamembertForCausalLM', 'CodeGenForCausalLM', 'CpmAntForCausalLM',
'CTRLLMHeadModel', 'Data2VecTextForCausalLM', 'ElectraForCausalLM', 'ErnieForCausalLM', 'GitForCausalLM', 'GPT2LMHeadModel', 'GPT2LMHeadModel', 'GPTBigCodeForCausalLM', 'GPTNeoForCausalLM', 'GPTNeoXForCausalLM', 'GPTNeoXJapaneseForCausalLM', 'GPTJForCausalLM', 'LlamaForCausalLM',
'MarianForCausalLM', 'MBartForCausalLM', 'MegaForCausalLM', 'MegatronBertForCausalLM', 'MvpForCausalLM', 'OpenLlamaForCausalLM', 'OpenAIGPTLMHeadModel', 'OPTForCausalLM', 'PegasusForCausalLM', 'PLBartForCausalLM', 'ProphetNetForCausalLM', 'QDQBertLMHeadModel',
'ReformerModelWithLMHead', 'RemBertForCausalLM', 'RobertaForCausalLM', 'RobertaPreLayerNormForCausalLM', 'RoCBertForCausalLM',
'RoFormerForCausalLM', 'RwkvForCausalLM', 'Speech2Text2ForCausalLM', 'TransfoXLLMHeadModel', 'TrOCRForCausalLM', 'XGLMForCausalLM',
'XLMWithLMHeadModel', 'XLMProphetNetForCausalLM', 'XLMRobertaForCausalLM', 'XLMRobertaXLForCausalLM', 'XLNetLMHeadModel', 'XmodForCausalLM'].
This is how I am loading the model and providing the FAISS embeddings to it:
import pickle
import torch
from transformers import AutoTokenizer, pipeline
from langchain.prompts import PromptTemplate
from langchain.llms import HuggingFacePipeline
from langchain.chains.question_answering import load_qa_chain

def load_embeddings(store_name, path):
    with open(f"{path}/faiss_{store_name}.pkl", "rb") as f:
        VectorStore = pickle.load(f)
    return VectorStore

Embedding_store_path = "./dbfs"
# hf_embed = load_embeddings(store_name='huggingface_fm_lambdalabs_faiss', path=Embedding_store_path)
hf_embed = load_embeddings(store_name='store_template', path=Embedding_store_path)
def get_similar_docs(question, similar_doc_count):
    return hf_embed.similarity_search(question, k=similar_doc_count)
def build_qa_chain():
    torch.cuda.empty_cache()
    model_name = "tiiuae/falcon-40b"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    instruct_pipeline = pipeline(model=model_name, tokenizer=tokenizer, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto",
                                 return_full_text=True, max_new_tokens=256, top_p=0.95, top_k=50)

    # Defining our prompt content.
    # langchain will load our similar documents as {context}
    template = """Below is an instruction that describes a task. Write a response that appropriately completes the request.

Instruction:
You are experienced in .. and your job is to help provide the best answer related to ....
Use only information in the following paragraphs to answer the question at the end. Explain the answer with reference to these paragraphs. If you don't know, say that you do not know.

{context}

Question: {question}

Response:
"""
    prompt = PromptTemplate(input_variables=['context', 'question'], template=template)
    hf_pipe = HuggingFacePipeline(pipeline=instruct_pipeline)
    # Set verbose=True to see the full prompt:
    return load_qa_chain(llm=hf_pipe, chain_type="stuff", prompt=prompt, verbose=True)
qa_chain = build_qa_chain()
def answer_question(question):
    similar_docs = get_similar_docs(question, similar_doc_count=1)
    result = qa_chain({"input_documents": similar_docs, "question": question})
    print("question: " + question)
    print(" ")
    print("Answer: ")
    print(result['output_text'])
    print(" ")
    print("Sources")
    print(" ")
    for d in result["input_documents"]:
        source_id = d.metadata["source"]
        print(d.page_content)
        print("Source " + source_id)
        print(" ")
answer_question("<question>?")
while True:
    query = input("\nEnter a query: ")
    if query == "exit":
        break
    # Get the answer from the chain
    answer_question(query)
I get the same error just by running the "How to Get Started with the Model" example.
+1 I also get this error
me too
@Seledorn :)
and this:
File ~/.cache/huggingface/modules/transformers_modules/falcon40b/modelling_RW.py:32, in Linear.forward(self, input)
31 def forward(self, input: torch.Tensor) -> torch.Tensor:
---> 32 ret = input @ self.weight.T
33 if self.bias is None:
34 return ret
RuntimeError: CUDA error: CUBLAS_STATUS_NOT_INITIALIZED when calling cublasCreate(handle)
same problem here!
same here with the how to get started
model: tiiuae/falcon-7b-instruct
Sorry about the delay. The The model 'RWForCausalLM' is not supported for text-generation message comes from the model not being integrated into the core part of the transformers library yet. It's just a warning, and generation should still follow afterwards. See for example https://twitter.com/camenduru/status/1662225039352283137?s=20 for a video where it is working correctly.
It will take a little bit of time to integrate the model fully into the transformers library, but hopefully in a couple of weeks this warning will go away.
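To illustrate, here is a minimal sketch of that pattern, using the smaller tiiuae/falcon-7b-instruct checkpoint as an assumption so it runs on a single GPU; the warning is printed when the pipeline is built, but generation still happens afterwards:

import torch
from transformers import AutoTokenizer, pipeline

model_name = "tiiuae/falcon-7b-instruct"  # assumed smaller checkpoint, just for a quick test
tokenizer = AutoTokenizer.from_pretrained(model_name)

# trust_remote_code=True pulls in the custom RWForCausalLM class shipped with the model repo.
# The "not supported for text-generation" line comes from the pipeline's model-type check
# and is only a warning; it does not stop generation.
generator = pipeline(
    "text-generation",
    model=model_name,
    tokenizer=tokenizer,
    torch_dtype=torch.bfloat16,
    trust_remote_code=True,
    device_map="auto",
)

print(generator("Explain what FAISS is in one sentence.", max_new_tokens=64)[0]["generated_text"])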
@FalconLLM Thanks, Falcon-7B is generating text, but I am unable to load Falcon-40B on a 1x Nvidia H100 GPU with 80 GB of VRAM; opening a separate issue.
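(In case it helps others hitting the same limit: a common workaround is to load the 40B weights in 8-bit via bitsandbytes, which roughly halves the memory needed for the weights compared to bfloat16. A minimal sketch, assuming bitsandbytes and accelerate are installed; not verified on this exact setup:)

from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig, pipeline

model_name = "tiiuae/falcon-40b"
tokenizer = AutoTokenizer.from_pretrained(model_name)

# 8-bit weights bring ~40B parameters down to roughly 40-45 GB, which should fit on one 80 GB card.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    quantization_config=BitsAndBytesConfig(load_in_8bit=True),
    device_map="auto",
    trust_remote_code=True,
)

instruct_pipeline = pipeline("text-generation", model=model, tokenizer=tokenizer, max_new_tokens=256)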
@FalconLLM any updates on this issue?
For me it was resolved with
pip install git+https://github.com/huggingface/transformers
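(If you want to confirm the upgrade actually took effect, a quick sanity check; treating the exact version string as an assumption about what a main-branch install looks like:)

import transformers

print(transformers.__version__)  # a ".dev0" suffix indicates an install from the GitHub main branch
# With a recent enough build, rebuilding the pipeline should no longer print the
# "RWForCausalLM is not supported for text-generation" warning.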
It worked for me as well. Thanks!!
same here. thanks
For me it was resolved with
pip install git+https://github.com/huggingface/transformers
Same for me. It also sped up inference drastically for the 7b-instruct model. Thanks a lot!
Still getting this issue
It's not working for text generation. It says AttributeError: module transformers has no attribute RWForCausalLM.