Update README.md
README.md CHANGED
@@ -13,7 +13,7 @@ inference: false
 </div>
 <div style="display: flex; justify-content: space-between; width: 100%;">
 <div style="display: flex; flex-direction: column; align-items: flex-start;">
-<p><a href="https://discord.gg/
+<p><a href="https://discord.gg/theblokeai">Chat & support: my new Discord server</a></p>
 </div>
 <div style="display: flex; flex-direction: column; align-items: flex-end;">
 <p><a href="https://www.patreon.com/TheBlokeAI">Want to contribute? TheBloke's Patreon page</a></p>
@@ -27,15 +27,11 @@ This repo contains an experimantal GPTQ 4bit model for [Falcon-7B-Instruct](http
 
 It is the result of quantising to 4bit using [AutoGPTQ](https://github.com/PanQiWei/AutoGPTQ).
 
-## 
+## PERFORMANCE
 
-
+Please note that performance with this GPTQ is currently very slow with AutoGPTQ.
 
-
-
-Please note this is an experimental GPTQ model. Support for it is currently quite limited.
-
-It is also expected to be **SLOW**. This is currently unavoidable, but is being looked at.
+It may perform better with the latest GPTQ-for-LLaMa code, but I haven't tested that personally yet.
 
 ## Prompt template
 
@@ -47,7 +43,7 @@ Assistant:
 
 ## AutoGPTQ
 
-AutoGPTQ is required: `pip install auto-gptq`
+AutoGPTQ is required: `GITHUB_ACTIONS=true pip install auto-gptq`
 
 AutoGPTQ provides pre-compiled wheels for Windows and Linux, with CUDA toolkit 11.7 or 11.8.
 
@@ -61,14 +57,6 @@ pip install .
 
 These manual steps will require that you have the [Nvidia CUDA toolkit](https://developer.nvidia.com/cuda-12-0-1-download-archive) installed.
 
-## text-generation-webui
-
-There is provisional AutoGPTQ support in text-generation-webui.
-
-This requires text-generation-webui as of commit 204731952ae59d79ea3805a425c73dd171d943c3.
-
-So please first update text-genration-webui to the latest version.
-
 ## How to download and use this model in text-generation-webui
 
 1. Launch text-generation-webui
@@ -79,7 +67,7 @@ So please first update text-genration-webui to the latest version.
 6. Wait until it says it's finished downloading.
 7. Click the **Refresh** icon next to **Model** in the top left.
 8. In the **Model drop-down**: choose the model you just downloaded, `falcon-7B-instruct-GPTQ`.
-9. 
+9. Set **Loader** to **AutoGPTQ**. This model will not work with ExLlama. It might work with recent GPTQ-for-LLaMa but I haven't tested that.
 10. Tick **Trust Remote Code**, followed by **Save Settings**
 11. Click **Reload**.
 12. Once it says it's loaded, click the **Text Generation tab** and enter a prompt!
@@ -96,7 +84,7 @@ In this repo you can see two `.py` files - these are the files that get executed
 
 To run this code you need to install AutoGPTQ and einops:
 ```
-pip install auto-gptq
+GITHUB_ACTIONS=true pip install auto-gptq
 pip install einops
 ```
 
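Once the install hunk above is applied, a quick smoke test can confirm that auto-gptq and einops import cleanly and that a CUDA device is visible. This is a minimal illustrative sketch, not part of the commit; it assumes only the packages installed by the commands above (auto-gptq pulls in PyTorch):

```python
# Minimal post-install smoke test (illustrative; not part of this commit).
import torch
import einops  # noqa: F401 - imported only to verify the install
from auto_gptq import AutoGPTQForCausalLM  # noqa: F401

# GPTQ inference needs a CUDA-capable GPU (see the CUDA toolkit note above).
print("torch:", torch.__version__)
print("CUDA available:", torch.cuda.is_available())
```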
@@ -127,7 +115,7 @@ model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
 prompt = "Tell me about AI"
 prompt_template=f'''A helpful assistant who helps the user with any questions asked.
 User: {prompt}
-Assistant:'''
+Assistant:'''
 
 print("\n\n*** Generate:")
 
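The hunk above touches the README's example inference script. For context, a minimal sketch of the surrounding flow is shown below; the repo id is assumed from the model name in step 8, and the generation parameters are illustrative rather than the repo's exact code:

```python
from transformers import AutoTokenizer
from auto_gptq import AutoGPTQForCausalLM

# Repo id assumed from the model name in step 8 above.
model_name_or_path = "TheBloke/falcon-7B-instruct-GPTQ"

tokenizer = AutoTokenizer.from_pretrained(model_name_or_path, use_fast=True)

# trust_remote_code=True is needed because Falcon ships custom modelling code,
# matching the "Trust Remote Code" tick-box in the text-generation-webui steps.
model = AutoGPTQForCausalLM.from_quantized(model_name_or_path,
                                           use_safetensors=True,
                                           trust_remote_code=True,
                                           device="cuda:0")

prompt = "Tell me about AI"
prompt_template = f'''A helpful assistant who helps the user with any questions asked.
User: {prompt}
Assistant:'''

print("\n\n*** Generate:")
input_ids = tokenizer(prompt_template, return_tensors="pt").input_ids.to("cuda:0")
output = model.generate(input_ids=input_ids,
                        do_sample=True,
                        temperature=0.7,
                        max_new_tokens=200)
print(tokenizer.decode(output[0]))
```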
@@ -175,7 +163,7 @@ It was created with groupsize 64 to give higher inference quality, and without `
 
 For further support, and discussions on these models and AI in general, join us at:
 
-[TheBloke AI's Discord server](https://discord.gg/
+[TheBloke AI's Discord server](https://discord.gg/theblokeai)
 
 ## Thanks, and how to contribute.
 
@@ -190,9 +178,12 @@ Donaters will get priority support on any and all AI/LLM/model questions and req
 * Patreon: https://patreon.com/TheBlokeAI
 * Ko-Fi: https://ko-fi.com/TheBlokeAI
 
-**
+**Special thanks to**: Luke from CarbonQuill, Aemon Algiz.
+
+**Patreon special mentions**: RoA, Lone Striker, Gabriel Puliatti, Derek Yates, Randy H, Jonathan Leane, Eugene Pentland, Karl Bernard, Viktor Bowallius, senxiiz, Daniel P. Andersen, Pierre Kircher, Deep Realms, Cory Kujawski, Oscar Rangel, Fen Risland, Ajan Kanaga, LangChain4j, webtim, Nikolai Manek, Trenton Dambrowitz, Raven Klaugh, Kalila, Khalefa Al-Ahmad, Chris McCloskey, Luke @flexchar, Ai Maven, Dave, Asp the Wyvern, Sean Connelly, Imad Khwaja, Space Cruiser, Rainer Wilmers, subjectnull, Alps Aficionado, Willian Hasse, Fred von Graf, Artur Olbinski, Johann-Peter Hartmann, WelcomeToTheClub, Willem Michiel, Michael Levine, Iucharbius, Spiking Neurons AB, K, biorpg, John Villwock, Pyrater, Greatston Gnanesh, Mano Prime, Junyu Yang, Stephen Murray, John Detwiler, Luke Pendergrass, terasurfer, Pieter, zynix, Edmond Seymore, theTransient, Nathan LeClaire, vamX, Kevin Schuppel, Preetika Verma, ya boyyy, Alex, SuperWojo, Ghost, Joseph William Delisle, Matthew Berman, Talal Aujan, chris gileta, Illia Dulskyi.
 
 Thank you to all my generous patrons and donaters!
+
 <!-- footer end -->
 
 # ✨ Original model card: Falcon-7B-Instruct