Request: Sonya-7B, samantha-mistral-7b-Mistral-7B-Instruct-v0.1, dolphin-2.6-mistral-7b (merge)
[Required] Model name:
Elly_7B
[Required] Model link: Merge (mix) of:
Original request:
1: https://huggingface.co/TheBloke/Xwin-LM-7B-V0.2-GGUF
2: https://huggingface.co/NeverSleep/Noromaid-7B-0.4-DPO-GGUF
Option 3: https://huggingface.co/Lewdiculous/InfinityRP-v1-7B-GGUF-IQ-Imatrix
Updated selection (see discussion below):
1: Sonya-7B
2: samantha-mistral-7b-Mistral-7B-Instruct-v0.1
3: dolphin-2.6-mistral-7b
[Required] Brief description: Empathetic, romantic, coherent, and eager to please. Excellent grammatical and language skills.
[Required] An image/direct image link to represent the model (square shaped):
Image created by Claudio Arena with his personal AI model.
[Optional] Additional quants (if you want any):
Default list of quants for reference:
"Q4_K_M", "Q4_K_S", "IQ4_XS", "Q5_K_M", "Q5_K_S",
"Q6_K", "Q8_0", "IQ3_M", "IQ3_S", "IQ3_XXS"
P.S. I hope I wrote everything correctly.
Greetings from Italy ❤️
Well, this was just meant for requesting new Imatrix quants, not really model merges, to be fair; I might need to add that information. As for the merge request, that's up to @Nitral-AI and @jeiku if they can spare the compute.
Will give a while for pondering.
I can look into this later after I finish up my current project, but I cannot merge GGUF models; I need fp16 or bf16.
@ClaudioItaly For merges it's important to link to the original F16 model weights like these:
We don't use the GGUFs for that.
@jeiku You are free to accept or decline any request and to work only on your own time as always.
My Colab is currently free; I can spin this up now if you want to work on your projects, @jeiku.
Yes please, bruv. I need to test these merge candidates for format accuracy while I upload another merge fodder, and then I've got to double-SLERP everything. If you don't mind taking it, that would be a big help.
Wait, I only have enough storage to do two models at a clip, and cannot do 3-way DARE-TIES merges. (Fuck it, I'll try anyway as a double-dip SLERP.)
Maybe they wanted #1 and #2 merged and #3 was an extra option?
@ClaudioItaly Please clarify, in detail, what you want done and what result you expect.
Clarification would certainly be appreciated, also given the models in hand. I would recommend the config from InfinityRP as the base model. Both Noromaid and Xwin have added tokens, which can cause weird behavior between merges. (Will be waiting for more information before I proceed any further.)
Sorry, there's a time difference between Italy and you :-) Yes, #3 was meant as an extra option. If it works, excellent; if not, that's fine.
Looks like InfinityRP as a base again.
I hope it turns out well, because Xwin-LM-7B-V0.2 has nice conversational and narrative language.
Question: after using MergeKit, what do you do to convert the safetensors to GGUF?
Clone the llama.cpp repository. Use python convert.py to convert to GGUF F16, then use the GGUF F16 to generate the imatrix calibration data, using the imatrix.exe prebuilt from the latest repository release. Use the quantize.exe from the same latest binaries to quantize the F16 GGUF to the quants you want, applying the imatrix.dat data you created from the F16 model.
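For reference, here's a rough sketch of that pipeline in Python, assuming a merged model folder named Elly_7B and a calibration.txt file for the imatrix data (both names are placeholders); exact script names and flags vary between llama.cpp releases, so treat this as illustrative rather than exact:

```python
# Illustrative sketch of the convert -> imatrix -> quantize pipeline described above.
# Folder/file names are placeholders; check your llama.cpp release for exact flags.
# On Windows the prebuilt tools are imatrix.exe / quantize.exe.
import subprocess

model_dir = "Elly_7B"            # merged fp16/bf16 safetensors folder (placeholder)
f16_gguf = "Elly_7B-F16.gguf"
calibration = "calibration.txt"  # text used to generate the imatrix calibration data
imatrix_dat = "imatrix.dat"

# 1. Convert the safetensors model to a GGUF F16 file.
subprocess.run(["python", "convert.py", model_dir,
                "--outtype", "f16", "--outfile", f16_gguf], check=True)

# 2. Generate the importance-matrix data from the F16 GGUF.
subprocess.run(["./imatrix", "-m", f16_gguf, "-f", calibration, "-o", imatrix_dat],
               check=True)

# 3. Quantize the F16 GGUF to each requested quant, applying the imatrix data.
quantization_options = [
    "Q4_K_M", "Q4_K_S", "IQ4_XS", "Q5_K_M", "Q5_K_S",
    "Q6_K", "Q8_0", "IQ3_M", "IQ3_S", "IQ3_XXS",
]
for quant in quantization_options:
    subprocess.run(["./quantize", "--imatrix", imatrix_dat,
                    f16_gguf, f"Elly_7B-{quant}.gguf", quant], check=True)
```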
As soon as the model is ready, will I see it in the list?
@ClaudioItaly I believe either @Nitral-AI or @jeiku will post a link here once it is merged; once merged, it can be quantized as usual.
Keep getting pickling errors while running the first merge. (Might be a bit while I figure this out.)
Maybe it's because Xwin-LM/Xwin-LM-7B-V0.2 is a Llama model and NeverSleep/Noromaid-7B-0.4-DPO is a Mistral model.
I think you can still do it with the --allow-crimes flag, but it is experimental.
Does not work; I tried it every way. It does not like the tensor conflicts, and it causes massive pickling issues.
I've sunk a few hours into trying to make it work, but to no avail.
I'm trying with --allow-crimes now in a 3-way DARE-TIES merge, but I am not hopeful that it will work. @ClaudioItaly, could you please research your models more thoroughly next time? @Nitral-AI has lost hours trying to make this work. This could have been prevented if you were more cognizant of which models can be merged.
@jeiku Should this be unfeasible, or just too difficult/unstable as it has shown to be, this can be closed to avoid spending more time on an incompatible merge.
I have a complete merge and am downloading it to test now. According to the merge logs, Xwin was pretty much left out of the mix, but it may still be broken if the lm_head was included, as the logs imply.
I'm sorry, I didn't know there could be problems, and I didn't know that I had to choose only between your models. I am sorry. If you don't succeed, it doesn't matter; I thank you anyway. Elly's photo will remain as a gift.
@ClaudioItaly It's not about choosing our models. Xwin is a Llama 2 model, and the other models you requested are Mistral-based. They are two different model architectures.
Not between our models; you have to choose between either Mistral or Llama 2. Xwin is a Llama 2 model and the other two are Mistral; furthermore, Xwin is a 32-bit model and the other two are 16-bit models. These were the sources of the problems.
But how do I know if the model is Llama 2 or Mistral, or 32- or 16-bit? It is often not stated on the model pages, and it is not even written in the model name.
You can check the model configuration files to verify the architecture type.
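For example, here is a quick sketch of how one might pull and inspect a repo's config.json with huggingface_hub; the repo IDs are just the ones discussed in this thread:

```python
# Hedged sketch: check a repo's architecture and precision before requesting a merge.
import json
from huggingface_hub import hf_hub_download

for repo_id in ["Xwin-LM/Xwin-LM-7B-V0.2", "NeverSleep/Noromaid-7B-0.4-DPO"]:
    config_path = hf_hub_download(repo_id=repo_id, filename="config.json")
    with open(config_path) as f:
        config = json.load(f)
    # "architectures"/"model_type" distinguish Llama from Mistral;
    # "torch_dtype" distinguishes float32 from float16/bfloat16.
    print(repo_id, config.get("architectures"),
          config.get("model_type"), config.get("torch_dtype"))
```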
This is what typical output looks like from the merge you requested. I was able to successfully merge the files, but because of the architecture difference, the model does not have a working vocabulary. I am not uploading this model as it would be a waste of bandwidth.
"Tensors experiencing fatal aneurism's caught in 1080p."
If the offer is still valid, I will look for two Llamas to make the 7B Elly model.
I'd recommend Mistral over Llama 2.
The better decision would be to find a Mistral model to replace Xwin, as the other two will merge just fine on their own. Also, Mistral has been the preferred architecture for 7B for several months; you will have to use outdated models if you go with Llama 2.
Last thing, should I open a new thread?
I would keep it here. Remember, Noromaid and InfinityRP are both ready to merge right now; if you want, I could merge those two for you in a few minutes. It is not necessary to find all new models, but if you want to add a new model or two, that's fine. I can handle up to 4 in a single merge.
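For context, a two-model SLERP merge config for mergekit looks roughly like the sketch below; the repo IDs are a guess at the fp16 sources for InfinityRP and Noromaid, and this is not the exact recipe used for Elly:

```python
# Rough sketch of a two-model SLERP mergekit config (not the actual Elly recipe).
# Repo IDs are assumptions; verify them on Hugging Face before running.
# Run the generated file with:  mergekit-yaml elly-slerp.yml ./Elly_7B
# (mergekit also has the experimental --allow-crimes flag mentioned above,
# which, as seen here, does not save a Llama 2 + Mistral mix.)
import yaml

config = {
    "merge_method": "slerp",
    "base_model": "Endevor/InfinityRP-v1-7B",  # assumed fp16 InfinityRP repo
    "slices": [{
        "sources": [
            {"model": "Endevor/InfinityRP-v1-7B", "layer_range": [0, 32]},
            {"model": "NeverSleep/Noromaid-7B-0.4-DPO", "layer_range": [0, 32]},
        ],
    }],
    "parameters": {"t": [{"value": 0.5}]},  # 0.5 = even blend of the two models
    "dtype": "bfloat16",
}

with open("elly-slerp.yml", "w") as f:
    yaml.safe_dump(config, f, sort_keys=False)
```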
Just update participants here once you have your three 7B Mistral-based models selected. Elly is fine.
@ClaudioItaly Will you need me to upload quants of the final model or will you be making them for yourself?
1 https://huggingface.co/Lewdiculous/Sonya-7B-GGUF-IQ-Imatrix
2 https://huggingface.co/TheBloke/dolphin-2.6-mistral-7B-GGUF
3 https://huggingface.co/MaziyarPanahi/samantha-mistral-7b-Mistral-7B-Instruct-v0.1-GGUF
If I have to be honest, at a conversational level I prefer the Llamas; they are gentler and less academic.
However, I really like the Imatrix models you are creating.
For the quants, I leave the choice up to you.
@ClaudioItaly These are GGUF files; we require the fp16 files. Also, are you sure this is the model you want? You can still use Noromaid or InfinityRP.
I'm looking for the best solution to get something similar to this model: https://huggingface.co/Undi95/Utopia-13B (you should try it). It has a way of speaking and writing that no other model has, and I assure you it is not just a question of the prompt, because with the same prompt other "modern" models make a bad impression.
try it
@ClaudioItaly You still have not provided me with FP16 files, but in the interest of wrapping this up I will source them myself, but please learn the difference between FP16 and GGUF.
Sorry, it's 3 in the morning here.
https://www.phind.com/agent?cache=cltxq8xjl0011l5087fmkbofy
Something like this.
For example, these are the original weights, which is what we need; looking at the files, you'll usually see .safetensors and .bin:
https://huggingface.co/SanjiWatsuki/Sonya-7B
See how, under Tensor type, it says BF16? That's it.
This is a quant(ization) of that same model; we don't want that:
https://huggingface.co/Lewdiculous/Sonya-7B-GGUF-IQ-Imatrix
In the Quants, for GGUF you'll see something like this.
In short, always look for the actual model page. Use the search bar.
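If you'd rather check programmatically, a small sketch with huggingface_hub shows the difference at a glance, using the two Sonya repos linked above:

```python
# Hedged sketch: list repo files to tell original weights apart from GGUF quants.
from huggingface_hub import HfApi

api = HfApi()
for repo_id in ["SanjiWatsuki/Sonya-7B", "Lewdiculous/Sonya-7B-GGUF-IQ-Imatrix"]:
    print(repo_id)
    for name in api.list_repo_files(repo_id):
        if name.endswith((".safetensors", ".bin", ".gguf")):
            print("  ", name)
# Original-weights repos show .safetensors/.bin shards (what merging needs);
# quant repos show .gguf files (not usable for merging).
```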
I should upload Quants tomorrow. You may rest.
@Lewdiculous Performing the merge now, then testing, then I'll upload. I think in the future, should these requests come in, we need to inform the users of the basics of merging, since there seems to be some disconnect. With that said, both Nitral and I would rather not advertise these services. I agreed to this one naively and assumed it would be quick and easy.
I agree and understand completely; this was only meant for my quants, so I'll clarify that in the README.md to ensure this isn't part of it.
Sorry -- @Nitral-AI, @jeiku -- for any hardship that the exposure might have caused.
@Lewdiculous Here it is: https://huggingface.co/jeiku/Elly_7B
@jeiku Will see to it after the massive gemma quants :')
Expect it tomorrow, @ClaudioItaly.
Thanks, great job, very good. As soon as it is converted to GGUF, I will try it immediately.
Quants uploaded.
@ClaudioItaly Uploaded at:
Hi, thank you very much. Model Elly has excellent reasoning skills and is very obedient. She can distinguish grammar and verbs well, and is well informed about history. And she knows how to write well-composed texts.
This is an example:
Question: Hi, let's make a text of this scene again with your beautiful poetic writing: A lonely woman goes for a walk on the Ligurian hills
Answer:
In the heart of the horizon, the Ligurian hills rise like the teeth of an immense print, adorned with plastered greens and reds.
A lone woman climbs a path on one of the slopes. The sunlight becomes softer, almost like a veil that has lost some of its features, making the panorama a little blurry.
On her shoulders, she carries an olive green jacket and in her right hand she tightly holds a brown bag. The head is tilted upwards, with closed eyes enjoying the warm sun and the gentle wind on the skin.
In a rhythmic cadence, she begins to move her feet on that dry, grassy grass, hearing the clicks of flowers beneath her steps. For a few moments, life seems underlined by the sound of bees flying around her, and the chimes of a distant bell tower.
Thoughts are like little clouds moving in all directions; but at some point, she has a feeling of tranquility within herself, an image of something greater than the physical world. Yes, there is a moment in which the soul collides with the Divine and, for an instant, the two entities merge as one.
In the very heart of the solitary woman, a profound respect for nature and its grandiose powers takes shape. The oldest trees and the flowers that have lived the longest turn to her with a mysterious love.
As she continues her journey, she begins to observe the appearance of things around her. A whitish stone, perfectly round, lies on the road under his feet; a white insect flies in front of the woman and then disappears into the blue sky.
At this point, the solitary woman has a vision: the inner and outer worlds blend together, creating a world of images, sounds, as well as colors and sensations.
She continues climbing for another few minutes, until she reaches a small dry stone hut. Her eyes approach the portal, and with a deep breath, she enters the place of peace.
Inside that simple but forceful room, the solitary woman finds herself and begins to speak high up, with a transparent voice that mixes with the wind and the notes of the birds.
Coming out of the hut after a few minutes, the woman has even more strength in her soul. From now on, all her steps are accompanied by the feeling of being united with the entire universe.
Request:
What can you tell me about Quantized AI models that are made in open source like:
quantization_options = [
"Q4_K_M", "Q4_K_S", "IQ4_XS", "Q5_K_M", "Q5_K_S",
"Q6_K", "Q8_0", "IQ3_M", "IQ3_S", "IQ3_XXS"
]
Answer:
Quantized AIs, also known as quantized neural models, are a type of neural models in which numerical variables are quantified in discrete intervals rather than continuously.
The quantization process can be optimized in several ways, including relying on the use of fixed bit values or dynamic coding. When a fixed number of bits is used for each parameter, it is called "fixed quantization", while when a variable number of bits is used depending on the needs, it is called "dynamic coding".
The quantization process can be applied to the weight of connections in the model, the activation of each unit in the path of the input through the layers of the neural network, or the final combination of the output signal. Typically, quantization is used to improve the inference speed and energy efficiency of models, making them more practical for mobile devices and onboard systems.
The list you provided shows some examples of quantization options commonly used in neural networks. Q4_K_M, for example, means 4-bit quantization with a dynamic encoding (K) and a larger broadening factor (M). The meaning of the symbols used in these options has been defined by the IEEE and can be found in their standard definition.
These quantization options are generally available as part of the open-source TensorFlow library, which is one of the leading platforms for building and training neural models. In your list, the options "Q4_K_M", "Q4_K_S", and "IQ4_XS" are mainly used in models that are based on int32 or int16 type tensors. The other models can be used with different data types, such as int8, uint8, qint8 and quint8.
It is important to take into account that when quantizing a neural model, its ability to generalize may be diminished. This is because the quantization process can introduce errors that can negatively impact model performance. However, recent advances in quantization research have allowed us to obtain better results than in the early years of development of quantized models, although there are still many open fields for research in this direction.
Request:
Question: If I have a CPU with 64 GB of RAM, an AMD Ryzen 5 2400G processor, and an 8 GB Nvidia RTX 3050 GPU, what quantization do you recommend?
Answer:
Regarding the hardware you described, I would suggest using the Q4_K_M model to quantize the weights of the connections in the neural network and IQ3_XXS or IQ3_XXS for the quantization of the activations.
Thanks to your AMD Ryzen 5 2400G CPU and NVIDIA RTX 3050 GPU with 8 GB of memory, this hardware can support the execution of lower-bit quantized neural models than other lower-performance systems.
Additionally, the processor and GPU available on your system support hardware acceleration for graphics calculations, which can lead to further performance improvement of quantized models in terms of speed and energy efficiency.
For this reason, the Q4_K_M option combines 4-bit quantization with dynamic encoding and a larger broadening factor, which can further improve model performance.
For trigger quantization, the IQ3_XXS or IQ3_XXS options offer dynamic encoding with a minimal broadening factor, which can improve the energy efficiency of models and reduce their memory impact.
However, if you find yourself unable to tackle multiple computations, you can also attempt using the Q6_K or Q8_0 model for weight quantization, which will offer greater accuracy and performance improvement.
In any case, the optimization process for quantization is a very complex subject and requires an iterative approach, which may also include some fine-tuning steps of the model after quantization. For this reason, you can try using a tool like TensorFlow, which provides open source libraries for quantizing and developing neural models.
❤️