macOS

#4
by gorovuha - opened

Hello!
I have a problem using mauve in Jupyter Lab on an M1 Pro Mac.

[Screenshot: telegram-cloud-photo-size-2-5242633155800850156-y.jpg]

After tokenizing, the process freezes and nothing works. Jupyter shows that the kernel is busy, but according to the system monitor there is no active process. Even more curious, the cell doesn't stop when I interrupt the kernel; only restarting the kernel unfreezes the window.
All the requirements are installed. The same code runs very fast in Google Colab, though.
Can you help with this? I'm out of ideas.

I also have a question about num_buckets. In Jupyter Lab I constantly got the warning "mauve WARNING clustering 9678 points to 484 centroids: please provide at least 18876 training points" until I set num_buckets to (number of rows)/39. It looks strange, but it worked. In Google Colab, by contrast, there are no warnings and everything runs smoothly regardless of the number of rows (100 or 5000). I'm curious how this parameter affects the evaluation. Could you please provide more information on this?
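The numbers in the warning suggest a simple rule, sketched below in Python. It assumes the clustering step wants at least 39 training points per centroid; that factor is inferred only from the warning itself (18876 / 484 = 39). The helper name max_buckets_without_warning is hypothetical and not part of mauve.

```python
# Factor inferred from the warning text: 484 centroids -> 18876 points required.
# This is an assumption based on the message, not a documented mauve setting.
MIN_POINTS_PER_CENTROID = 18876 // 484  # 39


def max_buckets_without_warning(num_points, min_points_per_centroid=MIN_POINTS_PER_CENTROID):
    """Largest number of centroids (num_buckets) for which
    num_points >= num_buckets * min_points_per_centroid still holds."""
    return max(1, num_points // min_points_per_centroid)


# The warning reported 9678 points being clustered.
print(max_buckets_without_warning(9678))  # 248
```

With 9678 points, any num_buckets up to 248 satisfies the threshold in the warning, which matches the (number of rows)/39 workaround.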

In addition, I wonder whether it's possible to change the model used for feature extraction. I really appreciate your work and have read the papers on the topic, but one thing confuses me: is it adequate to evaluate generations from modern LLMs using embeddings from GPT-2? Are you considering updating the feature extraction method or letting the user choose the model?
