Clustering

#4
by ReggieMontoya - opened

There are thousands of artists in SDXL, and that number might be even higher in other models (both released and soon to come). Rather than knowing which artist you want, or trying to find a random one that works, how about clustering them together based on non-biased image analysis?

I am running a pilot on some of them I have already tested for SDXL awareness. If that goes well, I plan on doing a cladogram of base model artists with minimal prompt influence (no prompt, "a woman", "a man", "a scene").

Any interest in incorporating the results into your dataset? Happy to share the workflow in Python and ComfyUI.

For sure, I'm interested. Do you mean that you know of an image analysis technique that can determine relationships between artists based on their degree of similarity? I've never heard of anything like that.

Yeah, check out this article:
https://towardsdatascience.com/how-to-cluster-images-based-on-visual-similarity-cd6e7209fe34

I tested it last night and it seemed to work pretty well. There is a lot of tweaking to be done, for sure. I am currently running the 1070 artists you had in your database last time I pulled it, generating a collage of 4 tiled images for use in the clustering algorithm. Once that's done, I will run them all through together with both k-means and hierarchical clustering. That should generate a "family tree" of artists, e.g. the oil painters all clumping together, the Japanese woodblock artists, anime... etc.
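To make the "family tree" idea concrete, here is a minimal sketch of agglomerative (hierarchical) clustering with average linkage in plain Python. It is not the code from the linked article; the toy 2-D vectors and the artist labels are made up, and real feature vectors would come from whatever image model is used for feature extraction.

```python
import math

def euclidean(a, b):
    """Straight-line distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def average_linkage(c1, c2, vectors):
    """Mean pairwise distance between the members of two clusters."""
    total = sum(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)
    return total / (len(c1) * len(c2))

def agglomerate(vectors, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns the final clusters (lists of indices) and the merge history,
    which is the raw material for a family tree.
    """
    clusters = [[i] for i in range(len(vectors))]
    merges = []
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = average_linkage(clusters[a], clusters[b], vectors)
                if best is None or d < best[0]:
                    best = (d, a, b)
        d, a, b = best
        merges.append((clusters[a], clusters[b], d))
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)]
        clusters.append(merged)
    return clusters, merges

# Hypothetical labels and made-up 2-D "features", for illustration only.
names = ["oil_1", "oil_2", "woodblock_1", "woodblock_2", "anime_1"]
vectors = [[0.0, 0.0], [0.0, 1.0], [10.0, 10.0], [10.0, 11.0], [20.0, 0.0]]

clusters, merges = agglomerate(vectors, n_clusters=3)
for cluster in clusters:
    print([names[i] for i in cluster])
```

This brute-force loop is only meant to show the idea. For a thousand or more artists, a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would likely be a better fit.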

I haven't seen anyone do this before. If it works, it could be hugely helpful as a resource. I'll let my computer burn up running this for a while and see if I can post some prelim results sometime soon.

Oh, and I have some other ideas on how to test the "strength" of an artist's weight in the models, too. I was thinking of generating some LoRAs of varying strength that have a very overwhelming style (e.g. a black square on a white background) and seeing how strong the LoRA needs to be before it overwhelms the artist's style in the prompt. Maybe put some feelers out on Reddit to see if people have interest and other ideas.

Very cool. The Medium article makes it seem like the only result is that each image is labeled with a cluster. In that case it'll be easy to represent that in this app by just naming each cluster and adding a tag to each artist.

But if the result is a similarity score between images and/or a parent/child hierarchy of some kind, that would be extremely interesting. It'll take a lot of extra code to represent that in this app. But I imagine that an app user would first select an artist, then set a degree of "distance" with a slider, and then see all the artists within that distance of similarity or on the family tree.

I recommend using the images from the SDXL_1_0 folder. Those images were generated from the SDXL 1.0 vanilla model, so they are closest to the real artists' styles. The other 2 models' images are more aesthetic but have much less fidelity. It might also be best to use the portrait images. I would expect they are more likely to result in clusters based on artist style rather than on subject matter (e.g. this cluster has faces, while this had bridges). If it matters for speed/memory, you could probably reduce the images to 128x128 resolution or smaller and still have identifiable image features.

Once I figure out the methodology, then I will start looking into ways to describe it. It could be something as simple as having X number of artist families and randomly showing an image from that family, or as complicated as a browsable tree applet (the code for which surely must exist already, and if not, ChatGPT can help write).

I was thinking of trying to run it on the dataset if I can easily download it. If not, I do have the index images from a few similar sites downloaded.

However, I worry that a single outlier would totally tank the whole thing. Instead, I am trying first on a batch of 378 iterations of the following dynamic prompt:

(art style of {Leonid Afremov|Chiho Aoshima|Anna Dittmann|Pablo Picasso|Victo Ngai|an advertising photograph|Alphonse Mucha}:1.3), a {blue|green|red} {rose|mushroom|glass bottle of liquid|pile of sand|gemstone|plastic cube} {in an empty room|on a table|sitting in grass}
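For reference, the prompt above expands to 7 × 3 × 6 × 3 = 378 combinations. Here is a minimal Python sketch of that expansion, assuming plain `{a|b|c}` alternation with no nesting. The helper name `expand_prompt` is made up for illustration and is not the API of any dynamic-prompt extension.

```python
import itertools
import re

def expand_prompt(template):
    """Expand every {a|b|c} group in the template into all combinations."""
    # re.split with one capturing group alternates literal text and group bodies.
    parts = re.split(r"\{([^{}]*)\}", template)
    literals = parts[0::2]
    choices = [body.split("|") for body in parts[1::2]]
    prompts = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for choice, literal in zip(combo, literals[1:]):
            pieces.append(choice)
            pieces.append(literal)
        prompts.append("".join(pieces))
    return prompts

TEMPLATE = (
    "(art style of {Leonid Afremov|Chiho Aoshima|Anna Dittmann|Pablo Picasso|"
    "Victo Ngai|an advertising photograph|Alphonse Mucha}:1.3), "
    "a {blue|green|red} "
    "{rose|mushroom|glass bottle of liquid|pile of sand|gemstone|plastic cube} "
    "{in an empty room|on a table|sitting in grass}"
)

prompts = expand_prompt(TEMPLATE)
print(len(prompts))  # 378
print(prompts[0])
```

At 4 images per prompt, that comes to 1512 images in total.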

From this, I am generating 4 images. I'll then extract all the features and take the average of the 4 replicates, and cluster on that. That should get closer to the "average" of the style. I'll have to see if I can reconstruct an average image backwards from the imaging features.
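The averaging step could look roughly like this. It is a sketch under two assumptions: the replicates share a filename up to the trailing `_0001.jpg`-style suffix, and the features are already extracted as equal-length lists of floats. The 3-dimensional vectors below are made-up stand-ins for real image features.

```python
from collections import defaultdict

def prompt_key(filename):
    """Drop the trailing replicate suffix, e.g. '_0002.jpg'."""
    return filename.rsplit("_", 1)[0]

def average_replicates(features):
    """Map each prompt key to the element-wise mean of its replicate vectors."""
    groups = defaultdict(list)
    for filename, vector in features.items():
        groups[prompt_key(filename)].append(vector)
    return {
        key: [sum(column) / len(column) for column in zip(*vectors)]
        for key, vectors in groups.items()
    }

# Made-up feature values, for illustration only.
features = {
    "Chiho Aoshima green rose in an empty room_0001.jpg": [1.0, 0.0, 2.0],
    "Chiho Aoshima green rose in an empty room_0002.jpg": [3.0, 0.0, 2.0],
    "Chiho Aoshima green rose in an empty room_0003.jpg": [1.0, 4.0, 2.0],
    "Chiho Aoshima green rose in an empty room_0004.jpg": [3.0, 4.0, 2.0],
}

averaged = average_replicates(features)
print(averaged)
```

Clustering then runs on one averaged vector per prompt instead of one vector per image.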

Hopefully some time for this in the next day or two.
Example training images:
Chiho Aoshima green pile of sand in an empty room_0002.jpg
Chiho Aoshima green pile of sand in an empty room_0004.jpg
Chiho Aoshima green pile of sand in an empty room_0001.jpg
Chiho Aoshima green pile of sand in an empty room_0003.jpg

Chiho Aoshima green plastic cube in an empty room_0001.jpg
Chiho Aoshima green plastic cube in an empty room_0002.jpg
Chiho Aoshima green plastic cube in an empty room_0003.jpg
Chiho Aoshima green plastic cube in an empty room_0004.jpg

Chiho Aoshima green rose in an empty room_0001.jpg
Chiho Aoshima green rose in an empty room_0002.jpg
Chiho Aoshima green rose in an empty room_0003.jpg
Chiho Aoshima green rose in an empty room_0004.jpg

Progress: the clustering algorithm is working on average feature vectors of 4 images. The raw output correctly clusters the subject of the prompt, but the subject varies more than the style, so the styles are often lumped together. This will be useful to know when considering what an artist's raw output is, without prompting on what the contents of the images should be. For example, landscape artists will tend to output landscapes, which will cluster together. Portraits together, abstracts together, etc.

Here are some examples from the above dynamic prompt, applied across all 1512 images:

hierarchy 2 cluster_1.jpg

hierarchy 2 cluster_18.jpg

hierarchy 2 cluster_13.jpg

Next, I will try clustering only on a single subject (varying color) across the small set of artists, then lastly on a single subject and color to isolate the differences in artists' styles.

That was quick. I can tell it largely ignores colors; rather, it's analyzing other aspects of the image. I guess that's both good and bad - bad because color is an important part of an artist's style, but good because it's easy for us to prompt around and also easy for our stupid monkey brains to parse.

Here are 4 clusters of the results for rose. I think it nailed which are similar and which are outliers.
hierarchy 3 cluster_1.jpg
hierarchy 3 cluster_2.jpg
hierarchy 3 cluster_3.jpg
hierarchy 3 cluster_4.jpg

Not sure which is the easiest next step: generating 4 images per artist with a simple portrait prompt and running the whole thing on the averaged result, or attempting the single-image clustering on the images you already have generated. Both are going to be herculean computational tasks!

EDIT: maybe the easiest way to display this is to have the option to select an artist and get shown a list of every other artist, ordered from most to least similar by their "distance" in the similarity matrix.

That way, the user can hop from artist to artist and find the islands of similarity themselves without having to do any complex datavis. The list of neighbors can include the images from that artist so you can see the visual similarity get farther and farther away as you scroll down.
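A minimal sketch of that neighbor list, assuming the clustering step produces a square, symmetric distance matrix with one row per artist. The artist names and distances below are placeholders, not real results.

```python
def ranked_neighbors(artist, names, distances):
    """Return (name, distance) pairs for every other artist, nearest first."""
    row = distances[names.index(artist)]
    others = [(name, d) for name, d in zip(names, row) if name != artist]
    return sorted(others, key=lambda pair: pair[1])

# Placeholder names and made-up distances, for illustration only.
names = ["artist_a", "artist_b", "artist_c", "artist_d"]
distances = [
    [0.0, 0.2, 0.9, 0.5],
    [0.2, 0.0, 0.7, 0.4],
    [0.9, 0.7, 0.0, 0.3],
    [0.5, 0.4, 0.3, 0.0],
]

for name, d in ranked_neighbors("artist_a", names, distances):
    print(f"{name}: {d:.2f}")
```

Hopping from artist to artist is then just calling the same function again with whichever neighbor the user clicks.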

Yeah, I think that representation of the data with the app will work well! That'll be an interesting way to explore, and it won't be too hard to implement. So the output of your computation will be edge weights between every artist and every other artist?
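If the output does end up as edge weights, one way they could be derived is as pairwise distances between per-artist feature vectors. The sketch below uses cosine distance, which is an assumption on my part (the thread does not say which metric the clustering uses), and the vectors are toy values.

```python
import math

def cosine_distance(a, b):
    """1 - cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def distance_matrix(vectors):
    """Symmetric matrix of pairwise distances with zeros on the diagonal."""
    n = len(vectors)
    matrix = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d = cosine_distance(vectors[i], vectors[j])
            matrix[i][j] = matrix[j][i] = d
    return matrix

# Toy vectors: the first and third point the same way, the second is orthogonal.
vectors = [[1.0, 0.0], [0.0, 1.0], [2.0, 0.0]]
matrix = distance_matrix(vectors)
for row in matrix:
    print(row)
```

For N artists this is an N × N matrix, so about a million entries for the 1070 artists mentioned earlier, which is still small enough to ship as a single file.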

In another discussion thread, someone has suggested ~90 more artist names that they think SDXL knows. It's likely that I'll add many of those to the db at some point.

Right now, I have outputs of cluster memberships (the "correct" cutoff of how many clusters has been a tough nut to crack, unfortunately). I am 2/3 of the way through the training set so far and ran some prelim numbers through the system with promising results! Check this out, noting that sometimes the content of the image is more important than the style. Like in the last one... The hat cluster? I have to look into how I can assess certain "kinds" of features. Again, tough nut to crack.

Still is an amazing way to analyze the data, though! And nobody is doing this yet.

hierarchy 3 cluster_6.jpg

hierarchy 3 cluster_8.jpg

hierarchy 3 cluster_12.jpg

hierarchy 3 cluster_10.jpg

hierarchy 3 cluster_11.jpg

hierarchy 3 cluster_39.jpg

hierarchy 3 cluster_35.jpg

hierarchy 3 cluster_31.jpg

hierarchy 3 cluster_23.jpg

hierarchy 3 cluster_14.jpg

hierarchy 3 cluster_21.jpg

Wow, that's so cool! It's definitely finding stylistic similarities. Seems like color-palette has an influence as well. The two cartoonish clusters, one starting with Allie Brosh and another starting with Alex Toth, seem similar to me. E.g. dark thick lines and flat colors. If they were mixed together, I couldn't guess which belonged to which cluster. I wonder if the difference would be obvious if pointed out.

I added a feature today that sorts artists by similarity based purely on their tags. You pin an artist, then I sort based on each artist's Jaccard similarity coefficient with the pinned artist. This method is crude but useful.

If you end up completing your clustering approach and send me a matrix of scores, I can switch over, and the UI would work the same. Meanwhile, this feature might help you validate your clusters.

The Jaccard score ignores tag semantics, e.g. "dystopian" vs. "utopian" would ideally cause a lower score, but that's far beyond my capabilities.
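For anyone following along, the Jaccard coefficient is the size of the intersection of two tag sets divided by the size of their union. This is a minimal sketch of tag-based sorting, not the app's actual code; the artist names and tags are placeholders.

```python
def jaccard(tags_a, tags_b):
    """|A ∩ B| / |A ∪ B|, defined here as 0.0 when both sets are empty."""
    a, b = set(tags_a), set(tags_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)

def sort_by_similarity(pinned, artist_tags):
    """Other artists, most similar to the pinned artist first."""
    pinned_tags = artist_tags[pinned]
    others = [name for name in artist_tags if name != pinned]
    return sorted(
        others,
        key=lambda name: jaccard(pinned_tags, artist_tags[name]),
        reverse=True,
    )

# Placeholder artists and tags, for illustration only.
artist_tags = {
    "artist_a": {"painting", "oil", "landscape"},
    "artist_b": {"painting", "oil", "portrait"},
    "artist_c": {"anime", "flat colors"},
    "artist_d": {"painting", "oil", "landscape", "dystopian"},
}

print(sort_by_similarity("artist_a", artist_tags))

# Tags are compared as opaque strings, so opposites count as a plain mismatch.
print(jaccard({"dystopian", "city"}, {"utopian", "city"}))
```

The last line shows the limitation described above: "dystopian" and "utopian" lower the score only as much as any two unrelated tags would.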

There is value to both... similar tags are a good match in some ways, visual similarity is a good match in a different way. Both are useful.
