VLM Visual Arts Analysis with DeepSeek Janus-1.3B
Introduction
The original plan for this analysis was to process images with DeepSeek Janus-1.3B under the same conditions used for the earlier text-in-images study with Janus-1.3B, and before that for the image and OCR analyses with Microsoft Florence-2-base and Qwen2-VL-2B. However, I could not get the model to run properly on Google Colab as I had for the other studies, so I had to find another way to process the images with Janus-1.3B.
This study was therefore carried out with the online demo of Janus-1.3B available on Hugging Face, which still allowed me to assess the model's ability to analyze images.
The corpus of images is the same one that was processed and analyzed with Florence-2-base and Qwen2-VL-2B. The aim is to see how well Janus-1.3B can analyze works of art. Our previous study with Janus-1.3B had shown that while the model could deliver a brief, convincing description (especially of aesthetic elements), it struggled to explain or understand the content. It seemed unable to base its interpretation of the text on its own prior knowledge. Since the conditions and method of this study are not exactly the same as in the previous ones made with Florence-2-base and Qwen2-VL-2B, the goal is less to compare the three models than to assess how well Janus-1.3B analyzes images and detects objects when used through the Hugging Face demo.
Analysis of Raphael’s The School of Athens, 1509-1511, Apostolic Palace, Vatican City
The first image processed and analyzed on the Hugging Face demo is Raphael’s The School of Athens. The results of the analysis of this Renaissance fresco by Janus-1.3B were overall much more accurate and complete than those of our OCR analysis.
The first "question" given to Janus-1.3B on the Hugging Face demo was to "Explain the image". This led to a long and highly detailed description of Raphael's artwork. The decorative elements were the main focus of the model's analysis, something that we also noticed in the results delivered by Florence-2-base (and to some extent by Qwen2-VL-2B as well). The model, as used on this demo, sorted the elements into several categories (architectural, human, decorative), analyzed the color palette and offered an interpretation. Here are the full results:
"The image depicts a detailed fresco painting, likely from the Renaissance period, showcasing a grand architectural interior with numerous figures engaged in various activities. The fresco is rich in detail and features a mix of human figures, architectural elements, and decorative motifs.
Architectural Elements:
- Arches and Columns: The fresco features a series of arches and columns that create a sense of depth and grandeur. The arches are adorned with intricate carvings and moldings, adding to the overall opulence of the scene.
- Windows: There are several windows visible in the background, allowing natural light to illuminate the interior and highlight the architectural details.
Human Figures:
- Diverse Activities: The fresco portrays a variety of human activities, including:
- Social Interaction: Figures are engaged in conversations, gestures, and interactions, suggesting a social gathering or a public event.
- Reading and Writing: Some figures appear to be reading or writing, indicating a scholarly or intellectual setting.
- Rituals and Ceremonies: There are depictions of rituals and ceremonies, such as a figure in a blue robe holding a book, possibly a scholar or a priest.
Decorative Elements:
- Sculptures: The fresco includes several sculptures, which are integrated into the architectural design. These sculptures add to the richness and detail of the scene.
- Floral and Geometric Patterns: The fresco features intricate floral and geometric patterns, which are typical of Renaissance art and add to the overall aesthetic appeal.
Color Palette:
- Warm and Cool Tones: The color palette is predominantly warm, with shades of orange, red, and gold, which are often used in Renaissance art to evoke a sense of warmth and richness.
- Neutral Background: The background is a light, neutral color, which helps to highlight the figures and architectural elements.
Interpretation:
The fresco likely represents a scene from classical mythology or a historical event, given the presence of figures in robes and the architectural setting. The detailed and intricate nature of the fresco suggests it could be a work by a renowned artist of the Renaissance period, such as Michelangelo or Leonardo da Vinci.
Conclusion:
This fresco painting is a masterpiece of Renaissance art, showcasing the skill and creativity of the artist. The detailed depiction of human figures, architectural elements, and decorative motifs all contribute to the overall grandeur and beauty of the scene. The fresco likely serves".
As we can see, Janus-1.3B masters the English language, along with common expressions used in art history descriptions (for example, "The color palette is predominantly warm, with shades of orange, red, and gold, which are often used in Renaissance art to evoke a sense of warmth and richness"). The model accurately situates the artwork in the Renaissance period. It gives interesting interpretations of the aesthetic elements. As in our previous study of The School of Athens with Florence-2-base and Qwen2-VL-2B, it is interesting that the model attempts to work out what the characters are doing but does not manage to link it with philosophy. Like Qwen2-VL-2B, it indicates that "Figures are engaged in conversations, gestures, and interactions, suggesting a social gathering or a public event [...] Some figures appear to be reading or writing, indicating a scholarly or intellectual setting [...] There are depictions of rituals and ceremonies".
Janus-1.3B tries to identify the artist, whom it correctly describes as a "renowned artist of the Renaissance period". The suggestions given, "such as Michelangelo or Leonardo da Vinci", are not correct, since the fresco was made by Raphael. Still, this shows that the model is good at recognizing the style of an artistic movement.
In a second attempt to analyze The School of Athens with Janus-1.3B, the "question" asked was: "Describe the image".
Here is the result: "The fresco is divided into several sections, each containing different groups of people. The central part of the fresco shows a group of individuals gathered around a central figure, who is likely the focal point of the scene. This central figure is dressed in a blue robe and is surrounded by others who are either standing, sitting, or kneeling. The expressions and postures of the figures suggest a moment of significant interaction or debate.
To the left and right of the central group, there are additional scenes. On the left, a group of figures is engaged in what appears to be a discussion or argument. Some figures are gesturing emphatically, while others are listening intently. The attire of the figures varies, with some wearing robes and others in more elaborate garments.
On the right side of the fresco, there is another group of figures, some of whom are seated and others standing. This group seems to be in a more contemplative or solemn mood, with some figures appearing to be in deep thought or prayer.
The architectural elements of the fresco include columns, arches, and decorative moldings. The arches are particularly notable, with detailed carvings and patterns that add to the overall grandeur of the scene. The floor is tiled with geometric patterns, and there are several alcoves and niches that contain additional figures or decorative elements.
The fresco is rendered in a classical style, with a focus on human emotion and interaction. The use of light and shadow is also noteworthy, creating a sense of depth and dimension within the scene. The overall composition is balanced, with a clear division between the central group and the surrounding scenes.
Given the detailed and intricate nature of the fresco, it is likely that this artwork is part of a larger historical or cultural context, possibly a religious or historical narrative. The presence of multiple human figures and the elaborate architectural setting suggest that this fresco may be a depiction of a significant event or a scene from a historical narrative.
In summary, the image is a detailed fresco painting depicting a grand hall or courtyard with a central group of figures engaged in a significant interaction. The scene is rich in human emotion and architectural detail, with a focus on classical art style and a historical or cultural context".
As we can see in this long, detailed description, Janus-1.3B once again puts the main focus on the decorative elements. In this case, it does not attempt to identify the author or the artistic movement to which this fresco belongs. The description is accurate and does not show misinterpretations or misunderstandings. The model analyzes each part of the painting, from left to right. Even if it does not link the artwork with a particular era or artist, this result is very successful.
The last question put to Janus-1.3B was to "Identify the author of this fresco", in order to see what it would answer when pushed to name the artist. The result was: "The author of this fresco is Michelangelo", which is incorrect, just as in the first result, but shows that the model accurately relates this artwork to the Renaissance period.
Analysis of The Starry Night by Vincent Van Gogh, 1889, MoMA, New York
The second artwork analyzed with Janus-1.3B on the Hugging Face demo is one of the most famous Post-Impressionist paintings: The Starry Night by Vincent Van Gogh. This painting had already been processed with Qwen2-VL-2B, which had successfully identified both the name of the artwork and its artist.
The first instruction given to Janus-1.3B was to "Explain the painting".
The result was the following: "The painting is a reproduction of Vincent van Gogh's famous "Starry Night." It depicts a night sky filled with swirling clouds and stars, with a prominent cypress tree in the foreground. The village below is depicted with a mix of houses and a church, and the overall color palette is dominated by shades of blue and green. The painting is a classic example of Van Gogh's unique style, characterized by swirling patterns and vibrant colors".
The description of the Van Gogh painting is much shorter than that of the fresco, which can also be explained by the fact that there are fewer elements to analyze. Interestingly enough, Janus-1.3B has correctly identified the artist as well as the artwork. This suggests that VLMs in general handle very well-known, easily identifiable images much better. It seems that when it comes to art and images in general, Janus-1.3B has much more prior knowledge to draw on when interpreting the content, as opposed to text-in-images analysis, where the results were much less convincing. The model even delivers an interpretation of the artwork that is worded just as in art history books: "The painting is a classic example of Van Gogh's unique style, characterized by swirling patterns and vibrant colors".
The second instruction given to Janus-1.3B was to "Describe the image".
This question led to a much longer description:
"The image is a reproduction of Vincent van Gogh's famous painting "Starry Night." This iconic artwork is characterized by swirling, dynamic brushstrokes and vivid, contrasting colors that create a sense of movement and energy. The scene depicted in the painting is a nighttime landscape with a village in the foreground and a swirling sky filled with stars and celestial bodies.
Detailed Description:
Sky and Celestial Bodies:
- The sky is a deep blue, with swirling patterns of white and yellow that represent stars and clouds.
- There are several prominent stars, each with a bright yellow center and surrounded by white and blue strokes.
- A large, bright yellow sun dominates the upper right quadrant of the sky, with smaller stars and swirls around it.
- The sky is filled with dynamic, expressive brushstrokes that convey a sense of motion and depth.
Village:
- The village is depicted in the lower part of the image, with buildings and houses that have a somewhat stylized, almost cartoonish appearance.
- The rooftops are rendered in shades of blue and gray, with some rooftops having a reddish hue.
- The village is surrounded by rolling hills and a few trees, which are painted in a similar style to the rest of the landscape.
Trees and Landscape:
- There are two prominent trees in the foreground, one on the left side and one on the right.
- The trees are depicted with thick, dark green trunks and branches that twist and turn, adding to the dynamic nature of the painting.
- The landscape behind the village is composed of rolling hills and mountains, painted in various shades of blue and green.
Analysis:
The painting captures the essence of Van Gogh's style, characterized by its use of bold, expressive brushstrokes and vivid, contrasting colors. The swirling sky and celestial bodies are reminiscent of Van Gogh's famous "Starry Night," which is a quintessential representation of his post-Impressionist style. The village below is a simplified, stylized version of Van Gogh's own village, which is often seen as a symbol of the artist's connection to his homeland".
Once again, Janus-1.3B has successfully identified the artist and the name of the painting. The description it gives is detailed, with a major focus on the color palette and a categorization of the elements analyzed ("Sky and Celestial Bodies", "Village" and "Trees and Landscape"), offering different levels of interpretation. The analysis is accurate, saying that the painting "captures the essence of Van Gogh's style, characterized by its use of bold, expressive brushstrokes and vivid, contrasting colors". The interpretation of the village as "a simplified, stylized version of Van Gogh's own village, which is often seen as a symbol of the artist's connection to his homeland" is not exactly accurate, as the village is the view of Saint-Rémy-de-Provence from Van Gogh's room and is not supposed to represent the artist's homeland (which would be in the Netherlands). It is nonetheless interesting to assess how Janus-1.3B interprets images. The results were in this case much more successful than when we studied elements of text in a previous study with Janus-1.3B.
The last question put to the model was "What is the meaning of this image", to push it to give another interpretation.
The answer delivered by Janus-1.3B was the following: "The image is a reproduction of Vincent van Gogh's famous painting "Starry Night." This painting is characterized by swirling, dynamic patterns of blue and yellow hues, which evoke a sense of movement and energy. The scene depicts a night sky filled with stars and a large, swirling moon. The landscape below features a village with houses and a church, all rendered in a similar style of Van Gogh's signature brushwork. The overall composition is balanced, with the tall cypress tree on the left and the village on the right, creating a sense of depth and perspective. The use of color and texture adds to the painting's emotional impact, making it a classic representation of the post-impressionist movement".
The description of the painting is once again accurate. Still mostly based on aesthetic features, it gives a brief yet correct analysis of the painting. Although the question asked for the meaning of the image, the model does not really answer it. The part that best shows Janus-1.3B's own interpretation is the last sentence: "The use of color and texture adds to the painting's emotional impact, making it a classic representation of the post-impressionist movement".
While the textual analysis by Janus-1.3B was not very accurate, the quality of the image analysis is much higher. Even if this painting is very famous, the fact that the model identified it correctly, along with the name of the artist, is definitely impressive. While the aesthetic description is very good and complete, the model did not offer accurate interpretations. It seems highly capable of recognizing elements (and paintings as a whole) but has more difficulty analyzing the meaning of the image, which leads to misinterpretations - for example the claim that the village shown in the painting relates to Van Gogh's homeland.
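The attribution results reported so far (Michelangelo proposed for the fresco, Van Gogh correctly named for The Starry Night) can be tallied with a simple check. The following is only a minimal sketch and was not part of the study: the names Trial and names_artist are hypothetical, the answers are assumed to have been pasted by hand from the demo, and a surname match is a crude proxy for a correct attribution.

```python
from dataclasses import dataclass


@dataclass
class Trial:
    """One prompt/answer pair copied manually from the online demo (hypothetical structure)."""
    artwork: str
    prompt: str
    answer: str
    expected_artist: str


def names_artist(trial: Trial) -> bool:
    """Return True if the expected artist's surname appears in the answer (case-insensitive)."""
    surname = trial.expected_artist.split()[-1].lower()
    return surname in trial.answer.lower()


# Answers quoted from the two sections above.
trials = [
    Trial(
        artwork="The School of Athens",
        prompt="Identify the author of this fresco",
        answer="The author of this fresco is Michelangelo",
        expected_artist="Raphael",
    ),
    Trial(
        artwork="The Starry Night",
        prompt="Explain the painting",
        answer='The painting is a reproduction of Vincent van Gogh\'s famous "Starry Night."',
        expected_artist="Vincent van Gogh",
    ),
]

for trial in trials:
    status = "artist named" if names_artist(trial) else "artist not named"
    print(f"{trial.artwork}: {status}")
```

A check like this only detects whether a name is present; judging whether a description or interpretation is accurate still requires reading the answer, as done throughout this study.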
Analysis of The Lady and the Unicorn tapestry by the Maître d’Anne de Bretagne (aka Jean d’Ypres), circa 1500, Musée de Cluny, Paris
The third image processed and analyzed with Janus-1.3B was the medieval tapestry The Lady and the Unicorn. This artwork had been analyzed in our previous study by Florence-2-base, which had led to several misinterpretations and issues when processing the diverse figures represented.
The first command to "Explain the image" led to the following answer:
"The image depicts a colorful and intricate tapestry or painting that appears to be inspired by medieval or Renaissance art. The central figure is a regal woman standing under a blue and white striped tent, which is adorned with a banner that reads "CIVITAS FELIX." She is dressed in elaborate, flowing robes and is holding a book or manuscript in her left hand.
Surrounding the central figure are several other characters, including a lion on the left, a dog on a pedestal in front of the woman, and a unicorn on the right. The unicorn is standing on its hind legs, with its front legs raised as if in a celebratory or playful gesture.
The background is filled with various elements, including trees, birds, and other animals. There are also several banners and flags, one of which is blue and white striped, similar to the tent. The entire scene is set against a red and pink background, which is decorated with a dense pattern of leaves and flowers.
The tapestry also features a variety of other animals, such as birds, rabbits, and deer, which are scattered throughout the scene. The overall style of the artwork is reminiscent of medieval or Renaissance tapestries, with its detailed and ornate depiction of the figures and their surroundings.
The text "CIVITAS FELIX" is Latin for "City of Felix," which is a term often associated with the city of Felix in the Roman Republic. This suggests that the tapestry may be inspired by Roman mythology or history.
In summary, the image is a rich and detailed tapestry or painting that features a regal woman under a tent, surrounded by various animals and elements of nature. The scene is set against a red and pink background with a dense pattern of leaves and flowers, and the text "CIVITAS FELIX" suggests a connection to Roman mythology or history".
As we can see from this analysis, Janus-1.3B has successfully identified the artwork as a tapestry "inspired by medieval or Renaissance art". The description of the woman is quite elaborate: "a regal woman standing under a blue and white striped tent [...] She is dressed in elaborate, flowing robes and is holding a book or manuscript in her left hand". Contrary to this description, the woman is not holding a book but jewelry taken from a box held by another woman. This other woman is not mentioned, except for the indication of "several other characters" around the central figure. The description of the animals is very accurate: Janus-1.3B has correctly detected the lion but also the unicorn (which was missing from the Florence-2-base description) and the "dog on a pedestal". The aesthetic analysis of the colors and motifs is rather complete and well executed. The main problem with this analysis is the transcription of the text, which recalls the difficulties encountered in the OCR analysis made with Janus-1.3B in our previous study. Indeed, the inscription on the tent reads "A mon seul désir" (which translates into English as "To my only desire", a message of love). Janus-1.3B transcribes it as "CIVITAS FELIX" and explains that this "is Latin for "City of Felix," which is a term often associated with the city of Felix in the Roman Republic. This suggests that the tapestry may be inspired by Roman mythology or history". The inscription does not read "CIVITAS FELIX", nor is it in Latin at all. Moreover, "CIVITAS FELIX" would in fact translate to "Happy city". There has never been a city of Felix in the Roman Republic: the closest thing the model might be referring to is the territory of the ancient city of Capua in the Roman period, sometimes identified as "Campania Felix".
Once again, while the aesthetic description is extremely good, the transcription abilities of Janus-1.3B are not yet very accurate, especially for a text in French. It should be noted that the inscription is written in stylized characters, which might have made it more difficult for the model to process.
With the "Describe the image" command, the result is similar to the first answer:
"The image depicts a detailed and colorful tapestry that appears to be a medieval or Renaissance-style artwork. The central focus of the tapestry is a regal figure, likely a queen or princess, standing under a large, ornate tent with a blue and white striped canopy. The tent is adorned with a banner that reads "CIVITAS VIRGINIA," which translates to "City of Virginia."
The queen is dressed in elaborate, flowing garments, predominantly in shades of red and gold, and is holding a scepter, symbolizing her royal authority. Surrounding her are several attendants, including a woman in a pink dress who appears to be assisting the queen with a task. There is also a man in a blue tunic, who seems to be assisting the queen with a document or a book.
The scene is set in a lush, forested environment, with various animals and plants depicted. There are several animals, including a lion, a unicorn, and a goat, all of which are positioned around the queen and attendants. The lion is on the left side of the tapestry, the unicorn is on the right, and the goat is in the foreground near the tent.
The background of the tapestry is filled with intricate floral patterns and foliage, creating a rich tapestry of colors, predominantly shades of pink, blue, and green. The entire scene is framed by a border that features a repeating pattern of leaves and flowers, adding to the overall ornate and detailed nature of the artwork.
The tapestry also includes several other elements that contribute to its medieval or Renaissance aesthetic. There are birds flying in the sky, and a few animals, such as a deer and a rabbit, can be seen in the background. Additionally, there are several flags and banners, one of which is prominently displayed on the right side of the tapestry.
The tapestry is rich in symbolism and detail, reflecting the grandeur and opulence of the period it represents. The presence of the tent, the royal figure, and the attendants suggests themes of royalty, authority, and possibly a royal court or a ceremonial event.
In summary, the image is a detailed and colorful tapestry depicting a regal scene with a central figure in a royal tent, surrounded by attendants and animals, set in a lush forested environment. The tapestry is rich in symbolism and detail, reflecting the grandeur and opulence of the period it represents".
This time, Janus-1.3B characterizes the main character as a "queen". Once again, the aesthetic description is very complete and highly detailed, both in covering the elements of decor and in listing the figures: "The background of the tapestry is filled with intricate floral patterns and foliage, creating a rich tapestry of colors, predominantly shades of pink, blue, and green [...] The tapestry also includes several other elements that contribute to its medieval or Renaissance aesthetic. There are birds flying in the sky, and a few animals, such as a deer and a rabbit, can be seen in the background".
In this description, the model correctly identifies the second woman as "a woman in a pink dress who appears to be assisting the queen with a task". There is an inaccuracy in the mention of "a man in a blue tunic, who seems to be assisting the queen with a document or a book", who cannot be found on the tapestry. Janus-1.3B accurately situates the artwork as "medieval or Renaissance-style". It also mentions the presence of the lion and the unicorn and places them correctly in the image. The main issue in this description, apart from the nonexistent man, is still the transcription: this time, Janus-1.3B indicates that the banner reads "CIVITAS VIRGINIA," which it translates as "City of Virginia." This is also incorrect.
The last question put to the model for this image was "Where is the unicorn on this image?", to which it answered: "The unicorn is located on the right side of the image, near the bottom. It is standing on its hind legs, with its front legs raised as if it is about to take a step forward." The description is perfectly accurate.
As we can see with The Lady and the Unicorn tapestry, Janus-1.3B is very good at analyzing images, despite some missteps and misunderstandings. The model correctly detects, describes and situates the elements in the image. It made an attempt to link the tapestry to the right historical period. The only major issue was processing and transcribing the French text, similar to what we experienced in our previous OCR study with Janus-1.3B. The aesthetic description was very complete.
Analysis of Claude Monet's Luncheon on the Grass, 1865-1866, Musée d’Orsay, Paris
The fourth image analyzed with Janus-1.3B was Claude Monet's Impressionistic Luncheon on the Grass, which had previously been processed by Florence-2-base and Qwen2-VL-2B as well.
The first instruction given to the model was to "Explain the image" to which it delivered the following analysis:
"The image depicts a scene of a picnic in a forested area. There are three people in the image, two men and one woman. The woman is seated on a white blanket spread on the grass, wearing a light-colored dress. She is holding a plate and appears to be engaged in eating or preparing to eat. Next to her, there is a man sitting on the ground, dressed in dark clothing, possibly a suit or a jacket. He is holding a bottle and seems to be conversing with the woman. Another man stands behind the woman, dressed in a light-colored coat and hat, holding a parasol. The background is filled with lush green trees and foliage, suggesting a serene and natural setting. The picnic setup includes a variety of items such as a bottle, a basket, a plate, and some fruits or vegetables on the blanket. The overall atmosphere of the image is calm and leisurely, capturing a moment of relaxation and enjoyment in nature".
In this case, Janus-1.3B was not able to identify the name of the artist or of the painting. The description of a "picnic in a forested area" is correct. The model detects three characters, two men and one woman, instead of the four figures represented in the painting. The sitting woman is accurately described by the model, as is the man sitting next to her (although it is not clear from the image that he is "conversing with the woman"). The model, just like Florence-2-base and Qwen2-VL-2B, devotes part of its description to the natural elements: "The background is filled with lush green trees and foliage, suggesting a serene and natural setting." The description and positioning of the objects are accurate, more so than in the results obtained with Florence-2-base. The interpretation of the painting by Janus-1.3B, according to which the "overall atmosphere of the image is calm and leisurely, capturing a moment of relaxation and enjoyment in nature", fits the impression given by this image. There was no attempt to link the painting to a particular period in time, as opposed to Qwen2-VL-2B, which had correctly situated the scene in the late 19th century based on the clothes the figures were wearing.
The second attempt at analyzing Claude Monet's Luncheon on the Grass was with the command to "Describe the image".
Here is the result:
"The image depicts a scene of a picnic in a forested area. The setting is outdoors, with lush green trees and foliage providing a natural backdrop. The scene is composed of several figures, including a man, a woman, and a child, all dressed in period clothing.
The man, seated on the left side of the image, is wearing a dark suit with a white shirt and a red tie. He appears to be engaged in conversation with the woman sitting next to him. The woman is dressed in a light-colored dress with a blue sash, and she is holding a plate in her hand. Her posture suggests she is either about to eat or has just finished eating.
To the right of the woman, there is a child, also dressed in period clothing, sitting on the ground. The child is holding a parasol, which is a common accessory for outdoor activities during the 19th century. The parasol is open, providing shade for the child.
In the background, there is another man standing, dressed in a light-colored coat and hat. He appears to be observing the scene or perhaps waiting for his turn to join the picnic.
The picnic setup includes a blanket spread out on the ground, with various items placed on it. There is a basket containing fruits, a bottle, and a plate with some food. The fruits include apples and possibly other types of produce, indicating a picnic with a variety of food items.
The overall atmosphere of the image is serene and leisurely, capturing a moment of relaxation and enjoyment in nature. The clothing and setting suggest that this scene is from the late 19th or early 20th century, a time when picnics were a popular pastime.
The painting style is realistic, with attention to detail in the clothing, the objects, and the natural surroundings. The use of light and shadow adds depth and dimension to the scene, making it appear lifelike and immersive.
In summary, the image is a detailed and realistic depiction of a picnic in a forest, featuring a man, a woman, a child, and another man, all dressed in period clothing, enjoying a leisurely outdoor activity. The picnic setup includes a blanket, food items, and a parasol, all set against a backdrop of lush greenery".
Interestingly enough, in this second analysis Janus-1.3B first lists three characters (a man, a woman and a child) before adding another man standing in the background. It may have identified a sitting woman as a "child" (probably because of her shorter stature while sitting down, compared with the other characters), although it also mentions the sitting woman separately. This is a mistake we have already seen in the description of this painting by Qwen2-VL-2B, which had identified a "baby" in the image. According to the model, the child is sitting next to the woman and "is holding a parasol, which is a common accessory for outdoor activities during the 19th century. The parasol is open, providing shade for the child". There is no open parasol in this image, and the statement that a parasol was a common accessory for outdoor activities during the 19th century is questionable. What is positive in this description is that the model has successfully situated the artwork in the correct time period, mentioning that the "clothing and setting suggest that this scene is from the late 19th or early 20th century, a time when picnics were a popular pastime". It is also interesting to notice that the image is described as "detailed" and "realistic", despite its Impressionistic style. The description of the rest of the objects is rather accurate.
We pushed Janus-1.3B to identify the artist who had made this painting with the question: "Who is the artist who painted this image and around which date was it painted?" The answer was the following: "The image is a reproduction of a painting by Pierre-Auguste Renoir, a French Impressionist painter. The painting is titled "Picnic" and was painted in 1877". While the date is almost correct (give or take ten years) and the title does correctly indicate the subject of the painting, the name of the artist is incorrect. It should be noted, however, that Renoir also made very similar depictions of picnic scenes (in fact, his own version of "Le Déjeuner sur l'Herbe" is known as "Picnic" in the English translation of the title). The picnic theme was actually a trend in painting at the time, also seen in Manet's paintings. All in all, it can be considered a successful result in that the model attributed the painting to a member of the French Impressionist group.
The descriptions of Monet's Luncheon on the Grass by Janus-1.3B were less accurate than in our previous examples, especially when it came to detecting and identifying the human figures. This might be because the figures are less detailed than in figurative paintings, which may have led the model to misinterpretations. In any case, it successfully indicated that this was a 19th-century Impressionist painting.
Analysis of Wassily Kandinsky's Impression III (Concert), 1911, Lenbachhaus, Munich
The last image analyzed in this study is an abstract painting by Wassily Kandinsky, titled Impression III (Concert), which we had processed in a previous study with Florence-2-base. Because the figures are undetailed and almost unrecognizable, and the elements very hard to process and describe, object detection had proved difficult for Florence-2-base, which had mainly based its analysis on the color palette. In its attempt to caption the image, it had suggested that the image might show flowers. When asked to detect objects, it had simply identified the whole image as a painting.
With Janus-1.3B, the first result was obtained with the "Explain this painting" command.
"This painting appears to be an abstract expressionist piece. The use of bold, vibrant colors and loose brushstrokes suggests a focus on emotional and expressive qualities rather than realistic representation. The composition is dynamic, with a variety of shapes and forms that seem to interact and blend together. The colors are primarily yellow, red, blue, and black, with some green and gray accents. The overall effect is one of energy and movement, with the brushstrokes creating a sense of spontaneity and fluidity. The painting does not depict a specific subject matter but rather captures the essence of color and form through its abstract representation".
While Janus-1.3B describes the painting as an "abstract expressionist piece" and comments on its colors and brushstrokes, it indicates that it "does not depict a specific subject matter but rather captures the essence of color and form through its abstract representation". Like Florence-2-base, it has difficulty identifying the elements in the image and focuses its description on the colors. In this case, it even concludes that the image has no specific subject. By contrast, when it had analyzed another abstract painting by Kandinsky, Qwen2-VL-2B had attempted to interpret the elements in the image. That attempt was unsuccessful: it had built its reading of the whole painting around a "flamingo", which was in fact a misreading of pinkish natural elements at the back of the painting.
The second command sent to Janus-1.3B was to "Describe the painting", to which it answered with the following text:
"The painting in question is an abstract artwork characterized by a mix of vibrant colors and loose, expressive brushstrokes. The dominant colors include yellow, red, blue, and black, which are applied in a somewhat chaotic manner, creating a dynamic and energetic composition.
In the upper left corner, there is a large, circular shape that appears to be yellow, possibly representing a sun or a light source. Adjacent to this, there are dark, irregular shapes that could be interpreted as abstract forms or perhaps trees. These shapes are painted in shades of black and dark blue, adding contrast and depth to the composition.
Towards the center and right side of the painting, there are clusters of shapes that resemble flowers or blobs. These blobs are painted in a variety of colors, including red, blue, and yellow, and they seem to be interacting with each other in a playful manner. The brushstrokes are bold and expressive, giving the impression of movement and fluidity.
The background of the painting is primarily yellow, which provides a warm and inviting atmosphere. There are also patches of gray and white, adding to the complexity and depth of the composition.
In the lower right corner, there is a signature that reads "G. de Jong 1929," indicating the artist and the year the painting was created.
Overall, the painting is a vibrant and dynamic representation of abstract art, characterized by its use of bold colors, loose brushstrokes, and a sense of movement and energy".
Once again in this description, the model does not really try to define or explain the abstract elements. It mentions the figure-like abstract elements as potential "flowers" or blobs (the flowers echoing the results obtained with Florence-2-base previously). Still with a major focus on the colors, Janus-1.3B summarizes the painting as "an abstract artwork characterized by a mix of vibrant colors and loose, expressive brushstrokes. The dominant colors include yellow, red, blue, and black, which are applied in a somewhat chaotic manner, creating a dynamic and energetic composition". The rest of the description covers the various colors, shapes and brushstrokes in the image and the impression they convey: "These blobs [...] seem to be interacting with each other in a playful manner. The brushstrokes are bold and expressive, giving the impression of movement and fluidity. The background [...] provides a warm and inviting atmosphere. There are also patches of gray and white, adding to the complexity and depth of the composition". The aesthetic and color palette descriptions are the main interest of this analysis, as is often the case with Janus-1.3B. Even though the instruction did not ask for it, the model has transcribed the signature in the lower right corner. However, the result is not accurate. For Janus-1.3B, the signature reads "G. de Jong 1929", which indicates "the artist and the year the painting was created". In fact, the signature is the name "Kandinsky" in stylized handwriting, along with the date "1911", both of which the model has misread. OCR processing and transcription, especially of handwritten content, unusual words or names, and text in a foreign language, is definitely one of the main issues with Janus-1.3B's analysis.
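The signature misreading described above can be given a rough numerical score with a character-level comparison. The following is a minimal sketch and was not part of the study itself: it assumes the reference transcription is "Kandinsky 1911" (our own reading of the signature), relies on Python's standard difflib module, and uses a hypothetical helper name, transcription_similarity.

```python
from difflib import SequenceMatcher


def transcription_similarity(predicted: str, expected: str) -> float:
    """Return a similarity score between 0.0 and 1.0 for two transcriptions.

    Both strings are lowercased and their whitespace is collapsed first,
    so that casing and spacing differences are not counted as errors.
    """
    def normalize(text: str) -> str:
        return " ".join(text.lower().split())

    matcher = SequenceMatcher(None, normalize(predicted), normalize(expected))
    return matcher.ratio()


# Assumed reference reading of the signature (not produced by the model).
reference = "Kandinsky 1911"
# Transcription returned by Janus-1.3B in the description quoted above.
model_reading = "G. de Jong 1929"

score = transcription_similarity(model_reading, reference)
print(f"Similarity to reference: {score:.2f}")
```

A score of 1.0 means the two strings match after normalization; the lower the score, the further the model's reading is from the reference. Such a score only measures character overlap and says nothing about why the model misread the handwriting.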
The last question asked to Janus-1.3B was: "Which movement does this painting belong to?"
Its answer was the following: "This painting appears to be an abstract expressionist piece. Abstract expressionism is characterized by the use of bold, gestural brushstrokes and a focus on conveying emotion and inner experience rather than realistic representation. The vibrant colors and dynamic forms are typical of this movement".
From an art-historical point of view, Janus-1.3B is incorrect: Abstract Expressionism is a movement born after World War II in the United States, to which Jackson Pollock, Mark Rothko, Clyfford Still and others belonged. This painting belongs instead to the first wave of abstract art, in the early 20th century. Still, from an aesthetic angle, the model was successful in situating the painting as an abstract artwork. The lack of figurative elements undoubtedly makes it very difficult for the model to process and identify the elements correctly.
Conclusion
To conclude, this study of DeepSeek Janus-1.3B's ability to process, describe and explain art shows several things.
When it comes to aesthetic descriptions, the model is highly capable of delivering long, complete and accurate results. This is especially visible with the more figurative artworks, but the model still managed to correctly detect elements in Impressionist and post-Impressionist works. With more recognizable and famous images, such as Vincent Van Gogh's The Starry Night, Janus-1.3B was able to correctly identify the artwork as well as the artist (just as Qwen2-VL-2B had done). However, one of the main issues found throughout this study was the model's inability to correctly process and transcribe elements of text.
Just as we had seen in our previous study, the model has trouble detecting and understanding textual elements, especially when they are handwritten or in foreign languages (in this case, for example, French text was mistaken for Latin and poorly transcribed). The visual description of the artworks, especially for Raphael's The School of Athens and The Lady and The Unicorn tapestry, was detailed and largely accurate, apart from some minor mistakes and misunderstandings. The vocabulary used in the descriptions was extensive and complex, and drew on words and expressions commonly found in artistic descriptions. The model usually identified the correct time period of the artworks.
While the analysis of textual content was clearly less successful with Janus-1.3B than with Qwen2-VL-2B, its visual analysis was extremely thorough and accurate, and its mistakes were often easily explainable.
Another main use of Janus-1.3B is generating images from text. For research purposes, an attempt was also made during this study to generate an image from a prompt. In this case, the model was asked to generate Vincent Van Gogh's The Starry Night, to which it delivered these results:
Although the details are not identical to the original (which might reassure the art experts who would fear this might lead to forgeries), this shows Janus-1.3B has very good abilities to process and understand art, either to deliver a textual interpretation of an image or vice versa.
Bibliography
- Maître d’Anne de Bretagne (aka Jean d’Ypres), The Lady and The Unicorn (circa 1500), Musée de Cluny, Paris
- Raphael (1483-1520), The School of Athens (1509-1511), Apostolic Palace, Vatican City
- Claude Monet (1840-1926), Luncheon on the Grass (1865-1866), Musée d’Orsay, Paris
- Vincent Van Gogh (1853-1890), The Starry Night (1889), MoMA, New York
- Wassily Kandinsky (1866-1944), Impression III (Concert) (1911), Lenbachhaus, Munich