An Interdisciplinary Project Investigates Art Images and AI

How is style encoded within the “black box” of deep neural networks?

March 19, 2026

Art Images and AI: Latent Space Interpretability, Art History, and the Law is a joint project led by Noam M. Elcott, associate professor of art history and archaeology, and Kathleen McKeown, Henry and Gertrude Rothschild Professor of Computer Science. Funded by a seed grant from Columbia’s Data Science Institute, the project focuses on assessing and developing models that detect which artist is behind a given art image and generate explanations of the aesthetic features behind their predictions. The aim is to advance both computational understanding and humanistic interpretation of how multimodal models generate and understand art images by probing a model’s latent space.

Latent spaces are abstract, high-dimensional areas within neural networks where patterns and relationships are encoded in ways that are not readily interpretable by humans. Although latent space studies are still nascent, they offer important opportunities to better understand generative AI.
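To make the idea concrete, here is a minimal sketch of how similarity in a latent space is often measured. It is an illustration only, not the project's code: the four-dimensional vectors and the image names are invented for the example, whereas real models use hundreds or thousands of dimensions learned from data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

# Hypothetical embeddings: the numbers are made up for illustration and
# do not come from any real model.
embeddings = {
    "baroque_painting_a": [0.9, 0.1, 0.3, 0.0],
    "baroque_painting_b": [0.8, 0.2, 0.4, 0.1],
    "cubist_painting": [0.1, 0.9, 0.0, 0.5],
}

def nearest_neighbor(name, embeddings):
    """Return the (name, similarity) of the image closest to `name`."""
    query = embeddings[name]
    others = [
        (other, cosine_similarity(query, vector))
        for other, vector in embeddings.items()
        if other != name
    ]
    return max(others, key=lambda pair: pair[1])

closest_name, closest_score = nearest_neighbor("baroque_painting_a", embeddings)
print(closest_name, round(closest_score, 3))
```

In this toy setup, the two invented Baroque vectors point in nearly the same direction, so they count as neighbors. The individual dimensions carry no human-readable meaning, which is the interpretability problem the project addresses.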

This interdisciplinary, collaborative effort between art historians, legal scholars, computer scientists, and science-and-technology-studies scholars investigates the intersection of AI, image interpretation in latent space, and cultural analysis. By combining innovative computational techniques with traditional humanistic inquiry, the researchers critically analyze how these models organize, encode, and produce images from textual inputs, revealing implicit biases, aesthetic assumptions, and the cultural knowledge embedded in machine learning systems.

How did this project come about?

Noam M. Elcott: We discovered, much to our amazement and delight, that we were researching and publishing on the same set of questions: How is style encoded within the “black box” of deep neural networks, specifically, within the latent space where the data’s abstract patterns and relations are mapped? Kathy and her team had already completed important work on latent space in literary style, and she was interested in interrogating style in the visual arts. I, along with collaborators, had begun theorizing and questioning the role of latent space in the history and theory of art. Once we made the connection, we knew we had to collaborate.

I was already working with Kate Crawford, a leading scholar of AI and a research professor at the University of Southern California, who immediately joined this project. We realized that any serious study of latent space and style necessitated input from computer science, science-and-technology studies, and art history. Moreover, we recognized that intellectual property law would likely shape the way AI models were trained and, in turn, would be impacted by developments in AI. So we brought in scholars of law and technology to expand the discussion further.

What is the purpose behind the project, and what are its legal implications?

NME: Deep neural networks thrive at recognizing and reproducing patterns and relations within large datasets. At a superficial but meaningful level, style can be understood in precisely these terms: the patterns and relations that undergird visual works. Yet there is no reason to assume that neural networks “see” and “understand” style the way humans see and understand style. This project will involve two separate endeavors. First, we will run experiments in computer science to map visual styles in latent spaces. Second, we will gather art historians, curators, scholars of science and technology, computer scientists, and legal scholars to unpack the overlaps and divergences in style among these disparate fields, now intimately connected through AI.

Kathleen McKeown: At this point in the project, we have developed a framework for an interpretable model that allows us to identify the concepts that the vision language model uses when it predicts the style for a given piece of art. The model “explains” the concepts both by providing snapshots of image regions and generating a short English phrase to describe each region. The concept may refer to content indicative of a style or form.
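One common way to build this kind of concept-based explanation is to score each style as a weighted sum of concept activations and then report the concepts that contributed most to the winning style. The sketch below illustrates that general idea only. It is an assumption for the sake of example, not the team's framework, and every concept name, activation, and weight in it is hypothetical.

```python
def explain_prediction(concept_activations, style_weights, top_k=2):
    """Predict a style and list the concepts that contributed most to it.

    concept_activations: how strongly each concept was detected in the image.
    style_weights: for each style, how much each concept counts for or against it.
    """
    scores = {}
    contributions = {}
    for style, weights in style_weights.items():
        contributions[style] = {
            concept: concept_activations.get(concept, 0.0) * weight
            for concept, weight in weights.items()
        }
        scores[style] = sum(contributions[style].values())

    predicted = max(scores, key=scores.get)
    ranked = sorted(
        contributions[predicted].items(), key=lambda item: item[1], reverse=True
    )
    return predicted, [concept for concept, _ in ranked[:top_k]]

# Hypothetical values, invented for illustration only.
concept_activations = {
    "pointed arches": 0.9,
    "ribbed vaults": 0.8,
    "rounded arches": 0.1,
    "thick walls": 0.3,
}
style_weights = {
    "Gothic": {"pointed arches": 1.0, "ribbed vaults": 0.8, "rounded arches": -0.5},
    "Romanesque": {"rounded arches": 1.0, "thick walls": 0.7, "pointed arches": -0.5},
}

style, top_concepts = explain_prediction(concept_activations, style_weights)
print(style, top_concepts)
```

Because the prediction here is a plain sum, each concept's share of the score can be read off directly. A real vision language model offers no such direct readout, which is why a dedicated interpretable framework is needed.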

NME: Among the legal implications are questions of copyright and style. Traditionally, styles cannot be copyrighted. ChatGPT and other models can restyle images in the style of numerous creators (most recently and famously, in the style of Studio Ghibli). What are the legal implications of this universal style transfer? How might the encoding of style in latent space inflect questions of copyright and intellectual property more broadly? Can we train, interpret, and employ a model as an “expert witness” in copyright cases? These are some of the questions we hope to pose to legal scholars, computer scientists, and art historians. 

Which artists and/or time periods of art history are you focusing on?

KM: We began development of our model using art from the WikiArt dataset. We include several kinds of art. One subset focuses on architecture and includes Art Nouveau, Baroque, Byzantine, Gothic, and Romanesque. Other subsets focus on painting and include styles such as Baroque, Cubism, Minimalism, Pop Art, Ukiyo-e, Realism, and Rococo, among others. For the next phase, we have constructed a benchmark of architectural images that is held by Columbia and has not been scraped for training any large vision language models. When testing on this benchmark, we can be assured that models haven’t memorized styles or any other aspects of the dataset.

What are some of the biases, assumptions, and cultural knowledge embedded in machine learning systems that might affect your outcomes?

KM: In prior work, we tested cultural bias in vision language models, experimenting with a variety of datasets, including one with emotion labels on artwork from China and from America. Research from the cognitive sciences shows that culture mediates aspects of visual perception like color grouping and attentional focus. As vision language models are trained on both language and images, we were interested in the effect of language on bias in both objective and subjective image understanding. We wondered whether using a culturally closer language would affect understanding of images from Western or Eastern countries. We found that nearly all off-the-shelf vision language models exhibit a Western bias in every task, performing better on examples from both a narrow set of English-speaking countries and a broader set spanning North America/Western Europe than on examples from China and its neighbors.
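A bias of this kind is typically quantified by comparing a model's accuracy across groups of examples. The following is a minimal sketch of that comparison, assuming each test example is tagged with a cultural group. The records, group names, and emotion labels are made up for illustration and are not results from the study.

```python
from collections import defaultdict

def accuracy_by_group(records):
    """Compute accuracy separately for each group.

    records: iterable of (group, predicted_label, true_label) tuples.
    """
    correct = defaultdict(int)
    total = defaultdict(int)
    for group, predicted, actual in records:
        total[group] += 1
        if predicted == actual:
            correct[group] += 1
    return {group: correct[group] / total[group] for group in total}

# Made-up predictions for illustration only; not data from the study.
records = [
    ("western", "joy", "joy"),
    ("western", "awe", "awe"),
    ("western", "fear", "sadness"),
    ("western", "joy", "joy"),
    ("eastern", "joy", "awe"),
    ("eastern", "sadness", "sadness"),
    ("eastern", "fear", "joy"),
    ("eastern", "awe", "awe"),
]

scores = accuracy_by_group(records)
gap = scores["western"] - scores["eastern"]
print(scores, gap)
```

A positive gap in this toy setup means the model does better on the "western" examples. In practice, the comparison would be repeated across tasks, models, and prompt languages before drawing conclusions.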

What are the next steps? When will the project be done?

NME: We are holding a workshop in May 2026 to gather some of the world’s leading art historians, computer scientists, scholars of science and technology studies, and legal scholars. Since the results of our computer science experiments are only now (March) coming into focus, we left the May workshop purposely open. In other words, there’s no telling what will emerge from the workshop. But we are excited to find out. 

KM: We will have completed our first computational study by the time of the workshop. We will have results that indicate the concepts the model uses to predict style for a given image for the artwork from WikiArt. We will also have some analysis around what the model uses for predictions that an art historian does not, and how this affects the model’s predictions.

From there, we plan to move to the study of three-dimensional architecture from images that are not in the pretraining data of current vision language models. This should help us separate the impact of textual material around style that could be in the training data versus the visual features that the model sees. There are always more directions we can take, so we hope this project can continue for a while. We are enjoying it!