Generative AI Exhibits the Lightness/Pitch Correspondence in Image Generation and Classification Tasks
John McEwan, J. Day
Abstract
A crossmodal correspondence (CMC) is an association between sensory features from different modalities. For example, the lightness/pitch correspondence is the tendency to pair more luminous stimuli with higher pitches and dimmer stimuli with lower pitches. There has been a recent surge in research examining the extent to which these associations are present in current generative AI models and the degree to which they are language dependent. Previous research on the relationship between CMCs and AI has focused on explicit judgements of association, typically with rating scales. In contrast, other psychology subfields such as social psychology are using implicit measures of associations in AI output. In the present study, we argue for the merits of an implicit functionalist approach to generative AI research in CMCs and use two psychophysical paradigms to demonstrate this approach. Experiment 1 explores how the lightness/pitch correspondence might manifest in text‐to‐image models when they are prompted to visually depict auditory characteristics. The results indicate that DALL‐E 3 consistently employs the lightness/pitch correspondence, as well as a contrast/pitch correspondence, when attempting to visually depict auditory pitch. Experiment 2 then examines ChatGPT‐4o's performance in classifying the images from Experiment 1 as high or low in pitch. We find that ChatGPT‐4o uses both lightness and contrast information to inform its classifications of pitch. We discuss the implications of these results for the study of sensory associations with AI, as well as specific future directions for CMC research.