
Uncovering bias in the way we document the past

Published: 6 July 2023

Museums can often seem to offer a neutral view of history. But research by James Baker has opened up a new way of investigating collection catalogues for traces of historical bias in curators' descriptions.

His work is changing the way we document and understand our cultural legacy, now and in the future.

Discovering the influence of curators on our cultural legacy

A teacher of digital humanities at the University of Southampton, James Baker is fascinated by collection catalogues. 

“Catalogues are fundamental to the history of cultural institutions,” he says. “They list and describe all the objects kept in a particular collection. For example, books, art or ornaments kept in a museum or gallery.” 

The digitisation of catalogues for use in online databases and machine learning tools has built on this legacy.

But James could see a problem with the way catalogue descriptions were being reused. 

“When catalogues are reused as the basis for contemporary descriptions of collection items, a powerful and often difficult-to-detect ‘curatorial voice’ remains.”

“Catalogues are the products of curatorial labour, often spanning many decades, and so are subject to various biases.”

James wanted to know more about this curatorial voice and what it meant for our understanding of collection catalogues. So he got to work.

Studying the curatorial voice

In 2018, James began studying the legacies of a landmark collection catalogue, the British Museum’s ‘Catalogue of Political and Personal Satires’.

This would be a uniquely challenging project. The catalogue contains 2 million words and describes over 17,000 satirical works of art. 

However, over two-thirds of the entries in the catalogue were written by a single person: Mary Dorothy George, between 1930 and 1954. This allowed James and his team to study George as a “curatorial voice”, a messenger between the archived past and the digitised present. 

Using close textual analysis, James traced her voice from the 1930s through the late 20th century and beyond, in both printed volumes and networked digital data.

His team also used corpus linguistic techniques, which apply statistical analysis to large bodies of text to identify linguistic patterns.
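To give a flavour of what such statistics look like, here is a minimal Python sketch of one standard corpus linguistic measure, log-likelihood “keyness” (Dunning’s G²), which ranks words that appear unusually often in one set of texts compared with a reference set. It is an illustration only, not the team’s published method: the tokeniser, the function names and the two tiny sample corpora are all hypothetical.

```python
import math
import re
from collections import Counter


def tokenise(text: str) -> list[str]:
    """Split text into lower-cased word tokens (a deliberately simple tokeniser)."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a: int, b: int, c: int, d: int) -> float:
    """Dunning's log-likelihood (G2) for a word that occurs a times in a study
    corpus of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected count in the study corpus
    e2 = d * (a + b) / (c + d)  # expected count in the reference corpus
    g2 = 0.0
    if a > 0:
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study: list[str], reference: list[str], top: int = 5) -> list[tuple[str, float]]:
    """Rank words that are relatively more frequent in `study` than in `reference`."""
    s = Counter(tok for text in study for tok in tokenise(text))
    r = Counter(tok for text in reference for tok in tokenise(text))
    c, d = sum(s.values()), sum(r.values())
    scores = [
        (word, log_likelihood(a, r[word], c, d))
        for word, a in s.items()
        if a * d > r[word] * c  # same as a/c > b/d: over-represented in study
    ]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)[:top]


if __name__ == "__main__":
    # Two tiny, invented corpora standing in for catalogue descriptions.
    study = ["A coarse and vulgar design; the figure is indecently exposed.",
             "A vulgar scene, coarsely drawn."]
    reference = ["A figure stands beside a table.",
                 "A scene in a London street."]
    print(keywords(study, reference))  # 'vulgar' ranks first
```

With a real catalogue, the study set might be one cataloguer’s entries and the reference set everyone else’s. Words at the top of such a ranking are prompts for close reading, not evidence of bias on their own.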

He learned that George’s descriptions are far from straightforward verbal representations of visual representations. They are the product of a voice shaped by traditions, preferences, and values.

James points to George’s “squeamish” descriptions of explicit scenes containing “bums, faeces, and bare breasts” as an example. By filtering how she describes these scenes, George brings her mid-20th-century social values to bear on the morally very different late-Georgian society she is describing.

This led James to think about what the curatorial voice means for a catalogue’s wider legacies. He felt there was a need for research on a larger scale.

Collection catalogues provide a valuable source of information about the objects kept by cultural institutions such as museums.

Understanding how cultural institutions represent their objects

In 2020, James began a new project called ‘Legacies of Catalogue Descriptions and Curatorial Voice’.

This time, his team experimented with machine learning approaches to detect linguistic patterns across the different legacies of a catalogue. This allowed them to analyse how descriptions change as they move from printed media to online databases and into machine learning systems.
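The project itself used machine learning. As a much simpler stand-in for the underlying idea of detecting transmission, the Python sketch below measures how much of a newer description’s wording is carried over from a legacy catalogue entry, by counting shared three-word sequences (word trigrams). The example texts, record IDs and the 0.5 threshold are hypothetical assumptions made for illustration.

```python
import re


def ngrams(text: str, n: int = 3) -> set[tuple[str, ...]]:
    """Return the set of lower-cased word n-grams in a description."""
    words = re.findall(r"[a-z']+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def containment(legacy: str, modern: str, n: int = 3) -> float:
    """Fraction of the modern description's n-grams that also occur in the
    legacy entry: 1.0 means all of its wording was carried over, 0.0 none."""
    modern_grams = ngrams(modern, n)
    if not modern_grams:  # too short to contain any n-gram
        return 0.0
    return len(modern_grams & ngrams(legacy, n)) / len(modern_grams)


def flag_reuse(records: dict[str, tuple[str, str]], threshold: float = 0.5) -> list[str]:
    """Return the (hypothetical) record IDs whose modern text reuses at least
    `threshold` of its wording from the paired legacy entry."""
    return sorted(
        record_id
        for record_id, (legacy, modern) in records.items()
        if containment(legacy, modern) >= threshold
    )


if __name__ == "__main__":
    # Invented example texts, not real catalogue entries.
    legacy = "A coarse design: John Bull, fat and angry, shakes his fist at the French."
    records = {
        "rec-1": (legacy, "John Bull, fat and angry, shakes his fist at the French. "
                          "Print made by an unknown artist."),
        "rec-2": (legacy, "Print made by an unknown artist."),
    }
    for record_id, (old, new) in records.items():
        print(record_id, round(containment(old, new), 2))  # rec-1 0.6, rec-2 0.0
    print(flag_reuse(records))  # ['rec-1']
```

Scores near 1.0 suggest near-verbatim reuse, while intermediate scores suggest a cataloguer has drawn on, and partly rewritten, an earlier description. Paraphrase that shares no exact wording would go undetected, which is one reason richer machine learning models are useful.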

“We found that transmission through time, across space, and between mediums took many forms,” says James.

Cataloguers drew on and engaged with previous work as a source of expertise. They reacted to the assumptions underlying previous cataloguers’ worldviews and developed their thinking for modern audiences.

James believes that this has serious consequences for how far users can trust federated catalogues, which bring together records from many collections in a single searchable resource.

When curatorial voices carry historical or social prejudices, they can distort a whole society’s understanding of the past.

This is especially true when descriptions containing curatorial bias are carried into other datasets, for example when they are picked up by machine learning systems and AI language tools such as ChatGPT.

Pioneering new ways to document our cultural legacy

Going into the future, James believes that more work needs to be done to identify curatorial voices in collection catalogues.

In 2022, James published a paper titled ‘Detecting and characterising transmission from legacy collection catalogues’. 

He hopes that the research methodology he has developed can serve as a starting point for cataloguers and other researchers.

“With the expanded use of AI and machine learning tools, there is a risk that curatorial voices such as Mary George’s can be misrepresented as objective information on an ever larger scale.” 

This is why, James argues, it is important to reinvest in cataloguing expertise, so that heritage professionals can continue to identify curatorial voices and their impact on collection legacies.

“By doing so, they are changing how the public discover and understand our cultural heritage.”

Related publications

Andrew Salway & Cynthia Roman, 2022, Digital Humanities Quarterly. Type: article.

Andrew Salway & James Baker, 2020, Museum & Society, 18 (2), 151–169. Type: article.