Improving Wikimedia resilience against the risks of content-generating AI systems
Wikipedia is amongst the most-visited sites on the internet and is widely recognised as a largely trustworthy, if still flawed, repository of the world’s knowledge. Wikipedia content is disseminated well beyond Wikipedia itself, for example through the knowledge graphs displayed in Google search results or the answers given by voice assistants such as Amazon Alexa and Google Home. It also forms a major source of training data for many generative AI models. Wikipedia is therefore a critical element in the production and dissemination of public knowledge within the global information ecosystem.
This project explores the implications of generative AI for Wikipedia. In particular, we are examining whether Wikipedia’s policies and practices are robust enough to deal with the risks that generative AI poses to the integrity of information on Wikipedia. These risks include floods of misinformation or vandalism that could overwhelm communities of Wikipedia editors, particularly on non-English Wikipedias, and a decline in Wikipedia usage as readers increasingly turn to AI chatbot interfaces or AI-integrated search engines when seeking information.
By gathering perspectives from Wikimedia practitioners and AI experts, reviewing the academic and grey literature on the possible (and evolving) implications of generative AI, and analysing current policies and practices for vetting automated tools, this project will map out the most important areas where policy could be expanded and current practice adjusted to deal with the risks of generative AI.
The project commenced in July 2023 and runs until June 2024. Researchers on the project are Michael Davis, Heather Ford, and Marian-Andrei Rizoiu, with Tim Koskie providing research assistance.
This project is funded by the Wikimedia Research Fund.