A new study by Yale’s Digital Ethics Center proposes a novel “copyleft” licensing framework that would require AI models trained on open-source software to remain fully transparent.
Image © stock.adobe.com
The rise of generative artificial intelligence (AI) poses challenges for the free and open-source software (FOSS) community, a global network that is committed to creating and maintaining publicly available software that anyone can use, modify, and share.
Many AI models have been built on open-source software but do not reciprocate the transparency that the FOSS community’s principles require, leaving open-source developers uncertain about how these AI tools are using their code.
A new study by researchers at Yale’s Digital Ethics Center (DEC) explores a potential solution to this problem based on a concept used in free and open-source software known as “copyleft” licenses. Copyleft is a twist on typical copyright rules: it requires works derived from open-source materials to remain as free and transparent as the original work, rather than being re-licensed under more restrictive terms.
The authors propose what they call a Contextual Copyleft AI License (CCAI) — a novel extension of copyleft licensing, which would treat generative AI models as derivative works and require AI developers training models on open-source code to make their architecture and training data freely available.
“Our analysis showed that extending the copyleft concept to generative artificial intelligence has the potential to give open-source software developers meaningful control over how AI developers use their code,” said lead author Grant Shanklin, a de Vries-Sherif Junior Fellow at the DEC and rising senior at Yale College. “Importantly, it would incentivize the formation of a community committed to building AI tools aligned with the values of the free and open-source movement, which could help ensure that AI models are developed openly and responsibly.”
Free and open-source software, which includes operating systems, web browsers, databases, scientific and creative tools, internet infrastructure, and programming and development tools, is a critically important component of modern technology. Cloud computing, smartphones, and AI and other scientific research depend on it.
The new study, published in the International Journal of Law and Information Technology, was co-authored by DEC researchers Claudio Novelli and Emmie Hine; Luciano Floridi, the John K. Castle Professor in the Practice of Cognitive Science and the DEC’s founding director; and Tyler Schroder, a former undergraduate fellow at the DEC.
In a comprehensive legal and policy analysis, they evaluated the benefits and risks of their proposed CCAI licensing, concluding that it is legally feasible under current copyright law as long as training of AI models does not constitute “fair use,” a legal doctrine that promotes free expression by permitting unlicensed use of copyright-protected works under specific circumstances.
They also show that free and truly open-source generative AI models would present several potential benefits, including enhanced transparency, accountability, and innovation. They note that generative AI has a higher risk profile than traditional software because it can be used directly to generate harmful or deceptive content and amplify malicious activities, such as generating highly effective phishing emails.
The researchers suggest that regulations like those enacted by the European Union, which explicitly prevent AI systems from using subliminal, manipulative, or deceptive techniques to distort a person’s behavior and decision-making, could mitigate risks associated with open-source generative AI. Their analysis suggests that CCAI licensing would complement regulatory protections to mitigate the increased risk associated with AI models.
They also lay out key benefits to a copyleft licensing scheme. First, they argue that extending copyleft licensing from traditional software to generative AI models would give developers greater control over how their code is incorporated into AI systems. Second, they assert that CCAI licensing would lead to the creation of more open-source generative AI models by giving open-source developers access to training data that proprietary, closed models could not use. Third, they write, CCAI licensing would discourage “open washing,” a deceptive practice in which companies and organizations present their products and models as open when they are opaque and proprietary.
“AI companies have benefited from using open-source code, but their resulting models are not really open,” said Novelli, a de Vries-Sherif Associate Research Scientist at DEC.
“They might be transparent about some aspects, but other key components remain closed,” he said. “The CCAI license would ensure that models created with open-source software will be fully open and provide all the benefits associated with free and open-source software.”
