Companies such as OpenAI, Meta, Anthropic, Google and France’s Mistral AI now face a wall of transparency requirements where once there were few. For those building or deploying models like GPT-4, Llama and Claude 2, secrecy around data sources, model training, and system limitations is no longer an option if they want to remain in the vast European market.
The AI Act, touted as the world’s first binding law on these advanced systems, does not just set expectations. It fundamentally changes how suppliers handle everything from documentation to copyright. Companies must keep detailed technical records, capture each step of how their systems are trained, tested, and validated, and be ready to hand that information over to regulators or business clients who want proof of due diligence.
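To make the record-keeping duty concrete, here is a minimal Python sketch of what an internal technical documentation entry might look like. The structure and field names (TechnicalRecord, training_data_sources and so on) are illustrative assumptions, not the Act’s prescribed schema.

from dataclasses import dataclass, field
from datetime import date

# Illustrative sketch only: these fields are assumptions,
# not the AI Act's prescribed documentation schema.
@dataclass
class TechnicalRecord:
    model_name: str
    version: str
    training_completed: date
    training_data_sources: list[str] = field(default_factory=list)
    evaluation_results: dict[str, float] = field(default_factory=dict)
    known_limitations: list[str] = field(default_factory=list)

record = TechnicalRecord(
    model_name="example-model",  # hypothetical model
    version="1.0",
    training_completed=date(2025, 7, 1),
    training_data_sources=["licensed-news-corpus", "public-web-crawl"],
    evaluation_results={"toxicity_benchmark": 0.02},
    known_limitations=["may reproduce memorised text"],
)

A structured record along these lines is what a supplier could hand to a regulator or a business client on request; the point is that the training, testing, and validation trail exists in an auditable form.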
If you’re a publisher looking to incorporate a model into your product, you now gain clarity that was previously out of reach. This deep documentation is part of a broader push to ensure buyers know exactly what they’re deploying and how to stay within the law.
Some companies, particularly smaller players and those working with open-source technology, have pushed back. Detailed public summaries and technical paperwork can be a heavy load when your business is running lean. The European Commission rolled out a standardised template and a voluntary code of practice to ease the process, but the reality remains: these demands will stretch resources for many startups.
New Visibility Into Training Data
One of the most significant changes concerns the sources of training data. Historically, the origins of the text, images, and audio that models consumed were opaque at best. Now, suppliers will need to publish summaries of their training material, revealing specific datasets and websites used.
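As a rough illustration of what such a published summary could contain, consider the sketch below. The keys and values are assumptions made for illustration; they do not reproduce the Commission’s standardised template.

import json

# Hypothetical public training-data summary; the keys are illustrative
# assumptions, not the Commission's standardised template.
training_data_summary = {
    "provider": "ExampleAI",  # hypothetical provider
    "model": "example-model-v1",
    "data_modalities": ["text", "images", "audio"],
    "major_sources": [
        {"name": "public-web-crawl", "type": "web scrape"},
        {"name": "licensed-book-corpus", "type": "licensed"},
    ],
}

print(json.dumps(training_data_summary, indent=2))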
This brings concrete implications. Creators will finally get a partial window into whether their books, articles or photos contributed to a system’s intelligence. At the same time, the Act aims to make models that serve up verbatim song lyrics or passages from novels without authorisation a thing of the past.
Dragos Tudorache, a Romanian Member of the European Parliament, highlighted the core goal, saying, “These provisions are first and foremost about transparency, which guarantees that AI and the company developing it are trustworthy.”
The impact cuts straight to the business core for tech giants and open-source startups alike. Some, like OpenAI, have historically kept training data cloaked in secrecy, often citing competitive concerns. Now, those arguments will have to coexist with regulatory expectations and the threat of significant penalties.
Lawyer Alexandra Iteanu captures it plainly: companies “will have to structure their documentation, clarify their processes, and demonstrate their compliance.”
Companies that sign up for the new voluntary GPAI Code of Practice, released this July, benefit from streamlined paperwork and a lighter regulatory touch. The hope in Brussels is that these promised rewards will encourage widespread adoption.
For those that do not adjust, the EU’s new AI Office, with powers to demand compliance and hand out fines, stands ready. Fines can reach 15 million euros or three percent of global annual turnover, whichever is higher, a risk few industry leaders can ignore, as recent legal challenges over incomplete AI documentation have shown.
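To put that ceiling in perspective, a quick back-of-the-envelope calculation, assuming the cap is the higher of the two figures named above:

# Penalty ceiling from the figures above: 15 million euros or
# 3% of global turnover (assumed here to be whichever is higher).
def max_fine(global_turnover_eur: float) -> float:
    return max(15_000_000, 0.03 * global_turnover_eur)

# For a company with 2 billion euros in global turnover:
print(f"{max_fine(2_000_000_000):,.0f}")  # 60,000,000 -> a 60 million euro cap

For any firm with global turnover above 500 million euros, the three-percent figure is the binding one, which is why the exposure scales with company size.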