Google’s own internal evaluations show that its latest artificial intelligence model, Gemini 2.5 Flash, regresses on safety compared with its predecessor. In a technical report, the company noted that Gemini 2.5 Flash is more likely to generate content that violates its safety guidelines than the earlier Gemini 2.0 Flash.
On the measured benchmarks, the model scored roughly 4 percent worse on text-to-text safety, which gauges how often its written responses violate guidelines, and nearly 10 percent worse on image-to-text safety, which measures its responses to image-based prompts. Both metrics are evaluated by automated internal tools rather than human reviewers.
Shifting AI Safety Standards and Industry Trends
A Google spokesperson acknowledged the regression, confirming that the latest model performs worse on those safety benchmarks. The disclosure comes as major AI developers work to make their models more permissive, reducing how often they refuse to engage with sensitive or controversial prompts.
Other tech firms are navigating similar challenges. For example, Meta announced its latest AI offerings are purposefully designed not to favor any perspective and to engage with controversial subjects, while OpenAI has committed to building models that provide a range of viewpoints instead of adopting an editorial stance.
At times these efforts to make AI more permissive have backfired, as when OpenAI’s ChatGPT generated explicit content for underage users because of what the company described as a bug. Google’s report suggests that Gemini 2.5 Flash’s greater willingness to follow instructions contributes to its higher rate of policy violations, including when those instructions cross problematic lines.
Google attributes part of the increase to false positives in its automated evaluations, but it also concedes that the model sometimes generates violative content when explicitly directed to, highlighting the inherent tension between faithfully following user instructions and upholding safety constraints.
Scores on benchmarks designed to probe how models handle contentious prompts indicate that Gemini 2.5 Flash is less likely than its predecessor to refuse such requests. In testing, the model readily produced arguments in favor of controversial policy and justice positions, renewing debate over the trade-offs in AI safety.
Some experts argue that the limited disclosure in Google’s technical report underscores the need for more transparent AI safety reporting. Without detail on the specific policy violations, third parties cannot fully assess the scope of the potential risks.
Previously, Google faced criticism for delays and omissions in releasing full safety details about its most advanced AI models. In response to concerns, the company published an updated technical report this week that included additional information on how it evaluates the safety of its systems.