OpenAI's next-generation image model, tentatively dubbed GPT-Image-2, appears to be undergoing blind testing on Chatbot Arena. Blake Robbins flagged the speculation on X, noting that users are encountering anonymous models producing outputs so photorealistic that they make GPT-Image-1 look like a rough draft. Three specific model codenames have surfaced in the testing pool:
maskingtape-alpha
gaffertape-alpha
packingtape-alpha
Multiple users have described the circulating sample images from these models as a massive leap in quality over existing generators.
What Chatbot Arena Tells Us
For anyone unfamiliar with the platform: Chatbot Arena runs blind A/B comparisons where users submit prompts and vote on which output is better without knowing which model produced it. It has become one of the more reliable signals in AI benchmarking precisely because it removes marketing from the equation. Users judge outputs on merit alone.
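To make the mechanism concrete, here is a minimal sketch of how blind pairwise votes can be aggregated into a ranking. This uses a classic Elo-style update for illustration; the actual Arena leaderboard uses a more sophisticated statistical model, and the win rate and model names below are hypothetical.

```python
import random

def elo_update(r_a, r_b, winner, k=32):
    """Update two Elo ratings after one blind A/B vote.

    winner is 'a' or 'b'; expected score uses the standard
    logistic formula with a 400-point scale.
    """
    expected_a = 1 / (1 + 10 ** ((r_b - r_a) / 400))
    score_a = 1.0 if winner == "a" else 0.0
    r_a += k * (score_a - expected_a)
    r_b += k * ((1 - score_a) - (1 - expected_a))
    return r_a, r_b

# Simulate blind votes: voters see only two anonymous outputs,
# never the model names. Assume (hypothetically) the new model
# wins ~70% of head-to-head comparisons.
ratings = {"maskingtape-alpha": 1000.0, "baseline-model": 1000.0}
random.seed(0)
for _ in range(200):
    winner = "a" if random.random() < 0.7 else "b"
    ratings["maskingtape-alpha"], ratings["baseline-model"] = elo_update(
        ratings["maskingtape-alpha"], ratings["baseline-model"], winner
    )

print(sorted(ratings.items(), key=lambda kv: -kv[1]))
```

Because identities are hidden at vote time, the rating gap that emerges reflects only output preference, which is why pre-release Arena results are treated as a credible quality signal.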
Models appearing on Arena before their official launch is a well-established pattern. OpenAI, Google, and others have all used the platform as a soft testing ground, gathering real-world preference data before committing to a public release. When a new model shows up on Arena, it typically means a launch is weeks away, not months.
The implication here is straightforward: if GPT-Image-2 is on Arena, OpenAI is in the final stretch of evaluation. A public release is likely imminent.
The Quality Jump
The photorealistic samples making the rounds suggest a significant leap in fidelity. Skin texture, lighting, environmental detail, and material rendering look less like AI generations and more like processed camera photography.
GPT-Image-1 was competent but had the telltale artifacts of early generative models: lighting inconsistencies, strange micro-textures on skin, and anatomical drift in hands and fingers. If the Arena samples are representative of GPT-Image-2's baseline capability, those limitations appear to be narrowing fast. The gap between "AI-generated image" and "photograph" is visibly compressing.
The Skeptic's View
A few caveats are worth noting. Arena samples are cherry-picked by definition. Users share the impressive outputs on social media, not the failures. It remains unclear how GPT-Image-2 performs across a full range of complex prompts, edge cases, and specific style requirements beyond standard portraiture or landscapes.
There's also the question of consistency. A model that produces one stunning image out of ten attempts is a novelty; a model that produces a strong result on every attempt is a dependable tool.
What Comes Next
OpenAI has not confirmed that GPT-Image-2 is the model testing on Arena, and the company has not announced a release date. But the pattern is familiar: anonymous Arena testing, followed by a wave of public speculation, culminating in an official announcement. That cycle typically plays out over a matter of weeks.