Google's Android coding tests reveal an unexpected Gemini 3.5 Flash weakness
Google has just refreshed its Android Bench rankings, and the results present developers with a puzzling picture. Google's new Gemini 3.5 Flash is actively falling behind its predecessor while charging you three times the price to use it.
The latest Android coding leaderboard, a benchmark that evaluates how well different AI models can perform Android development tasks, introduced Gemini 3.5 Flash for the first time, but the newcomer didn't make it into the top five. Topping the list was OpenAI's GPT 5.5, which scored 74, followed by GPT 5.4 and an older Google model, Gemini 3.1 Pro Preview, both with 72.4. The new Claude Opus models also outperformed the Flash variant.
Gemini 3.5 Flash scored 63.7, placing sixth overall. What was more surprising, though, was its efficiency. The model averaged 355.9 total tokens, a big jump compared to other systems, according to Google's benchmark data. That came to an average cost of $147.1, making it the most expensive model on the entire list despite scoring lower than a number of rivals.
For context, Google's Flash branding has always been about speed and cheaper prices. At Google I/O 2026, the company announced the most powerful Flash model it had ever built, Gemini 3.5 Flash, which it claimed had more robust coding capabilities and better support for AI agents and complex workflows. Google also said the model outperformed Gemini 3.1 Pro in a number of internal benchmarks and produced output up to four times faster than competing frontier models.
However, the Android benchmark tells a different story. Gemini 3.5 Flash might shine in the broader evaluations of agentic and coding tasks run by Google, but its performance on actual Android development tasks seems less than stellar. For example, Gemini 3.1 Pro Preview delivered a significantly better score while costing about one-third as much, as noted by 9to5Google.
The bigger question now is whether Google can improve Gemini 3.5 Flash with updates or whether the upcoming Gemini 3.5 Pro will better deliver on the company's performance promises. For now, Google's own numbers suggest that newer isn't always better.

