Google said its Gemma series of small language models has passed 100 million downloads as it unveiled the latest iteration, Gemma 3, which ranges from one billion to 27 billion parameters in size.
Google claims its latest small AI model is the best-performing in the world that can run on a single chip, outpacing vastly larger models including Meta’s Llama-405B, OpenAI’s o3-mini, and DeepSeek-V3, and enabling users to build applications on a single GPU. Only DeepSeek R1 ranks higher.
Out of the box, Gemma 3 supports over 35 languages, with pretrained support for more than 140.
The small models can be used to analyse images, text, and short videos, and boast a 128k-token context window, allowing them to handle sizable inputs.
Gemma 3 also supports function calling and structured output, meaning users can apply it to automating tasks in agentic AI applications.
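In practice, function calling works by having the model emit structured (JSON) output that application code then routes to a local function. The sketch below illustrates the pattern; the `get_weather` tool and the hardcoded model reply are hypothetical examples, not part of Gemma's API.

```python
import json

# Hypothetical tool schema a developer might expose to the model.
# The name and fields are illustrative only.
WEATHER_TOOL = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

def dispatch(model_reply: str) -> str:
    """Parse a structured (JSON) model reply and route it to local code."""
    call = json.loads(model_reply)  # structured output: guaranteed-parseable JSON
    if call.get("name") == "get_weather":
        city = call["arguments"]["city"]
        return f"Weather lookup requested for {city}"
    raise ValueError(f"Unknown tool: {call.get('name')}")

# A reply shaped like what a function-calling model emits (hardcoded here
# for illustration, in place of an actual model call).
reply = '{"name": "get_weather", "arguments": {"city": "London"}}'
print(dispatch(reply))  # Weather lookup requested for London
```

Because the output is structured rather than free-form prose, the application can act on it reliably, which is what makes agentic automation possible.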
Google said the new family of small AI models are its “most advanced, portable and responsibly developed open models yet”.
“They are designed to run fast, directly on devices — from phones and laptops to workstations — helping developers create AI applications, wherever people need them,” a company blog post reads.
Google first unveiled its Gemma line of models last February, offering small-scale open source models based on its flagship Gemini model to rival similar-sized offerings from the likes of Meta and Mistral.
The small language model market has since exploded, with developers looking for AI systems capable of running on devices to cut computing costs and avoid the latency of sending workloads to the cloud.
The latest iteration of the Gemma series also comes with official quantised versions, which reduce the numerical precision of a model’s weights to shrink its size and speed up inference. Open source developers have often created their own quantised versions of popular models to the same end.
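The size saving from quantisation is straightforward arithmetic: storing each weight as an 8-bit integer plus a shared scale factor instead of a 32-bit float cuts storage roughly fourfold. A toy sketch (real quantisers work per-channel or per-block, but the arithmetic is the same):

```python
import array

# Toy post-training quantisation: map float32 weights onto int8 codes
# with a single scale factor chosen to cover the weight range.
weights = [0.82, -1.30, 0.05, 2.41, -0.77, 1.96]
scale = max(abs(w) for w in weights) / 127  # one float spans the range

quantised = array.array('b', (round(w / scale) for w in weights))  # int8 codes
restored = [q * scale for q in quantised]                          # dequantise

fp32_bytes = len(weights) * 4   # 4 bytes per float32 weight
int8_bytes = len(quantised) * 1  # 1 byte per int8 code
print(f"{fp32_bytes} bytes -> {int8_bytes} bytes (4x smaller)")
print("max round-trip error:", max(abs(a - b) for a, b in zip(weights, restored)))
```

The rounding introduces a small, bounded error per weight (at most half the scale factor), which is why quantised models trade a sliver of accuracy for a much smaller memory footprint and faster inference.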
How to access Google Gemma 3
Google’s Gemma 3 family of AI models can be accessed via Hugging Face and Kaggle.
Gemma 3 can also be accessed via AI Studio, Google’s developer platform.
The models can also be fine-tuned using Google platforms such as Vertex AI or Google Colab.
The models are specifically optimised to run on Nvidia GPUs of any size, from Blackwell data centre hardware to a humble gaming card.
They’re also accessible through Nvidia’s API Catalogue, meaning users can prototype using just an API call.
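Prototyping against a hosted model of this kind typically means posting a chat request to an OpenAI-style endpoint. The endpoint URL and model identifier below are assumptions about Nvidia's catalogue, so check the relevant catalogue entry for the exact values; the request is only sent when an API key is configured.

```python
import json
import os
import urllib.request

# Assumed OpenAI-compatible endpoint and model id for Nvidia's API
# Catalogue; verify both against the catalogue entry before use.
API_URL = "https://integrate.api.nvidia.com/v1/chat/completions"
payload = {
    "model": "google/gemma-3-27b-it",  # assumed catalogue model id
    "messages": [{"role": "user", "content": "Summarise Gemma 3 in one line."}],
    "max_tokens": 64,
}

api_key = os.environ.get("NVIDIA_API_KEY")
if api_key:  # only send the request when a key is configured
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        print(json.load(resp)["choices"][0]["message"]["content"])
else:
    print("Set NVIDIA_API_KEY to send this request:", json.dumps(payload)[:60], "...")
```

A single request like this is all that is needed to start prototyping, with no local GPU or model download involved.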