How Google’s new AI model protects user privacy without sacrificing performance


Google has introduced VaultGemma, an LLM designed to generate high-quality outputs without memorizing its training data verbatim. The result: Sensitive information that makes it into the training dataset won't get republished.

Digital noise

The key ingredient behind VaultGemma is a mathematical framework known as differential privacy (DP), which is essentially digital noise that scrambles the model’s ability to perfectly memorize information found in its training data.

Crucially, the researchers embedded DP at the level of sequences of tokens. This means that at the most fundamental level, VaultGemma will not be able to perfectly memorize or reproduce the details on which it’s been trained.
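The standard way to train with sequence-level DP is a DP-SGD-style update: clip each sequence's gradient so no single sequence can dominate, then add Gaussian noise calibrated to that clip norm. The sketch below is a toy illustration of that idea, not Google's actual training code; the function name and parameters are hypothetical.

```python
import numpy as np

def dp_sgd_step(per_seq_grads, clip_norm=1.0, noise_multiplier=1.0, rng=None):
    """One differentially private update (toy sketch):
    clip each sequence's gradient, average, then add Gaussian noise."""
    rng = rng or np.random.default_rng(0)
    clipped = []
    for g in per_seq_grads:
        norm = np.linalg.norm(g)
        # Bound any single sequence's influence on the update.
        clipped.append(g * min(1.0, clip_norm / norm))
    avg = np.mean(clipped, axis=0)
    # Noise scale is tied to clip_norm: more noise per unit of
    # per-sequence influence means stronger privacy, lower utility.
    noise = rng.normal(0.0, noise_multiplier * clip_norm / len(per_seq_grads),
                       size=avg.shape)
    return avg + noise
```

Because the clipping is applied per sequence, the noise masks the contribution of any one training sequence, which is exactly the unit of protection VaultGemma targets.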


“Informally speaking, because we provide protection at the sequence level, if information relating to any (potentially private) fact or inference occurs in a single sequence, then VaultGemma essentially does not know that fact: The response to any query will be statistically similar to the result from a model that never trained on the sequence in question,” Google wrote in a blog post.
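The "statistically similar" wording in the quote is an informal statement of the standard (ε, δ)-differential-privacy guarantee: for any two training sets $D$ and $D'$ that differ in a single sequence, and any set of outputs $S$, a DP training mechanism $M$ must satisfy

```latex
\Pr[M(D) \in S] \;\le\; e^{\varepsilon}\,\Pr[M(D') \in S] + \delta
```

Small $\varepsilon$ and $\delta$ mean an observer of the trained model can barely tell whether any particular sequence was in the training data.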

However, VaultGemma still performed roughly on par across key benchmarks with some older models, including OpenAI's GPT-2. This suggests that a compute-privacy-utility optimization framework could eventually be a viable alternative to leading proprietary models, though it still has a long way to go to catch up.


“This comparison illustrates that today’s private training methods produce models with utility comparable to that of non-private models from roughly 5 years ago, highlighting the important gap our work will help the community systematically close,” Google wrote in the blog post.

The model weights and training methods behind VaultGemma have been published in a research paper and on Kaggle.
