Google open-sourced its SynthID Text watermarking tool

Google recently announced it will open-source its SynthID Text watermarking tool, making it available as part of its Responsible Generative AI Toolkit and through Hugging Face's Transformers library. Google says the DeepMind-developed technology has been integrated into its own models for some time without sacrificing output speed, quality, diversity, or creativity.

SynthID Text leverages the mechanism by which LLMs generate text: at each step, the model predicts the next token from the preceding ones, assigning a probability to every candidate that could continue the output. SynthID Text watermarks AI-generated text by adjusting those probability distributions at several points during generation, using one or more randomly generated watermarking functions.
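
For developers, this is exposed through the Transformers integration. The sketch below shows watermarked generation using the SynthIDTextWatermarkingConfig class shipped in recent Transformers releases; the model name, key values, and prompt are placeholders, not prescribed settings:

```python
from transformers import (
    AutoModelForCausalLM,
    AutoTokenizer,
    SynthIDTextWatermarkingConfig,
)

model_name = "google/gemma-2-2b"  # any causal LM; used here only as an example
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

# The integer keys seed the pseudorandom watermarking functions; they should be
# kept secret and reused across every model sharing this tokenizer.
watermarking_config = SynthIDTextWatermarkingConfig(
    keys=[654, 400, 836, 123, 340, 443, 597, 160, 57, 29],  # placeholder keys
    ngram_len=5,  # the watermark depends on a sliding window of 5 tokens
)

inputs = tokenizer(["Write a short note about watermarking."], return_tensors="pt")
outputs = model.generate(
    **inputs,
    watermarking_config=watermarking_config,
    do_sample=True,  # sampling is required: the watermark biases token choice
    max_new_tokens=100,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True)[0])
```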

The resulting pattern of word choices at the points where the functions were applied, together with the adjusted probability scores, constitutes the watermark. A detector is trained to evaluate text against those watermarking functions: the more closely the pattern and scores match what the functions would have produced, the more likely the text is AI-generated.
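
To make the detection idea concrete, here is an illustrative, simplified scorer, not Google's detector: it recomputes a keyed pseudorandom value for each token given its context, and since watermarked generation steers sampling toward high-scoring tokens, the mean score of watermarked text drifts above the roughly 0.5 expected for unwatermarked text. All names and the hashing scheme are hypothetical:

```python
import hashlib

def g_value(context, token, key):
    """Keyed pseudorandom value in [0, 1) for a (context, token) pair.
    Illustrative stand-in for SynthID's watermarking functions."""
    payload = f"{context}|{token}|{key}".encode()
    digest = hashlib.sha256(payload).digest()
    return int.from_bytes(digest[:8], "big") / 2**64

def mean_g_score(token_ids, keys, ngram_len=5):
    """Average g-value over the text. Unwatermarked text scores ~0.5;
    watermarked text, which favored high-g tokens, scores measurably higher."""
    scores = []
    for i in range(ngram_len - 1, len(token_ids)):
        context = tuple(token_ids[i - ngram_len + 1 : i])
        scores.extend(g_value(context, token_ids[i], k) for k in keys)
    return sum(scores) / len(scores) if scores else 0.5
```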

The strategy has some limitations: for instance, prompts that do not allow for significant variation, such as factual questions with short answers, cannot be watermarked, since there are few opportunities to adjust a token's probability distribution when the model has few plausible candidates to choose from. Asked for the capital of France, for example, a model has essentially one correct answer to give. Google says the watermark resists some degree of tampering, such as cropping a longer text, lightly paraphrasing it, or replacing some of its words; it will not, however, withstand heavy paraphrasing or translation into another language.

Still, the availability of SynthID Text marks a significant milestone in making watermarking technology widely accessible. Developers deploying SynthID do not need to retrain their models, but they do need a separate watermark for each tokenizer they use, and they must train the detector on outputs from all models sharing a given watermark and tokenizer. With that in place, they have a reasonably reliable way to detect whether a piece of text was generated by their models. More technical details can be found in the Hugging Face blog post and the Responsible Generative AI Toolkit documentation.
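
Continuing the illustrative scorer above, training a detector then amounts to calibrating it on outputs pooled from every model that shares the watermark keys and tokenizer. A hypothetical threshold-based version, a deliberate simplification since the detector SynthID Text ships is a trained Bayesian classifier rather than a fixed threshold, might look like this (reusing mean_g_score from the earlier sketch, with placeholder data):

```python
import statistics

def calibrate_threshold(watermarked_scores, unwatermarked_scores):
    """Place the decision boundary midway between the two populations' means.
    Hypothetical simplification: the real SynthID Text detector is a trained
    Bayesian classifier, not a fixed threshold."""
    return (statistics.mean(watermarked_scores)
            + statistics.mean(unwatermarked_scores)) / 2

# Pool token sequences from *all* models sharing this tokenizer and key set,
# per the deployment guidance above (placeholder data for illustration).
keys = [654, 400, 836, 123, 340]
watermarked_samples = [[12, 87, 3, 55, 90, 41, 7], [5, 66, 23, 9, 80, 14, 2]]
clean_samples = [[31, 4, 77, 19, 60, 8, 45], [22, 11, 93, 6, 70, 38, 1]]

threshold = calibrate_threshold(
    [mean_g_score(ids, keys) for ids in watermarked_samples],
    [mean_g_score(ids, keys) for ids in clean_samples],
)

def looks_watermarked(token_ids):
    """Flag text whose mean g-score exceeds the calibrated threshold."""
    return mean_g_score(token_ids, keys) > threshold
```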