Jul 9, 2015 2:00 PM

Google Says Its AI Catches 99.9 Percent of Gmail Spam

Gmail's spam filters don't just curb junk by applying pre-existing rules. They create new rules as they go along.

About a decade ago, spam brought email to near-ruin. The contest to save your inbox was on, with two of the world's biggest tech companies vying for the title of top spam-killer.

By February 2012, Microsoft boasted that its spam filters were removing all but 3 percent of the junk messages from Hotmail, the company's online email service at the time. Google responded by claiming that its service, Gmail, removed all but about one percent of spam messages, adding that its false-positive rate (legitimate mail misidentified as spam) was also about one percent.

It was a point of pride for the two companies, particularly Microsoft, whose Hotmail service once carried such a poor reputation for spam. And the relative success of both showed that heuristic technologies, which identify spam based on pre-defined rules, were working.

But they still weren't working well enough. One percent spam is still pretty annoying. And a one percent false-positive rate is, well, quite a bit more than annoying, if crucial messages go unread. Naturally, these companies continue to hone their spam-battling techniques, and now, Google has upped the ante with a new breed of artificial intelligence tools.

Three years after it last released Gmail's spam stats, Google says that its spam rate is down to 0.1 percent, and its false-positive rate has dipped to 0.05 percent. The company credits the significant drop in large part to the introduction of brain-like "neural networks" into its spam filters that can learn to recognize junk mail and phishing messages by analyzing scads of the stuff across an enormous collection of computers.
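To put those percentages in concrete terms, here is a small back-of-the-envelope calculation. It uses only the figures quoted above; the function name `misfiled_per_10k` and the choice of 10,000 messages as a yardstick are illustrative assumptions, not anything Google published.

```python
def misfiled_per_10k(rate_percent: float) -> float:
    """Messages affected per 10,000 at a given percentage rate."""
    return 10_000 * rate_percent / 100


# Figures quoted in the article, in percent (2012 versus 2015).
spam_2012, spam_2015 = 1.0, 0.1
false_positive_2012, false_positive_2015 = 1.0, 0.05

# Legitimate messages wrongly sent to the spam folder, per 10,000.
lost_2012 = misfiled_per_10k(false_positive_2012)
lost_2015 = misfiled_per_10k(false_positive_2015)

# How many times smaller each rate became.
spam_improvement = spam_2012 / spam_2015
false_positive_improvement = false_positive_2012 / false_positive_2015

print(f"Legitimate mail misfiled per 10,000: {lost_2012:.0f} then, {lost_2015:.0f} now")
print(f"Spam rate improved about {spam_improvement:.0f}x")
print(f"False-positive rate improved about {false_positive_improvement:.0f}x")
```

By this arithmetic, the false-positive rate fell by a larger factor than the spam rate did.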

"One of the great things about machine learning is that it adapts to changing situations." says John Rae-Grant, a senior product manager for Gmail, which Google says is now used by 900 million people across the globe. In other words, Gmail's spam filters don't just curb junk by applying pre-existing rules. They create new rules themselves as they go along.

What You Don't Want

Along with Facebook and Twitter, Google is among the leaders in neural networking. In recent years, the company has used the technology to recognize commands you speak into your Android phone, identify photos you post to its Google Photos service, and more. According to uber Google engineer Jeff Dean, the company applies these techniques to dozens of services across the company's online empire.

It's no surprise, then, that neural nets would also prove effective in recognizing spam. Chinese search giant Baidu uses the technology to serve ads you're likely to click on. Facebook uses it to identify what you're likely to want in your News Feed. In a way, spam-detection is just the inverse of these systems. It determines what you don't want.

Other companies are exploring the use of neural networks as filtering tools. Twitter plans to use the technology as a way of detecting not only junk mail pushed onto its social networks, but also inappropriately nasty messages. "We're starting to use it for more textual understanding, so we can identify spam and abuse," says Alex Roetter, Twitter's head of engineering.

Roughly speaking, these neural networks are vast collections of machines that mimic the networks of neurons in the brain. At Google, Dean and a core group of other AI engineers oversee these networks and provide software libraries that allow other Google teams, including the Gmail team, to use them. According to Google software engineer Vijay Eranti, the Gmail team adopted the method just a few months ago.

"Especially in this area," Rae-Grant says, referring neural networking, "things are moving very quickly, from research to initial application to wide-spread applications."

Your Spam, My Treasure

What's Microsoft's rate? The company was unable to provide its latest stats for Hotmail, which is now called Outlook.com. But it too makes use of neural networking for some of its products. This is the basis of the new tool that instantly translates Skype calls from one language to another.

It should be said, however, that neural networking is just part of what a company like Google does to fight spam and ensure that legitimate messages arrive where they're supposed to. Rae-Grant also says that Gmail uses several tools that tune its spam filters so that they suit your particular tastes.

"There's all this gray area. One person's spam might be another persons coupon," he says.

"We track and try to approximate---based on what you have previously paid attention to and reported as spam---what you want in front of you and don't want it front of you. So, in addition to this big machine learning model that's learning from everybody's reporting, we're then fine-tuning things for the individual."

In short, this isn't the email of 2001. Thank goodness.

Correction: This story originally included a headline that incorrectly identified Google's spam rate. This has been corrected.