Startup Unleashes Its Clone of Google's 'Knowledge Graph'

Giving apps the power to comprehend the web the same way a human would.

If you're listening to a Skrillex song, you can say "What's his real name?" and your phone will give you his real name. If you open an email asking if you want to see Tomorrowland, you can tap on it and get instant reviews, ratings, and trailers for the latest sub-par George Clooney flick. If you get a text suggesting dinner at some hip restaurant you've never heard of, you can tap again for reservations and directions.

This, says Google, is how your Android smartphone will soon work, thanks to a new service called "Now on Tap." An extension of the company's Siri-like digital assistant, Google Now, the service will identify what's happening on your phone and pull in related information from across the web. The service works by using machine learning algorithms to determine what you're doing, then matches this understanding with information stored in what the company calls the Google Knowledge Graph---a database of semantic data describing more than 1 billion people, places, and things. "To be able to assist you," says Aparna Chennapragada, who oversees Google Now, "we have to understand the world."

The Knowledge Graph isn't just a database of stuff on the net. It's a database that provides context for stuff on the net---that aims to comprehend what's there in the same way a human would. In other words, Google doesn't just "know" that the web contains pages that include the words George Clooney. It "knows" that George Clooney is an actor here in the real world.
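To make that distinction concrete, here is a minimal sketch in Python. It is not Google's or Diffbot's implementation, and every name in it (TinyKnowledgeGraph, the is_a and appears_in predicates, the sample page text) is a hypothetical chosen for illustration. It contrasts a plain keyword match with a lookup against stored facts about an entity.

```python
from collections import defaultdict


class TinyKnowledgeGraph:
    """Toy store of (subject, predicate, object) facts. Illustrative only."""

    def __init__(self):
        # Maps (subject, predicate) -> set of objects.
        self._facts = defaultdict(set)

    def add(self, subject, predicate, obj):
        self._facts[(subject, predicate)].add(obj)

    def objects(self, subject, predicate):
        # Return a copy so callers cannot mutate the stored facts.
        return set(self._facts.get((subject, predicate), set()))

    def is_a(self, subject, type_name):
        return type_name in self.objects(subject, "is_a")


# Hypothetical facts, hand-entered for the example.
graph = TinyKnowledgeGraph()
graph.add("George Clooney", "is_a", "actor")
graph.add("George Clooney", "appears_in", "Tomorrowland")
graph.add("Tomorrowland", "is_a", "film")

page_text = "Reviews are in for the new George Clooney movie."

# Keyword matching only tells us the words occur on the page.
contains_words = "George Clooney" in page_text

# The graph tells us what kind of thing those words refer to.
knows_he_is_an_actor = graph.is_a("George Clooney", "actor")
his_films = graph.objects("George Clooney", "appears_in")
```

The keyword check says nothing about who George Clooney is. The graph lookup can answer that he is an actor and name a film he appears in, which is the kind of context a service like Now on Tap needs before it can offer reviews or trailers.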

As the underpinning for Google Now, it works reasonably well---especially during tightly controlled demos on stage at Google's annual developer conference. And thanks to the latest in artificial intelligence technology, which can automatically determine how words are related, it continues to improve.

The rub is that, although outside companies and coders can now plug their apps into Google Now, they can't really use its knowledge graph to build their own artificially intelligent services. But Google's isn't the only knowledge graph out there. A Silicon Valley AI startup called Diffbot says it has fashioned a similarly large collection of semantic internet data, and it's beginning to share this data with other companies and coders, including Microsoft, Amazon, and eBay.

App Awareness

Diffbot says its database now spans about 600 million objects, and the hope is that it can spur all sorts of contextually aware services along the lines of Google Now. "In the future, you'll interact with thousands of intelligent apps that will need something like what we offer," says Mike Tung, the CEO of Diffbot, a company that grew out of his artificial intelligence work at Stanford University.

According to Tung, the Diffbot knowledge graph already underpins Microsoft's Bing search engine. It helps generate the contextual results that pop up on the right-hand side of the page, he says. If you type in something like "Canon EOS Digital Rebel XT," you get reviews and specifications for the popular digital camera.

Available to Microsoft and other "beta" testers as an online service, the tool is part of a broad effort to bring semantic understanding to smartphone apps and other Internet software. As Apple refines Siri and Google tweaks Google Now, Microsoft is now offering its Cortana digital assistant, and through a program it calls Project Oxford, the company is now allowing outside companies to build their own apps atop Cortana's fundamental technologies. At the same time, the likes of Facebook and Chinese search giant Baidu are developing systems that can understand natural language and respond accordingly.

Machine learning plays multiple roles here. New "deep learning" algorithms like Google's Word2Vec can help build a knowledge graph, and similar algorithms---algorithms that "learn" by analyzing vast amounts of data---can help services like Google Now make use of the graph, working to understand, say, the email you just received on your smartphone.
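As a rough illustration of how word relationships can be measured, the sketch below compares words by the cosine similarity of their vectors, which is the usual way vectors from a model like Word2Vec are compared. The three-dimensional vectors here are made up by hand for the example; a real model learns vectors with hundreds of dimensions from large amounts of text, and nothing here reflects Google's actual code.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hand-made toy vectors (an assumption for illustration, not learned data).
TOY_VECTORS = {
    "actor": [0.9, 0.1, 0.0],
    "director": [0.8, 0.2, 0.1],
    "camera": [0.1, 0.9, 0.2],
}


def most_similar(word, vectors):
    """Return the other word whose vector points in the closest direction."""
    target = vectors[word]
    candidates = [
        (other, cosine_similarity(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    return max(candidates, key=lambda pair: pair[1])[0]


closest_to_actor = most_similar("actor", TOY_VECTORS)
```

With these toy numbers, "actor" lands nearer to "director" than to "camera". A system building a knowledge graph could use signals of this kind as one hint that two terms belong to related categories.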

Making Everything Smart

Like Google, Diffbot uses various forms of machine learning at both ends of the process. Tung says the company's entire knowledge graph is generated automatically. "It's a software system that can read and interpret web pages like a human being using computer vision and natural language processing techniques," he says. Based in part on Freebase, a semantic database acquired by Google in 2010, Google's Knowledge Graph includes data compiled by human hands. Tung believes Diffbot's automatic system can scale to much larger amounts of online data.

Peter Kerwin runs Collexion, a search engine for obscure collectibles like vinyl records and typewriters. The company already uses an earlier Diffbot tool to crawl the web and build a small version of the knowledge graph, and Kerwin says his company could significantly expand its search engine if it could tap Diffbot's entire range of knowledge. "There isn't a lot of structured data about collectibles," he explains. "We need something that can give us that."

This is a small thing. But the same semantic data, Tung says, can help power a world of small apps in a similar way. Down the road, an effective knowledge graph could drive not only artificially intelligent services like Google Now, but also tools that operate beyond traditional computers and smartphones---such as a printer that could automatically order new cartridges when ink gets low. It too must understand the 'net in the same way people do.