Twitter this week began indexing every public tweet posted since the service launched in 2006.
“Our long-standing goal has been to let people search through every tweet ever published,” said Yi Zhuang, who led the team working on the project.
Use cases Zhuang cited for the new infrastructure include results for entire TV and sports seasons, conferences, industry discussions such as those around mobile payments, places, businesses, and long-lived hashtag conversations such as #Ferguson, #HongKong and #Election2012.
Forget the concept of providing a service for users, to which Zhuang tipped his hat.
The move is “of value to marketers that want to mine its historic datasets and trends,” said Alan Pelz-Sharpe, research director for social business applications at 451 Research.
“Twitter will be hoping folks find gold nuggets in here and, more importantly, figure out at a granular level what works and what doesn’t,” Pelz-Sharpe told TechNewsWorld. “Those lessons, in turn, can feed greater sponsorship and advertising revenues in future.”
What the Search Service Will Do
The service efficiently indexes roughly half a trillion documents and serves queries with an average latency of under 100 ms, Twitter said.
The full index, which is more than 100 times larger than Twitter’s real-time index, grows by several billion tweets a week.
Tweets are fed into the full index in daily batches, and Twitter makes heavy use of Hadoop for the processing.
Because the pipeline runs every day and handles data incrementally, the work can be spread across Hadoop in massively parallel fashion, which in turn lets Twitter efficiently rebuild the full index periodically.
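To make the daily-batch idea concrete, here is a minimal Python sketch of one way such a pipeline could work. It is not Twitter's code: the function names, the whitespace tokenizer, and the sample tweets are all invented for illustration, and a thread pool stands in for a Hadoop cluster. Each day's batch is indexed independently, which is what allows the work to run in parallel, and the per-day results are then merged into a full index.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


def tokenize(text):
    """Lowercase and split on whitespace; a stand-in for a real tokenizer."""
    return text.lower().split()


def build_daily_segment(batch):
    """Build an inverted index (term -> sorted tweet IDs) for one day's batch."""
    postings = defaultdict(set)
    for tweet_id, text in batch:
        for term in tokenize(text):
            postings[term].add(tweet_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def merge_segments(segments):
    """Merge independent daily segments into one full index."""
    merged = defaultdict(set)
    for segment in segments:
        for term, ids in segment.items():
            merged[term].update(ids)
    return {term: sorted(ids) for term, ids in merged.items()}


def rebuild_full_index(daily_batches):
    """Index every day's batch in parallel, then merge the results.

    Assumption: a thread pool plays the role of the parallel workers.
    """
    with ThreadPoolExecutor() as pool:
        segments = list(pool.map(build_daily_segment, daily_batches))
    return merge_segments(segments)


def search(index, query):
    """Return the IDs of tweets that contain every term in the query."""
    terms = tokenize(query)
    if not terms:
        return []
    result = set(index.get(terms[0], []))
    for term in terms[1:]:
        result &= set(index.get(term, []))
    return sorted(result)


# Made-up sample data: two daily batches of (tweet_id, text) pairs.
DAILY_BATCHES = [
    [(1, "Hello world"), (2, "World Cup final tonight")],
    [(3, "hello again world"), (4, "Election results are in")],
]

FULL_INDEX = rebuild_full_index(DAILY_BATCHES)
```

Because no daily segment depends on any other, a periodic rebuild amounts to re-running the per-day step across all days and merging again, which is the property the incremental design relies on.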
For now, complete results from the full index appear in the All tab of search results on the Twitter Web client, as well as in the Twitter for iOS and Android apps. Over time, tweets from this index will appear in the Top tab of search results and in new products the index will enable.
The index is just a part of ongoing improvements to Twitter’s search and discovery functions.
Mighty, Mighty Are Twitter’s Works
“Twitter’s rollout … is an enormous achievement in computer science,” Andreas Scherer, managing partner at Salto Partners, told TechNewsWorld.
Making hundreds of billions of tweets searchable with a latency of under 100 ms “is like combining the capacity of CNN providing the latest news with the thoroughness and completeness of the Library of Congress in real-time.”
There are “endless opportunities” to mine the database for market trends, Scherer contended. “It will increase the value of tweets for search engine optimization; it gives us a new way to look at developments on our recent history.”
Twitter might be “laying the groundwork for introducing products in the future that may be relevant to business users, just as Facebook is now exploring how to monetize and develop a growth strategy beyond the everyday consumer,” wrote Susan Schreiner, senior editor and analyst at C4 Trends.
“Alternatively, or as a complement to a business angle, Twitter is ideally positioned to offer a new type of data analytics platform, since it has indexed more than half a trillion documents,” she told TechNewsWorld.
Twitter might still offer its platform for free, “but a potential fee-based version might enable using [it] as a platform that can transform millions of tweets into smart data,” Schreiner continued. “This is the new gold currency driving new product and service development in this decade.”
Possible Pitfalls for Twitter
By offering such a plethora of data, Twitter might well be fashioning a noose for its own neck.
“The danger here is that folk will continue to struggle to separate signal from noise, leaving Twitter where it is today — of value for [only] briefly getting a message out,” Pelz-Sharpe said.
What’s worrying for business is that it might make Twitter “an uncontrollable platform for disgruntled customers,” he noted.
It’s “a sensible move for Twitter — it sort of had to do this,” Pelz-Sharpe continued, “but it could prove to be Pandora’s box. Once opened, you never know what might come out.”