An Open-Source Search Engine Takes Shape

Commercial search-engine providers could soon face a serious competitor if the vision of some open-source developers materializes. A team of open-source programmers recently launched a project called Nutch to provide search-engine software for free.

Doug Cutting, president of the Nutch Organization, told TechNewsWorld that Nutch eventually will provide a transparent alternative to commercial Web search engines. “People should be able to know why a given page ranks higher than another for a query,” said Cutting.

All current search engines use proprietary ranking formulas, and some determine which sites to index on the basis of paid rankings. Cutting said that, in contrast, Nutch has nothing to hide and has no motive to provide biased search results.

“Open source is essential for transparency,” he said. “Experts need to be able to validate that it operates correctly and fairly. Only open source permits this.” If only a few Web search engines exist, he said, “I don’t think you can trust them not to be biased.”

Big Competition

Nutch has several major commercial competitors already in the marketplace, including Google, Microsoft’s MSN and Yahoo.

Whether Nutch will be able to penetrate this market remains to be seen. Years ago, nobody thought AltaVista would be toppled. Today, while most Internet users think of Google as an Internet fact that can’t easily be sidestepped, Internet culture is, at least to some extent, fickle.

Showing his optimism for the Nutch project, Cutting said that an open-source competitor is good news for commercial providers because Nutch will help restore confidence in the search-engine process. Also, he noted, there might be additional benefits in the future.

“If the quality of Nutch eventually gets to the point where it meets or exceeds that of Google, then Google could start using Nutch software,” said Cutting.

Google, however, does not seem to be worried at the moment. Google spokesperson Nathan Tyler told TechNewsWorld that the company likes to keep a low profile. “Nutch is yet another effort that demonstrates the value of a global interest in search engine technology,” he said.

Taking a Few Steps Back

Chris Winfield, president of 10E20, a Web design firm based in New York, told TechNewsWorld that it is premature to talk about challenging commercial search engines like Google. “Nutch will be successful, but it is too early to tell on what level that success will come,” said Winfield.

However, Winfield did say Nutch will eventually be a great tool for niche search engines. “Right now, Nutch does not have the hardware that it needs to offer a search engine on Google’s scale,” he noted. “They do have a very impressive board and an extremely talented staff — not to mention financial backing by Overture.”

Ironically, according to Winfield, one potential problem with Nutch’s approach could be the very element that the organization seems most proud of — the fact that people will be able to see the exact formula for determining results. “This could lead to some really bad spamming,” said Winfield.

“Nutch will be an interesting one to watch,” he said, pointing out that open-source search technology has the capacity to shake things up in the coming months. “If Nutch can figure out a way to keep people honest and still be completely forthcoming with the formulas used for their results,” he said, “that will be a site to behold.”

Success on a Small Scale

Brian Piccolo, senior technologist at Boston-based Internet consultancy Pixel Bridge, predicts Nutch will be successful but only on a small scale. “Even if Nutch achieves a small percentage of the total Web searches, it will require a huge investment in hardware and bandwidth to compete on a grand scale,” Piccolo told TechNewsWorld.

“There are thousands of open-source mavens out there who will be big users of Nutch,” he said. “But it will be difficult to penetrate the mainstream market because Google and Yahoo have such a big head start.” Piccolo predicted Nutch will not immediately be able to apply as much pressure to Google as MSN and Yahoo can, which leaves Nutch to exist mostly in the technology and academic communities in the near future.

“Google remains a favorite among geeks and common folk alike because, quite frankly, it has the best search results around,” he said. “But if someone comes along with better results and enough marketing savvy, Google could be challenged.” Will it be Nutch? Probably not, said Piccolo.

Handful of Niches

What’s more likely to happen, he said, is that a handful of commercial niche engines will be created using Nutch technology as their foundation. For example, commercial search engines could tweak Nutch’s algorithm enough to prevent spammers from polluting the results and might have access to enough capital to become formidable players in the commercial search engine business.

“I think it’s doubtful that searchers like Google will have Nutch on their radar screens in the short term,” said Piccolo. However, he said, if the Nutch project begins to build momentum, the large search engines will respond — and they will do so quickly and with force.

“At the end of the day, it comes down to how well Nutch meets market expectations,” concluded Piccolo.

1 Comment

  • I think the idea of an open-source search engine like Nutch is a further step in the search engine revolution. This idea should be very carefully examined by the big players. If this idea clicks, even on a small scale, it might send some shivers through Google and Yahoo! But there is not much threat to Google or Yahoo, as they have the upper hand in the search industry and are deeply rooted in searchers’ minds. The open-source code might work against the Nutch people if search engines use it in the future.
    Lastly, I will say that this idea is really worth a review, even if it is no threat to the big players.
