This link has been bookmarked by 11 people. It was first bookmarked on 02 Feb 2009, by Ognyan Kulev.
-
11 Feb 09
-
05 Feb 09
Matt Kramer: During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google's Alon Halevy admitted that the search giant has "not been doing a good job" presenting the structured data found on the web to its users. By "structured data," Halevy was referring to the databases of the "deep web" - those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.
Google's Deep Web Search
Halevy, who heads the "Deep Web" search initiative at Google, described the "Shallow Web" as containing about 5 million web pages while the "Deep Web" is estimated to be 500 times the size. This hidden web is currently being indexed in part by Google's automated systems that submit queries to various databases, retrieving the content found for indexing. In addition to that aspect of the Deep Web - dubbed "vertical searching" - Halevy also referenced two other types of Deep Web Search: semantic search and product search.
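To make the idea of automated systems that "submit queries to various databases" concrete, here is a minimal sketch of how a crawler might enumerate form submissions as ordinary URLs. It is an illustration only, not Google's method: the endpoint, the field names, and the helper `candidate_query_urls` are all hypothetical, and the sketch assumes a form submitted via GET.

```python
from itertools import product
from urllib.parse import urlencode


def candidate_query_urls(action_url, field_values):
    """Yield one GET URL per combination of candidate form values.

    action_url and field_values are assumptions for illustration:
    a real crawler would first have to discover the form and guess
    plausible values for each input.
    """
    names = sorted(field_values)  # fixed order keeps URLs deterministic
    for combo in product(*(field_values[name] for name in names)):
        yield action_url + "?" + urlencode(dict(zip(names, combo)))


# Hypothetical search form with two inputs; not a real endpoint.
urls = list(candidate_query_urls(
    "https://example.org/search",
    {"state": ["MA", "NY"], "category": ["books"]},
))

for url in urls:
    print(url)
```

Each generated URL could then be fetched and its result page indexed like any other page; no network request is made in this sketch.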
Google also wants to be able to retrieve the data found in structured tables on the web, said Halevy, citing a table on a page listing the U.S. presidents as an example. There are 14 billion such tables on the web, and, after filtering, about 154 million of them are interesting enough to be worth indexing.
Can Google Dig into the Deep Web?
The question that remains is whether Google's current search engine technology will be adept at all the different types of Deep Web indexing, or whether the company will need to come up with something new. As of now, Google uses the Bigtable database and the MapReduce framework for everything search-related, notes Alex Esterkin, Chief Architect at Infobright, Inc., a company delivering open source data warehousing solutions. During the talk, Halevy listed a number of analytical database application challenges that Google is currently dealing with: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data-level synonym discovery.
Arne van Elk: During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google's Alon Halevy admitted that the search giant has "not been doing a good job" presenting the structured data found on the web to its users. By "st
-
04 Feb 09
Michel Bauwens: "Deep Web" search initiative at Google: vertical searching, semantic search, product search. Shallow Web ~5 million web pages; Deep Web ~500 times the size. This hidden web is currently being indexed in part by Google's automated systems that submit queri
-
03 Feb 09
-
-
By "structured data," Halevy was referring to the databases of the "deep web" - those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.
-
"Shallow Web" as containing about 5 million web pages
-
-
"Deep Web" is estimated to be 500 times the size
-
-
-
Forrest Cao: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data level synonyms discovery.
-
02 Feb 09
Lennie Symes: During a talk at the New England Database Day conference at the Massachusetts Institute of Technology, Google's Alon Halevy admitted that the search giant has "not been doing a good job" presenting the structured data found on the web to its users. By "st
-
-
"deep web" - those internet resources that sit behind forms and site-specific search boxes, unable to be indexed through passive means.
-
other types of Deep Web Search: semantic search and product search.
-
-
Halevy listed a number of analytical database application challenges that Google is currently dealing with: schema auto-complete, synonym discovery, creating entity lists, association between instances and aspects, and data level synonyms discovery
-
-