29 Jul 07
Build a Web spider on Linux
A spider is a program that crawls the Internet in a specific way for a
specific purpose. The purpose could be to gather information or to understand the
structure and validity of a Web site. Spiders are the basis for modern search
engines, such as Google and AltaVista. These spiders automatically retrieve data
from the Web and pass it on to other applications that index the contents of the
Web site for the best set of search terms.
When you think of a spider in nature, you think of it in its interactions with an
environment, not in isolation. The spider sees and feels its way around, moving
from one place to another in a meaningful way. Web spiders operate in a similar
way. A Web spider is a program written in a high-level language. It interacts with
its environment through the use of networking protocols, such as the Hypertext
Transfer Protocol (HTTP) for the Web. If your spider wants to communicate with
you, it can use the Simple Mail Transfer Protocol (SMTP) to send an e-mail
message.
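
To make the idea concrete, here is a minimal sketch of such a spider in Python, using only the standard library. It is an illustration under stated assumptions, not the implementation from the bookmarked article: the names `LinkParser`, `fetch`, and `crawl`, the breadth-first order, and the `max_pages` limit are all hypothetical choices made for this example. The spider retrieves a page over HTTP, extracts its links, and follows them until the limit is reached.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch(url):
    """Retrieve a page over HTTP and return its body as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=10):
    """Breadth-first crawl from start_url; returns visited URLs in order.

    fetch_page is injectable (an assumption of this sketch) so the
    crawl logic can be exercised without touching the network.
    """
    seen = {start_url}
    queue = deque([start_url])
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except (OSError, ValueError):
            continue  # skip pages that cannot be retrieved
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop any #fragment part.
            absolute, _fragment = urldefrag(urljoin(url, href))
            if absolute.startswith(("http://", "https://")) and absolute not in seen:
                seen.add(absolute)
                queue.append(absolute)
    return visited
```

Calling `crawl("http://example.com/", max_pages=5)` would visit up to five pages reachable from that address. A real spider would also honor `robots.txt` and pause between requests; reporting results by e-mail could be layered on with the standard `smtplib` module.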
