Joel Liu's Library tagged spider

29 Jul 07

Build a Web spider on Linux

  • A spider is a program that crawls the Internet methodically for a specific
    purpose, such as gathering information or checking the structure and validity
    of a Web site. Spiders are the basis for modern search engines, such as Google
    and AltaVista. They automatically retrieve data from the Web and pass it to
    other applications that index each site's contents for the best set of search
    terms.
  • A spider in nature is best understood through its interactions with its
    environment, not in isolation. It sees and feels its way around, moving from
    one place to another with purpose. Web spiders work the same way. A Web spider
    is a program, written in a high-level language, that interacts with its
    environment through networking protocols, such as the Hypertext Transfer
    Protocol (HTTP) for the Web. If your spider needs to communicate with you, it
    can use the Simple Mail Transfer Protocol (SMTP) to send an e-mail message.
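The bullets above describe a spider that retrieves pages over HTTP, follows their links, hands its findings to another application, and can report back by SMTP. The following is a minimal sketch of that loop using only Python's standard library. It is not the article's own code. The function names (`http_fetch`, `crawl`, `build_report`, `send_report`), the same-host restriction, the page limit, the UTF-8 decoding, and the `localhost` SMTP server are all illustrative assumptions.

```python
import smtplib
from collections import deque
from email.message import EmailMessage
from html.parser import HTMLParser
from typing import Callable, Dict, List, Tuple
from urllib.parse import urldefrag, urljoin, urlparse
from urllib.request import urlopen


class LinkExtractor(HTMLParser):
    """Collects the href targets of <a> tags, resolved against a base URL."""

    def __init__(self, base_url: str) -> None:
        super().__init__()
        self.base_url = base_url
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Resolve relative links and drop any "#fragment" part.
                absolute, _ = urldefrag(urljoin(self.base_url, value))
                self.links.append(absolute)


def http_fetch(url: str) -> str:
    """Retrieve a page over HTTP(S); decoding as UTF-8 is an assumption."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url: str,
          fetch: Callable[[str], str] = http_fetch,
          max_pages: int = 10) -> Tuple[Dict[str, List[str]], List[str]]:
    """Breadth-first crawl limited (by assumption) to the start URL's host.

    Returns a map of each fetched URL to the links found on it (input for a
    separate indexer) and a list of URLs that could not be fetched.
    """
    host = urlparse(start_url).netloc
    queue = deque([start_url])
    seen = {start_url}
    site_map: Dict[str, List[str]] = {}
    broken: List[str] = []
    while queue and len(site_map) + len(broken) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except OSError:  # urllib's URLError and HTTPError subclass OSError
            broken.append(url)
            continue
        parser = LinkExtractor(url)
        parser.feed(html)
        parser.close()
        site_map[url] = parser.links
        for link in parser.links:
            # Follow only unseen links on the same host.
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append(link)
    return site_map, broken


def build_report(broken: List[str], sender: str, recipient: str) -> EmailMessage:
    """Build an e-mail summarizing the broken links the spider found."""
    msg = EmailMessage()
    msg["Subject"] = f"Spider report: {len(broken)} broken link(s)"
    msg["From"] = sender
    msg["To"] = recipient
    msg.set_content("\n".join(broken) or "No broken links found.")
    return msg


def send_report(msg: EmailMessage, smtp_host: str = "localhost") -> None:
    """Deliver the report over SMTP; assumes a reachable mail server."""
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)
```

Passing `fetch` as a parameter keeps the crawl logic separate from the network, so the same loop can be exercised against canned pages, as in `crawl("http://example.com/", fetch=my_fake_fetch)`, before pointing it at a live site. A real spider would also need to honor `robots.txt` and throttle its requests, which this sketch omits.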