This link has been bookmarked by 105 people. It was first bookmarked on 27 Jul 2006, by Ruty.
-
30 Jan 12
sidj1981: Get the contents of a web page in a variable for further processing. A Python-based approach...
contents of a page remote processing django parser xml tutorial programming html python Web Application Development
-
03 Aug 11
-
import urllib
f = urllib.urlopen("http://www.python.org")
s = f.read()
-
16 May 11
-
Abstract
Various Web surfing tasks that I regularly perform could be made much easier, and less tedious, if I could only use Python to fetch the HTML pages and to process them, yielding the information I really need. In this document I attempt to describe HTML processing in Python using readily available tools and libraries.
NOTE: This document is not quite finished. I aim to include sections on using mxTidy to deal with broken HTML as well as some tips on cleaning up text retrieved from HTML resources.
-
31 May 10
Peeyoosh Sangolekar: parse HTML using Python
python html programming datamining parsing parser processing
-
27 Jan 09
-
ng a Parser Class
First of all, let us define a new class inheriting from SGMLParser with a convenience method that I find very convenient indeed:
import sgmllib
class M
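The excerpt above is cut off, so the author's class is not reproduced here. As a rough illustration of the same idea (a parser subclass with one convenience method), here is a minimal sketch. It makes several assumptions: it targets Python 3, where sgmllib is no longer in the standard library, so it swaps in html.parser.HTMLParser; the class name LinkParser, the parse method and the hyperlinks attribute are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Hypothetical example class: collects the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.hyperlinks = []

    def parse(self, s):
        """Convenience method: feed the whole string, then finish parsing."""
        self.feed(s)
        self.close()
        return self.hyperlinks

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; value may be None.
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value is not None:
                    self.hyperlinks.append(value)
```

With this in place, something like LinkParser().parse(s) should return the list of link targets found in a page held in the string s.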
-
23 Sep 06
-
Fetching standard Web pages over HTTP is very easy with Python:
import urllib
# Get a file-like object for the Python Web site's home page.
f = urllib.urlopen("http://www.python.org")
# Read from the object, storing the page's contents in 's'.
s = f.read()
f.close()
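The snippet above uses the Python 2 form of urllib. For readers on Python 3, a roughly equivalent sketch follows. It assumes that urlopen is imported from urllib.request and that the page can be decoded as UTF-8; the function name fetch and its encoding parameter are made up for this example.

```python
from urllib.request import urlopen


def fetch(url, encoding="utf-8"):
    """Return the contents of `url` as text (hypothetical helper)."""
    # The 'with' block closes the file-like object even if read() fails.
    with urlopen(url) as f:
        raw = f.read()  # bytes in Python 3
    # Assumption: the page is UTF-8; undecodable bytes are replaced.
    return raw.decode(encoding, errors="replace")


# Example call (needs network access):
# s = fetch("http://www.python.org")
```

In Python 3 read() returns bytes rather than a string, which is why the sketch adds a decoding step.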
Supplying Data
Sometimes, it is necessary to pass information to the Web server, such as information which would come from an HTML form. Of course, you need to know which fields are available in a form, but assuming that you already know this, you can supply such data in the urlopen function call:
# Search the Vaults of Parnassus for "XMLForms".
# First, encode the data.
data = urllib.urlencode({"find" : "XMLForms", "findtype" : "t"})
# Now get that file-like object again, remembering to mention the data.
f = urllib.urlopen("http://www.vex.net/parnassus/apyllo.py", data)
# Read the results back.
s = f.read()
# Close the file-like object; 's' is only a string and has no close().
f.close()
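For Python 3, the same form submission might be prepared as in the sketch below. It assumes urlencode from urllib.parse and Request from urllib.request; the helper name build_form_request is hypothetical. The encoded fields have to be converted to bytes, and supplying data is what turns the request into a POST.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_form_request(url, fields):
    """Build a POST request carrying form fields (hypothetical helper)."""
    # urlencode returns a str; the request body must be bytes.
    data = urlencode(fields).encode("ascii")
    return Request(url, data=data)


req = build_form_request(
    "http://www.vex.net/parnassus/apyllo.py",
    {"find": "XMLForms", "findtype": "t"},
)
# To send it (needs network access, and the site may no longer exist):
# from urllib.request import urlopen
# with urlopen(req) as f:
#     s = f.read()
```

Building the request separately from sending it makes it easy to inspect the encoded data before anything goes over the network.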
-
15 Aug 06
Paulo Nuin: data = urllib.urlencode({"find" : "XMLForms", "findtype" : "t"})