This link has been bookmarked by 161 people. It was first bookmarked on 02 Mar 2006, by Jeff dalton.
-
01 Dec 14
-
23 Feb 14
-
15 Jan 14
-
18 Jun 13
-
13 Dec 12
carlos puentes: "HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans. It is a fast, robust and well tested package.
Welcome to the homepage of HTMLParser - a super-fast real-time parser for real-world HTML. What has attracted most developers to HTMLParser has been its simplicity in design, speed and ability to handle streaming real-world html.
The two fundamental use-cases that are handled by the parser are extraction and transformation (the synthesis use-case, where HTML pages are created from scratch, is better handled by other tools closer to the source of data). While prior versions concentrated on data extraction from web pages, Version 1.4 of the HTMLParser has substantial improvements in the area of transforming web pages, with simplified tag creation and editing, and verbatim toHtml() method output.
In general, to use the HTMLParser you will need to be able to write code in the Java programming language. Although some example programs are provided that may be useful as they stand, it's more than likely you will need (or want) to create your own programs or modify the ones provided to match your intended application.
To use the library, you will need to add either the htmllexer.jar or htmlparser.jar to your classpath when compiling and running. The htmllexer.jar provides low-level access to generic string, remark and tag nodes on the page in a linear, flat, sequential manner. The htmlparser.jar, which includes the classes found in htmllexer.jar, provides access to a page as a sequence of nested differentiated tags containing string, remark and other tag nodes. So where the output from calls to the lexer nextNode() method might be:"
html java parser library opensource programming development api
-
25 Oct 12
-
11 Oct 12
-
13 Aug 12
-
14 Jun 12
Sanghammee Prashant: html parser
-
16 Jan 12
ym k: It would be good to use HTMLParser. The problem is that unpacking the archive reveals several jar files, so I need to test whether htmlparser.jar alone is enough to get what I want.
-
27 Sep 11
-
17 Aug 11
Mac McBurney: HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans. It is a fast, robust and well tested package.
Welcome to the homepage of HTMLParser - a super-fast real-time parser for real-world HTML.
-
11 Jul 11
-
23 Mar 11
-
08 Feb 11
Patrizio Trinchini: HTML Parser is a Java library used to parse HTML in either a linear or nested fashion.
-
23 Dec 10
-
10 Oct 10
-
15 Aug 10
-
09 Jul 10
dartmedved: HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans. It is a fast, robust and well tested package. Wel
analysis htmlparser opensource parser parsing processing web webdev xml xhtml api code development html java library
-
08 May 10
-
17 Apr 10
-
two fundamental use-cases that are handled by the parser are extraction and transformation
-
- Extraction encompasses all the information retrieval programs that are not meant to preserve the source page. This covers uses like:
  - text extraction, for use as input for text search engine databases for example
  - link extraction, for crawling through web pages or harvesting email addresses
  - screen scraping, for programmatic data input from web pages
  - resource extraction, collecting images or sound
  - a browser front end, the preliminary stage of page display
  - link checking, ensuring links are valid
  - site monitoring, checking for page differences beyond simplistic diffs
- Transformation includes all processing where the input and the output are HTML pages. Some examples are:
  - URL rewriting, modifying some or all links on a page
  - site capture, moving content from the web to local disk
  - censorship, removing offending words and phrases from pages
  - HTML cleanup, correcting erroneous pages
  - ad removal, excising URLs referencing advertising
  - conversion to XML, moving existing web pages to XML
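
To make the two use-cases concrete, here is a minimal Java sketch of one extraction task (link extraction) and one transformation task (URL rewriting). It deliberately does not use the HTMLParser API: it relies only on java.util.regex from the standard library, and the class name LinkSketch, its method names and the example URLs are all hypothetical. The pattern only recognizes double-quoted href attributes, so it is a stand-in for illustration and not a substitute for a real HTML parser.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration class; not part of the HTMLParser library.
public class LinkSketch {
    // Naive assumption: links appear as href="..." with double quotes.
    // Group 1 is the attribute prefix as written, group 2 is the URL.
    private static final Pattern HREF =
            Pattern.compile("(href=\")([^\"]*)\"", Pattern.CASE_INSENSITIVE);

    // Extraction: collect every href value; the page itself is not preserved.
    public static List<String> extractLinks(String html) {
        List<String> links = new ArrayList<>();
        Matcher m = HREF.matcher(html);
        while (m.find()) {
            links.add(m.group(2));
        }
        return links;
    }

    // Transformation: HTML in, HTML out, with matching URL prefixes replaced.
    public static String rewriteLinks(String html, String oldPrefix, String newPrefix) {
        Matcher m = HREF.matcher(html);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String url = m.group(2);
            if (url.startsWith(oldPrefix)) {
                url = newPrefix + url.substring(oldPrefix.length());
            }
            // quoteReplacement keeps '$' and '\' in URLs from being interpreted.
            m.appendReplacement(out, Matcher.quoteReplacement(m.group(1) + url + "\""));
        }
        m.appendTail(out);
        return out.toString();
    }

    public static void main(String[] args) {
        String page = "<p><a href=\"http://old.example/a\">A</a> <a href=\"/b\">B</a></p>";
        System.out.println(extractLinks(page));
        System.out.println(rewriteLinks(page, "http://old.example", "http://new.example"));
    }
}
```

The split mirrors the distinction in the list above: extractLinks discards everything except the data of interest, while rewriteLinks returns the whole page with only the targeted attributes changed.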
-
-
16 Apr 10
kristian82: HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans. It is a fast, robust and well tested package.
-
26 Feb 10
-
27 Jan 10
-
06 Jan 10
-
16 Dec 09
-
03 Dec 09
-
22 Oct 09
-
21 Sep 09
-
31 Aug 09
-
17 Aug 09
-
11 Aug 09
-
05 Aug 09
-
08 Jul 09
-
18 Jun 09
Philippe Guglielmetti: HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans. It is a fast, robust and well tested package.
-
09 Jun 09
-
18 May 09
-
19 Mar 09
-
10 Mar 09
-
07 Feb 09
-
16 Jan 09
-
16 Dec 08
-
09 Sep 08
-
10 Aug 08
-
02 Jun 08
-
30 May 08
-
06 Apr 08
-
03 Apr 08
-
25 Mar 08
-
22 Feb 08
-
19 Feb 08
-
21 Jan 08
-
17 Jan 08
-
05 Dec 07
-
09 Sep 07
-
06 Sep 07
-
24 Aug 07
-
09 Jul 07
-
08 Jul 07
-
31 May 07
-
09 May 07
-
16 Apr 07
-
04 Apr 07
-
19 Mar 07
-
05 Feb 07
Jay Dugger: HTML Parser is a Java library used to parse HTML in either a linear or nested fashion. Primarily used for transformation or extraction, it features filters, visitors, custom tags and easy to use JavaBeans.
Also used by Websites As Graphs for their HTML
-
01 Feb 07
-
23 Jan 07
-
11 Jan 07
-
11 Nov 06
-
10 Nov 06
-
25 Oct 06
-
15 Oct 06
-
01 Oct 06
Pete McKinstry: HTML Parser is a Java library used to parse HTML in either a linear or nested fashion.
-
23 Aug 06
-
15 Aug 06
-
14 Aug 06
-
11 Aug 06
-
12 Jul 06
-
11 Jul 06
-
10 Jul 06
-
10 Jun 06
-
06 Jun 06
-
29 May 06
-
28 May 06
-
Bruno Martins: Welcome to the homepage of HTMLParser - a super-fast real-time parser for real-world HTML. What has attracted most developers to HTMLParser has been its simplicity in design, speed and ability to handle streaming real-world html.
HTML OpenSource Parser code crawler development free java library programming research tool web xhtml
-
06 May 06
-
17 Mar 06
-
27 Dec 05
-
06 Dec 05
-
20 Nov 05
-
15 Nov 05
-
28 Oct 05
-
27 Oct 05
-
26 Sep 05
Emmanuel Hugonnet: Welcome to the homepage of HTMLParser - a super-fast real-time parser for real-world HTML. What has attracted most developers to HTMLParser has been its simplicity in design, speed and ability to handle streaming real-world html.
-
14 Jun 05
-
15 Feb 05
-
09 Feb 05
Page Comments
Also used by Websites As Graphs for their HTML