MozillaParser is a Java HTML parser based on Mozilla's HTML parser. It acts as a bridge between Java classes and Mozilla's classes and produces a Java Document object from raw (and dirty) HTML input.
Kelvina is a platform-independent Java HTML parser that outputs a Document (org.w3c.dom.Document) object from any HTML input, including invalid HTML.
Jericho HTML Parser is a Java library that allows analysis and manipulation of parts of an HTML document, including server-side tags, while reproducing any unrecognised or invalid HTML verbatim.
This is an open-source HTML parser written in PHP. As an example application, it comes with a tool that converts HTML to text.
This parser is designed for speed and flexibility. It does not create an object model for you. But it...
Java HTML/XML Compressor is a very small, fast, and easy-to-use library that compresses HTML or XML source by removing extra whitespace, comments, and other unneeded characters without breaking the content structure.
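To illustrate the kind of whitespace and comment removal described above, here is a minimal, deliberately naive Java sketch. `NaiveHtmlMinifier` and `minify` are hypothetical names, not the Java HTML/XML Compressor's API. It assumes ordinary markup: it strips comments and collapses whitespace runs to a single space, which matches how browsers render whitespace outside `<pre>`, `<textarea>` and similar elements, but unlike a real compressor it does not protect those elements, inline scripts, or conditional comments.

```java
import java.util.regex.Pattern;

/** Naive HTML minifier sketch (hypothetical class, not the Java HTML/XML Compressor API). */
final class NaiveHtmlMinifier {
    // Matches HTML comments, including ones that span several lines.
    private static final Pattern COMMENT = Pattern.compile("<!--.*?-->", Pattern.DOTALL);
    // Matches any run of whitespace characters (spaces, tabs, newlines).
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    static String minify(String html) {
        String noComments = COMMENT.matcher(html).replaceAll("");
        // Browsers render a whitespace run as one space (outside <pre>/<textarea>),
        // so collapsing runs leaves the visible text of ordinary markup unchanged.
        return WHITESPACE.matcher(noComments).replaceAll(" ").trim();
    }
}
```

For example, `minify("<p>\n  Hello   <!-- greeting -->\n  <b>world</b>\n</p>")` returns `<p> Hello <b>world</b> </p>`. Collapsing whitespace to one space, rather than deleting it between tags, is the safer choice because `<b>a</b> <i>b</i>` and `<b>a</b><i>b</i>` render differently.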
The Java Sitemap Parser can parse a website's Sitemap (http://www.sitemaps.org/). This is useful for web crawlers that want to discover URLs from a website that is using the Sitemap Protocol.
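The core of Sitemap parsing is reading the `<loc>` entries from the XML. The sketch below does this with the JDK's built-in DOM parser; it is not the Java Sitemap Parser's API, and `SitemapLocExtractor` and `extractLocs` are hypothetical names. It assumes an uncompressed XML sitemap in the `http://www.sitemaps.org/schemas/sitemap/0.9` namespace and ignores the gzip compression, size limits, and plain-text or RSS/Atom variants a production crawler would also need to handle.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

/** Hypothetical helper that lists the URLs in an XML sitemap. */
final class SitemapLocExtractor {
    static final String SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9";

    static List<String> extractLocs(String xml) {
        try {
            DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
            factory.setNamespaceAware(true);
            // Sitemaps come from untrusted sites, so reject DOCTYPEs (blocks XXE attacks).
            factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
            Document doc = factory.newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            // <loc> appears under <url> in a urlset and under <sitemap> in a sitemap index.
            NodeList locs = doc.getElementsByTagNameNS(SITEMAP_NS, "loc");
            List<String> urls = new ArrayList<>();
            for (int i = 0; i < locs.getLength(); i++) {
                urls.add(locs.item(i).getTextContent().trim());
            }
            return urls;
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("Not a parseable sitemap", e);
        }
    }
}
```

Because it collects every `<loc>` in the namespace, the same method also returns the child-sitemap URLs when given a sitemap index file; a crawler would then fetch and parse each of those in turn.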
A small-footprint, fast Java XML parser. It parses into an easily manipulated nested class structure that can be converted back to formatted or unformatted XML with a single call. It uses and creates plain XML; no DTDs are needed or used.
JEPLite is a lightweight (re)implementation of the Java Expression Parser (jep.sourceforge.net). The aim is to strip out some of its less frequently used features and thereby speed up the rest. It includes an expression optimizer.
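To show what an expression parser of this kind does, here is a minimal recursive-descent evaluator for arithmetic in Java. It is a sketch, not JEPLite's or JEP's API (`ExprEvaluator.evaluate` is a hypothetical name), and it assumes only numbers, `+ - * /`, unary signs, and parentheses; variables, functions, and expression optimization are out of scope.

```java
/** Hypothetical recursive-descent evaluator for + - * / and parentheses. */
final class ExprEvaluator {
    private final String src;
    private int pos;

    private ExprEvaluator(String src) { this.src = src; }

    static double evaluate(String expression) {
        ExprEvaluator p = new ExprEvaluator(expression);
        double value = p.parseExpression();
        p.skipSpaces();
        if (p.pos != p.src.length()) throw new IllegalArgumentException("Unexpected input at " + p.pos);
        return value;
    }

    // expression := term (('+' | '-') term)*   (left-associative)
    private double parseExpression() {
        double value = parseTerm();
        while (true) {
            if (eat('+')) value += parseTerm();
            else if (eat('-')) value -= parseTerm();
            else return value;
        }
    }

    // term := factor (('*' | '/') factor)*   (binds tighter than + and -)
    private double parseTerm() {
        double value = parseFactor();
        while (true) {
            if (eat('*')) value *= parseFactor();
            else if (eat('/')) value /= parseFactor();
            else return value;
        }
    }

    // factor := ('+' | '-') factor | '(' expression ')' | number
    private double parseFactor() {
        if (eat('+')) return parseFactor();
        if (eat('-')) return -parseFactor();
        if (eat('(')) {
            double value = parseExpression();
            if (!eat(')')) throw new IllegalArgumentException("Expected ')' at " + pos);
            return value;
        }
        skipSpaces();
        int start = pos;
        while (pos < src.length() && (Character.isDigit(src.charAt(pos)) || src.charAt(pos) == '.')) pos++;
        if (start == pos) throw new IllegalArgumentException("Expected number at " + pos);
        return Double.parseDouble(src.substring(start, pos));
    }

    private void skipSpaces() {
        while (pos < src.length() && src.charAt(pos) == ' ') pos++;
    }

    // Consumes ch (after optional spaces) if it is the next character.
    private boolean eat(char ch) {
        skipSpaces();
        if (pos < src.length() && src.charAt(pos) == ch) { pos++; return true; }
        return false;
    }
}
```

Each grammar rule becomes one method, so operator precedence follows from the call structure: `evaluate("2 + 3 * (4 - 1)")` yields `11.0`.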
QuickDoc is a Java document parser that reads documents written in a simple markup language from plain-text files and exports them to other formats such as PDF, HTML, JavaHelp, and XML.
HtmlCleaner is an HTML parser written in Java. It transforms dirty HTML into well-formed XML, following the same rules that most web browsers use.
A Java-based parser for parsing and grabbing websites and other text or XML documents. It is based on a nondeterministic parser language and produces XML output. It also contains a few utility classes for HTML, CSV, and text parsing, plus additional character sets.
A simple HTML 'parser' that 'reads' through an HTML file and calls functions on text data, tags, and so on.
It is useful if you need to implement a straightforward parser that just extracts information from the file or modifies tags.
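A callback-driven scanner of this sort can be sketched in a few lines of Java. `HtmlHandler` and `SimpleHtmlScanner` are hypothetical names chosen for illustration, not the API of the project above. The sketch assumes reasonably tidy input: it splits on `<` and `>`, so a `>` inside an attribute value or a comment will confuse it, and it skips comments, doctypes, and processing instructions instead of reporting them.

```java
import java.util.Locale;

/** Hypothetical callback interface: the scanner calls these as it reads the input. */
interface HtmlHandler {
    void startTag(String name);
    void endTag(String name);
    void text(String text);
}

/** Hypothetical event-driven HTML scanner; it builds no tree. */
final class SimpleHtmlScanner {
    static void scan(String html, HtmlHandler handler) {
        int i = 0;
        while (i < html.length()) {
            int lt = html.indexOf('<', i);
            if (lt < 0) { // no more tags: the rest is text
                emitText(html.substring(i), handler);
                break;
            }
            emitText(html.substring(i, lt), handler);
            int gt = html.indexOf('>', lt);
            if (gt < 0) { // unterminated tag: treat the remainder as text
                emitText(html.substring(lt), handler);
                break;
            }
            String inner = html.substring(lt + 1, gt).trim();
            if (inner.startsWith("/")) {
                String name = tagName(inner.substring(1).trim());
                if (!name.isEmpty()) handler.endTag(name);
            } else if (!inner.startsWith("!") && !inner.startsWith("?")) {
                String name = tagName(inner);
                if (!name.isEmpty()) handler.startTag(name);
            } // <!-- ... -->, <!DOCTYPE ...> and <?...?> are skipped
            i = gt + 1;
        }
    }

    // The tag name ends at the first whitespace or '/', e.g. "br/" -> "br".
    private static String tagName(String inner) {
        int end = 0;
        while (end < inner.length() && !Character.isWhitespace(inner.charAt(end)) && inner.charAt(end) != '/') end++;
        return inner.substring(0, end).toLowerCase(Locale.ROOT);
    }

    // Whitespace-only runs between tags are not reported.
    private static void emitText(String text, HtmlHandler handler) {
        if (!text.trim().isEmpty()) handler.text(text);
    }
}
```

Because the scanner builds no tree, the handler decides what to keep; this is the same streaming trade-off made by the parsers in this list that skip the object model.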
DIHtmlParser is a lightning-fast and flexible HTML parser for Borland Delphi 4/5/6/7. It fully supports Unicode/WideStrings throughout and is aimed at quickly extracting various information from HTML documents.
TinyParser is a very simple HTML parser API designed to read just the text you want. It is fast (20 MB/s), tiny (only 14 classes), and memory-friendly (stream-based).
This project provides implementations of the BeanXMLMapping interface based on existing Java/XML parser solutions. The BeanXMLMapping components offer straightforward ways to convert a JavaBean to its XML document representation and vice versa.
A Java-based parser library.
A WordPress plugin that automatically converts BBCodes in your post content to HTML tags.
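The heart of such a converter is a set of pattern substitutions. Below is a minimal Java sketch of that idea (the plugin itself runs on WordPress, which is PHP, so this is not its code). `BbCodeToHtml` is a hypothetical name; the sketch assumes only the `[b]`, `[i]`, `[u]` and `[url=URL]` tags, maps `[b]` to `<strong>` and `[i]` to `<em>` by choice, escapes HTML in the input first so post text cannot inject markup, accepts only `http`/`https` link targets, and does not handle nested tags of the same kind.

```java
import java.util.regex.Pattern;

/** Hypothetical BBCode-to-HTML converter covering a few common tags. */
final class BbCodeToHtml {
    // {BBCode tag, HTML element} pairs for tags that just wrap text.
    private static final String[][] SIMPLE_TAGS = { {"b", "strong"}, {"i", "em"}, {"u", "u"} };
    // [url=http(s)://...]label[/url]; other schemes (e.g. javascript:) are left untouched.
    private static final Pattern URL = Pattern.compile(
            "\\[url=(https?://[^\\]\"\\s]+)\\](.*?)\\[/url\\]",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    static String convert(String bbcode) {
        // Escape HTML first so user-supplied text cannot inject markup.
        String out = bbcode.replace("&", "&amp;").replace("<", "&lt;")
                .replace(">", "&gt;").replace("\"", "&quot;");
        for (String[] tag : SIMPLE_TAGS) {
            Pattern p = Pattern.compile("\\[" + tag[0] + "\\](.*?)\\[/" + tag[0] + "\\]",
                    Pattern.CASE_INSENSITIVE | Pattern.DOTALL);
            out = p.matcher(out).replaceAll("<" + tag[1] + ">$1</" + tag[1] + ">");
        }
        return URL.matcher(out).replaceAll("<a href=\"$1\">$2</a>");
    }
}
```

Escaping before substituting is the key design choice: the only HTML in the output is what the converter itself generates from recognized tags.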