Extract Text From HTML is a web-based tutorial on separating the text from the HTML tags of an article at a specified URL. It provides code that can be integrated into a user's website to perform this process. The article will be helpful for ASP programmers and web developers.
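The idea of separating text from HTML tags can be sketched in a few lines. The example below is not the tutorial's ASP code; it is a minimal illustration in Python using the standard library's html.parser module. The names TextExtractor and extract_text are hypothetical, and the choice to skip script and style contents is an assumption about what counts as article text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect text nodes, ignoring the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into characters.
        super().__init__(convert_charrefs=True)
        self._chunks = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside skipped elements.
        if self._skip_depth == 0 and data.strip():
            self._chunks.append(data.strip())

    def text(self):
        return " ".join(self._chunks)


def extract_text(html):
    """Return the text of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

To work from a URL, as the tutorial describes, the page would first have to be downloaded (for example with urllib.request) and decoded to a string before being passed to extract_text.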
Extract Audio from video (AVI, MPEG, WMV, ASF) movies and save it to any of the following popular formats: WAV PCM, MP3, WMA, ALF2, ADPCM, GSM, G.726, DSP, A-LAW, ACM, U-LAW, PCM, Ogg Vorbis.
This GUI program will extract text from damaged/corrupted Word files formatted in the new docx format where Word itself fails.
Docx files are actually zipped collections of XML files. XML as a format is unforgiving of data corruption....
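Because a docx file is a zip archive of XML parts, its text can be read without Word. The sketch below is not this program's method; it is an illustration using Python's zipfile and re modules that pulls the contents of w:t elements out of word/document.xml with a regular expression instead of an XML parser, so it does not require the XML to be well formed. The function name extract_docx_text is hypothetical, and the sketch assumes the zip container itself can still be opened.

```python
import html
import re
import zipfile


def extract_docx_text(source):
    """Return paragraph text from a .docx path or binary file-like object."""
    with zipfile.ZipFile(source) as archive:
        # The main document body is stored in word/document.xml.
        raw = archive.read("word/document.xml").decode("utf-8", errors="replace")

    paragraphs = []
    # Split on paragraph end tags so each chunk holds at most one paragraph.
    for chunk in raw.split("</w:p>"):
        # Match <w:t> or <w:t ...attributes>, but not tags like <w:tbl> or <w:tab/>.
        runs = re.findall(r"<w:t(?:\s[^>]*)?>([^<]*)</w:t>", chunk)
        if runs:
            # Decode XML entities such as &amp; in the joined run text.
            paragraphs.append(html.unescape("".join(runs)))
    return "\n".join(paragraphs)
```

A regular expression is used here only because it tolerates truncated or unbalanced markup; for intact files a proper XML parser is the safer choice.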
Detexter lets you extract text from multiple PDF files. Detexter uses the PDFBox library for its text extraction.
PdfTextExtractor - extract text from PDF document with text layer
PdfShapeDrawer - draw basic shapes on PDF document
Programs using iTextSharp library http://itextpdf.com/
The Textract Project consists of C++ source code to extract text from a growing assortment of file formats. Output is indexing-ready. The Textract Project is intended as a foundation to support research-quality search engines.
Allows developers to extract text from files in multiple formats, such as MS-Office pre-2007 (Word, PPT, XLS), MS-Visio, MS-Office 2007 (docx, pptx, xlsx), PDF, RTF, XML, HTML, plain text, etc.
Alternative way of extracting text from a WF XML source.
Aspose.OCR for Java is a character recognition component that allows developers to add OCR functionality in their Java web applications, web services and Windows applications. It provides a simple set of classes for controlling character...
This is simple code to extract data from an existing MATLAB 2D or 3D figure.
Extracts data from MATLAB figure (.fig) files generated using version 7 or later. It can be used for both 2D and 3D plots.
Converts all audio formats from one to another with most available settings, or extracts audio from video. The Audio Converter Pro utility is indispensable for converting audio files directly from one format to another, with ID3v2 tag editing and new Mp3...
jPDFText is a Java library to extract text from PDF documents. With jPDFText, PDF documents can be processed to extract the textual content for archiving, storage, searching or indexing.
jPDFText is built on top of Qoppa's proprietary PDF...
Script to extract data from my live traffic feed on Feedjit.
TextCaptureX is a COM library that allows screen text extraction in Windows applications. It is accessible from any COM-aware programming language. You can use it to extract text from any application that doesn't provide communication APIs in...
The project contains two applications.
PdfTextExtractor - extract text from PDF files with a text layer.
PdfShapeDrawer - draw shapes on PDF files.
The project uses the iTextSharp library (http://itextpdf.com/).
This program is free...
This example shows how to extract text information from a PDF file without the need for system-dependent tools or code. Just use the pyPdf library from http://pybrary.net/pyPdf/
You can modify a lot of options in a simple configuration file, in...
Extract information from HTML pages that have some kind of repetitive pattern.
This package finds repetitive format patterns in an HTML page that contains one or more lists and extracts the HTML fragments that create the patterns. The...
Tmx2text provides a simple interface to extract text data from TMX translation memories. It is written in Python (requires Python 3 or higher), uses PyQt (Qt 4), and is released under the GPL. Although it was created for Linux it should work on...