This example shows how to extract text informations from a PDF file without the need of system dependent tools or code. Just use the pyPdf library from http://pybrary.net/pyPdf/
Allows developer to extract text from multiple format files such as MS-Office pre-2007 (Word, PPT, XLS), MS-Visio, MS-Office-2007 (docx, pptx,xlsx), PDF, RTF, XML, HTML, Text etc.
Python module and command line utility that analyzes XML output from the program pdftohtml in order to extract tables from PDF files. Outputs CSV.
The Textract Project consists of C++ source code to extract text from a growing assortment of file formats. Output is indexing-ready. The Textract Project is intended as a foundation to support research-quality search engines.
Extract Text From HTML is an web based tutorial which deals with seperating a text from the HTML tag in an article that is specified in the URL. Here is a code which can be integrated into users website to perform the above said process. This...
To extract data from a matlab figure (.fig) files generate using version 7 or later. It can be used for both 2D and 3D plots
Aspose.OCR for Java is a character recognition component that allows developers to add OCR functionality in their Java web applications, web services and Windows applications. It provides a simple set of classes for controlling character...
User selects left top corner of areas to be cropped from image. Then user specifies size. Cropped areas are saved and displayed.
This is a simple code to extract data from an existing matlab 2D or 3D figure.
Script to extract data from my live traffic feed from feedjit
Class w/examples to provide a method to easily extract data from XML files
ImagePatchTool ver. 1.31dlTe
This script is to demonstrate how to place text/image or label an image without losing on the pixel information in the rest of the image. The main purpose of the...
We run ROI_extract code in Matlab, and chose the image which want to extract ROI from it and we select the ROI.
This module should help to add images or extract images from zip-files and add to Gallery Assist galleries from a import folder.
What does Gallery Assist Upload and Import Helper
This module extends Gallery Assist currently...
How to read metadata from flash video files (height, width, etc.)
Code oringinally stolen / ported from http://inlet-media.de/flvtool2
This recipe illustrates how to extract a low resolution grid from a high resolution grid using the Numeric package. It presents an inefficient but straightforward solution and then a more efficient solution that employs the functionality in Numeric.
This GUI program will extract text from damaged/corrupted Word files formatted in the new docx format where Word itself fails.
Docx files are actually zipped collections of XML files. XML as a format is unforgiving of data corruption....
This project handles with the question how to extract fixed feature points from a given face in a real time environment. It is based on the idea, that a face is given by Viola Jones Algorithm for face detection and processed to track pupil...
Detexter lets you extract text from multiple PDF files. Detexter uses the PDFBox library for its text extraction.
Tmx2text provides a simple interface to extract text data from tmx translation memories. It is written in Python (requires Python3 or higher) and uses PyQt (Qt 4) and is released under the GPL. Although it was created for Linux it should work on...