Program to extract text from a PDF in a specific area defined by a red circle on a photo of a hard copy of that PDF.

rodjonraskolnikov/Rasta2

 
 

Rasta2 is a program that extracts text from a PDF, given an image (photo) of one page of that PDF with part of the text highlighted in red ink.
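
To illustrate the idea of locating a red-ink mark in a photo, here is a minimal sketch in plain C. It is not Rasta2's actual code (which relies on opencv): the names RedBox, is_red and find_red_box, the packed 8-bit RGB layout, and the colour thresholds are all assumptions made for this example. It scans the image and returns the bounding box of the pixels that look like red ink.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bounding box of the red-ink pixels; the coordinates are only
 * meaningful when found is non-zero. (Hypothetical type.) */
typedef struct {
    int found;
    int x_min, y_min, x_max, y_max;
} RedBox;

/* Assumed threshold: a pixel counts as "red ink" when its red channel
 * is strong and at least twice as large as green and blue. */
static int is_red(uint8_t r, uint8_t g, uint8_t b)
{
    return r > 150 && r > 2 * g && r > 2 * b;
}

/* Scan a packed RGB image (3 bytes per pixel, rows stored top to
 * bottom) and return the bounding box of all red-ink pixels. */
RedBox find_red_box(const uint8_t *rgb, int width, int height)
{
    RedBox box = {0, 0, 0, 0, 0};

    assert(rgb != NULL && width > 0 && height > 0);

    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const uint8_t *p = rgb + 3 * ((size_t)y * (size_t)width + (size_t)x);

            if (!is_red(p[0], p[1], p[2]))
                continue;

            if (!box.found) {
                /* First red pixel: initialise the box to this point. */
                box.found = 1;
                box.x_min = box.x_max = x;
                box.y_min = box.y_max = y;
            } else {
                /* Rows are visited top-down, so y_min never changes. */
                if (x < box.x_min) box.x_min = x;
                if (x > box.x_max) box.x_max = x;
                if (y > box.y_max) box.y_max = y;
            }
        }
    }
    return box;
}
```

The returned box is in photo pixel coordinates. Relating it to the matching region of the PDF page is a separate step that this sketch does not attempt.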

Dependencies:
In order to compile and use Rasta2 you will need the following libraries:
* opencv 
* poppler <= 0.18.4
* sqlite3

*********To compile************
make

*********To create the DB******
Place your PDF in the database/pdf directory, then run:
cd script
./createDB.sh

*********To execute the program*******
./pdfextractor test/avaro-3.jpg

*********Performance Test********
The testing.sh, createTestDB.sh and insertCoordsIntoTestDB.sh scripts
belong to a performance test over PDF documents that have not been
published in this git repo for copyright reasons.
To run a performance test yourself, you will need to modify those
scripts manually. :-(
