# -*- rd -*-

ChupaText has been re-implemented in Ruby. This code is no longer
maintained. See https://github.com/ranguba/chupa-text/ for the new
implementation.

= README --- An introduction to ChupaText, a text extraction utility

== Name

ChupaText

== Author

  * Nobuyoshi Nakada <nakada@clear-code.com>
  * Kouhei Sutou <kou@clear-code.com>

== License

  * Source: LGPLv2.1 or later. (detail:
    ((<"license/lgpl-2.1.txt"|URL:http://www.gnu.org/licenses/lgpl-2.1.html>)))
  * Document: Triple license: LGPL, GFDL and/or CC.
    * LGPL: v2.1 or later. (detail:
      ((<"license/lgpl-2.1.txt"|URL:http://www.gnu.org/licenses/lgpl-2.1.html>)))
    * GFDL: v1.3 or later. (detail:
      ((<"license/gfdl-1.3.txt"|URL:http://www.gnu.org/licenses/fdl.html>)))
    * CC: ((<BY-SA|URL:http://creativecommons.org/licenses/by-sa/3.0/>))
  * Exceptions:
    * modules/excel/: GPLv2. (detail:
      ((<"license/gpl-2.txt"|URL:http://www.gnu.org/licenses/gpl-2.html>)))
      These files come from ((<Gnumeric|URL:http://projects.gnome.org/gnumeric/>)).
    * ...

== What's this?

ChupaText is a text extraction utility. It can extract text
and metadata from PDF and office documents. You can use it
as a library, from the command line, or as a Web service.

== Dependency libraries and software

Required:
  * GLib >= 2.24
  * libgsf

Optional:
  * Poppler
  * wv
  * libgoffice
  * Gnumeric
  * LibreOffice, OpenOffice.org or unoconv
  * ruby >= 1.9.2

== Get

tar.gz: ((<URL:http://rubyforge.org/frs/?group_id=8073>))

== Repository

The ChupaText repository is hosted on
((<GitHub|URL:http://github.com/ranguba/chupatext>)).

  % git clone https://github.com/ranguba/chupatext.git

== Install

See ((<install>)).

== Usage

  % chupatext [OPTION ...] FILE ...

FILE is the file you want to extract text from.

See ((<chupatext|"doc/chupatext.rd">)) for more details.
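As a rough illustration of the command-line usage above, the following
POSIX shell sketch writes the text of each PDF file in a directory to a
matching .txt file. It assumes that chupatext prints the extracted text
to standard output; the function name extract_all is hypothetical and
is not part of ChupaText.

```shell
# Hypothetical helper, not part of ChupaText.
# Assumption: chupatext writes the extracted text to standard output.
extract_all() {
  dir=$1
  for f in "$dir"/*.pdf; do
    # If no file matches, the pattern stays unexpanded; skip it.
    [ -e "$f" ] || continue
    # Strip the .pdf suffix and save the extracted text as NAME.txt.
    chupatext "$f" > "${f%.pdf}.txt"
  done
}
```

For example:

  % extract_all ~/documents

Running chupatext once per file keeps each document's output in its own
file.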

== Thanks

  * Yuto Hayamizu