This is the repository of OPMES (Operation-tree Pruning based Math Expression Search). The code implements a prototype and is for demonstration only.
This repo mainly serves as a code reference for our system published at ECIR 2016, and the source code here is not intended to be updated. The next version (a rewrite) is hosted at https://github.com/approach0/search-engine and is currently under development.
Our paper can be downloaded from here and you may find this slide helpful to understand our system.
Also, here is the link to our online demo: http://tkhost.github.io/opmes
Our paper includes a demo plan that illustrates how our system relates to our method by showing:
- Parser output
- Index tree structure
- A simple query-to-results command and explanation.
You can go through the above by following these instructions:
- Clone source code from this repo.
- After building the project (simply type `make`), run `parser/parser.out` to see parser output given an input LaTeX mode string.
- Type `make demo` and view a demo index tree under directory `./col`.
- Search a simple query by typing `./search/search.out -n -q '1/2 (n-1)!'`
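The build, index, and search steps above can be chained in a small script. This is only a sketch: the helper names `run` and `demo_steps` and the `DRY_RUN` variable are made up for this example and are not part of the repo, and the interactive parser step is left out. It assumes you are in the root of a cloned copy of this repo.

```shell
#!/bin/sh
# Sketch of the demo walk-through; call demo_steps from the repository root.
# run, demo_steps and DRY_RUN are illustrative names, not part of OPMES.

# Print a command, then execute it unless DRY_RUN is set to a non-empty value.
run() {
    echo "+ $*"
    if [ -z "${DRY_RUN:-}" ]; then
        "$@"
    fi
}

# Stop at the first step that fails.
demo_steps() {
    run make &&        # build the project
    run make demo &&   # generate the demo index tree
    run ls ./col &&    # look at the demo index tree
    run ./search/search.out -n -q '1/2 (n-1)!'   # run the sample query
}
```

Calling `DRY_RUN=1 demo_steps` only prints the four commands, which is a cheap way to check the sequence before running `demo_steps` for real.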
common
-> parser
-> index
-> search
-> web
The typical installations to meet all the binary/library dependencies:
For Debian/Ubuntu users, issue the following command:
sudo apt-get update
sudo apt-get install bison flex libreadline-dev libncurses5-dev \
libcurl4-openssl-dev libtokyocabinet-dev libbz2-dev
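Before running `make`, it may help to confirm that the build tools are actually on your `PATH`. The following is a minimal sketch; `missing_tools` is a made-up helper name, and it only checks executables (`bison`, `flex`, `make`), not the development libraries listed above.

```shell
#!/bin/sh
# Print each named command that cannot be found on PATH, one per line.
# missing_tools is an illustrative helper, not part of OPMES.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the build tools; header-only library packages are not covered here.
missing=$(missing_tools bison flex make)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found."
fi
```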
In an experimental index, a 7.6 MB bzip2-compressed data file (plain text files with one formula per line) results in a 1.3 GB collection directory and hundreds of MB of database files.
As another example, the finished index of the entire math.stackexchange.com (over 8 million LaTeX equation IDs in 27180 pages of questions) contains:
- 892M formula.db
- 7.2G textree.db
- 65M webpage.db
- 9.4G collection directory.
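For disk planning, the figures above can be added up. The sketch below assumes that `M` and `G` are binary units (1 G = 1024 M), as tools like `du -h` print them; the function name `index_size_estimate` is made up for this example.

```shell
#!/bin/sh
# Add up the math.stackexchange.com index sizes listed above.
# Assumption: M and G are binary units, i.e. 1 G = 1024 M.
index_size_estimate() {
    awk 'BEGIN {
        db_mb = 892 + 7.2 * 1024 + 65   # formula.db + textree.db + webpage.db
        db_gb = db_mb / 1024
        total_gb = db_gb + 9.4          # plus the collection directory
        printf "databases: %.1f G, total: %.1f G\n", db_gb, total_gb
    }'
}

index_size_estimate
```

Under that assumption the database files come to about 8.1 G and the whole index to about 17.5 G.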
Here are the commands on the Ubuntu 14.04 distribution, as an example.
- Enable the CGI module:
    sudo a2enmod cgi
- Configure /etc/apache2/apache2.conf, adding the lines:
    ScriptAlias /cgi/ /var/www/foo/cgi/
    Alias /foo/ /var/www/foo/res/
- Restart the Apache server:
    sudo /etc/init.d/apache2 restart
- Install fcgiwrap (and fastcgi if you do not have it):
    sudo pacman -S fcgiwrap
    sudo systemctl enable fcgiwrap.socket
    sudo systemctl start fcgiwrap.socket
- Configure nginx.conf:
    location ~ \.cgi$ {
        fastcgi_pass unix:/run/fcgiwrap.sock;
        fastcgi_param SCRIPT_FILENAME /usr/share/nginx/html/foo$fastcgi_script_name;
        include fastcgi_params;
    }
    location /foo/ {
        alias /usr/share/nginx/html/foo/res/;
    }
- Restart the Nginx server:
    sudo systemctl restart nginx
Just generate a CGI program (e.g. `web/helloworld.cgi`) and copy it into the directory `/var/www/foo/`.
To test it, simply open a Web browser and enter the URL:
http://127.0.0.1/cgi/helloworld.cgi
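If you only want to check the server configuration before building the project, a tiny stand-in CGI program is enough. The script below is a hypothetical substitute, not the repo's `web/helloworld.cgi`; it just emits the header, the blank line, and the body that a CGI response needs.

```shell
#!/bin/sh
# Hypothetical stand-in for web/helloworld.cgi, used only to test the server setup.
hello_cgi() {
    echo "Content-Type: text/plain"   # response header
    echo                              # blank line ends the header section
    echo "Hello, world!"              # response body
}

hello_cgi
```

The file has to be executable (`chmod +x`). Note that with the `ScriptAlias /cgi/ /var/www/foo/cgi/` line shown for Apache, the URL above resolves to `/var/www/foo/cgi/helloworld.cgi`, so that is where the server will look for it. From a terminal, `curl -i http://127.0.0.1/cgi/helloworld.cgi` shows the headers along with the body.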
- `git clone --depth=1` the master branch.
- Copy the project to your server. You may want to delete the `doc` and `.git` folders and `crawler/*.tar.gz2` to save transmission time.
- Before building on your server machine, you probably need to modify:
  - All the library dependency directories pointed to in the `dep/*` files.
  - The `web/config.mk` file, to configure your hosting directory (e.g. under your Apache server `DocumentRoot`).
- `make`, `cd web/` and `make install` to finally install on your server.
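The cleanup before copying can be scripted. This is a sketch under the assumption that the checkout lives in a directory you pass as an argument; `slim_tree` is a made-up helper name, and the paths it removes are the ones named in the steps above.

```shell
#!/bin/sh
# Remove files that are not needed on the server from the checkout at "$1".
# slim_tree is an illustrative helper, not part of OPMES.
slim_tree() {
    dir=${1:?usage: slim_tree DIR}
    rm -rf "$dir/doc" "$dir/.git"       # documentation and git history
    rm -f "$dir"/crawler/*.tar.gz2      # crawler archives
}
```

For example, `slim_tree ./opmes` (with `./opmes` standing in for wherever you cloned the repo) trims the tree in place, after which it can be copied to the server with whatever transfer tool you prefer.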