
Sparql Engine -> Basic (no full language support) but highly efficient SPARQL engine. To be extended to support Semantic Full-Text Search (see http://broccoli.cs.uni-freiburg.de/demos/BroccoliFreebase/) in a standard SPARQL-like fashion. That follow up extension will feature a proper license and encourage collaboration eventually. If you're inte…


anukat2015/SparqlEngineDraft

 
 


SparqlEngineDraft

How to use

  0. Requirements:

Make sure you use a 64bit Linux with:

  • git
  • g++ 4.8 or higher
  • CMake 2.8.4 or higher

Other compilers and operating systems are not supported yet, although no major problems are known so far. Support for more platforms would be a highly appreciated contribution.

  1. Build:

a) Checkout this project:

git clone https://github.com/Buchhold/SparqlEngineDraft.git --recursive

Don't forget --recursive so that submodules will be updated. For old versions of git that do not support this parameter, run:

git clone https://github.com/Buchhold/SparqlEngineDraft.git
cd SparqlEngineDraft
git submodule init
git submodule update

b) Go to a folder where you want to build the binaries. Don't do this directly in SparqlEngineDraft.

cd /path/to/YOUR_FOLDER

c) Build the project

cmake /path/to/SparqlEngineDraft/ -DCMAKE_BUILD_TYPE=Release && make -j

d) Run ctest. All tests should pass:

ctest
  2. Creating an Index:

IMPORTANT: THERE HAS TO BE SUFFICIENT DISK SPACE UNDER THE PATH YOU CHOOSE FOR YOUR INDEX! FOR NOW, ALL FILES HAVE TO BE UTF-8 ENCODED!
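Both conditions can be checked before starting a long index build. The following is a minimal sketch, not part of this project: the helper names is_utf8 and free_bytes are hypothetical, and how much free space counts as sufficient depends on your data and is not stated here.

```python
import codecs
import shutil


def is_utf8(path, chunk_size=1 << 20):
    """Return True if the file at `path` decodes cleanly as UTF-8.

    The file is read in chunks so that large inputs are not loaded at once.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                decoder.decode(chunk)
        # Fails if the file ends in the middle of a multi-byte sequence.
        decoder.decode(b"", final=True)
    except UnicodeDecodeError:
        return False
    return True


def free_bytes(directory):
    """Return the number of free bytes on the file system holding `directory`."""
    return shutil.disk_usage(directory).free
```

Call is_utf8 on each input file and free_bytes on the folder that will hold the index before running the index builder.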

a) From an NTriples file (blank nodes are currently not allowed):

./IndexBuilderMain -n /path/to/input.nt -b /path/to/myindex
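Since blank nodes are not allowed, it can help to scan the input first. The sketch below is not part of this project; the function name find_blank_node_lines is hypothetical. It relies on the N-Triples convention that blank node labels start with _: and that subject and predicate contain no whitespace, so the object always begins right after the second whitespace-separated token.

```python
def find_blank_node_lines(lines):
    """Return the 1-based numbers of lines whose subject or object is a blank node."""
    hits = []
    for number, line in enumerate(lines, start=1):
        stripped = line.strip()
        # Skip empty lines and comments.
        if not stripped or stripped.startswith("#"):
            continue
        parts = stripped.split(None, 2)
        if len(parts) < 3:
            continue
        subject, _predicate, rest = parts
        # `rest` starts with the object; a literal starts with a quote instead,
        # so "_:" inside a literal is not reported.
        if subject.startswith("_:") or rest.startswith("_:"):
            hits.append(number)
    return hits
```

For a file, pass the open file object: find_blank_node_lines(open("/path/to/input.nt", encoding="utf-8")).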

b) From a TSV file (no spaces or tabs within subject, predicate, or object):

./IndexBuilderMain -t /path/to/input.tsv -b /path/to/myindex
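A quick check of the whitespace restriction might look like the sketch below. It is not part of this project, and it ASSUMES exactly three tab-separated columns (subject, predicate, object) per line, which this README does not spell out. The function name check_tsv_line is hypothetical.

```python
def check_tsv_line(line):
    """Return True if `line` has three non-empty, space-free, tab-separated fields.

    ASSUMPTION: one triple per line, written as subject<TAB>predicate<TAB>object.
    """
    fields = line.rstrip("\n").split("\t")
    if len(fields) != 3:
        # Too few columns, or an extra tab inside one of the fields.
        return False
    return all(field and " " not in field for field in fields)
```

Lines for which this returns False either contain a space, contain a stray tab, or have an empty column.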

To include a text collection, the wordsfile (see below for the required format) has to be passed with -w. To support text snippets, a docsfile (see below for the required format) has to be passed with -d.

The full call will look like this:

./IndexBuilderMain -t /path/to/input.tsv -w /path/to/wordsfile -d /path/to/docsfile -b /path/to/myindex
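When the index build is scripted, the argument list can be assembled programmatically. This is a minimal sketch, not part of this project: the function name index_builder_args is hypothetical, and it only uses the flags shown above (-n, -t, -w, -d, -b). It ASSUMES that -n and -t are mutually exclusive, since they are presented as alternatives.

```python
def index_builder_args(index_base, ntriples=None, tsv=None,
                       wordsfile=None, docsfile=None,
                       binary="./IndexBuilderMain"):
    """Build the argument list for one IndexBuilderMain call."""
    # ASSUMPTION: exactly one knowledge-base input (-n or -t) is given.
    if (ntriples is None) == (tsv is None):
        raise ValueError("pass exactly one of ntriples or tsv")
    args = [binary]
    args += ["-n", ntriples] if ntriples is not None else ["-t", tsv]
    if wordsfile is not None:
        args += ["-w", wordsfile]
    if docsfile is not None:
        args += ["-d", docsfile]
    args += ["-b", index_base]
    return args
```

The resulting list can be passed to subprocess.run(args, check=True) so that a failed build raises an error.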
  3. Starting a Server:

a) Without text collection:

./ServerMain -i /path/to/myindex -p <PORT>

b) With text collection:

./ServerMain -i /path/to/myindex -p <PORT> -t
  4. Running queries:

curl 'http://localhost:<PORT>/&query=SELECT ?x WHERE {?x <rel> ?y}'

or visit:

http://localhost:<PORT>/index.html
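The curl call above can also be issued from a script. The following is a minimal sketch, not part of this project: the function names build_query_url and run_query are hypothetical, the URL layout is copied from the curl example, and it is an ASSUMPTION that the server accepts a percent-encoded query. The response is returned as raw bytes because its format is not described here.

```python
from urllib.parse import quote
from urllib.request import urlopen


def build_query_url(port, query, host="localhost"):
    """Return the query URL in the same layout as the curl example."""
    # ASSUMPTION: the server decodes percent-encoded characters in the query.
    return "http://{}:{}/&query={}".format(host, port, quote(query, safe=""))


def run_query(port, query, host="localhost", timeout=30):
    """Send the query to a running server and return the raw response body."""
    with urlopen(build_query_url(port, query, host), timeout=timeout) as response:
        return response.read()
```

For example, run_query(PORT, "SELECT ?x WHERE {?x <rel> ?y}") with the port the server was started on.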
  5. Text Features

5.1 Input Data

The following two input files are needed for full feature support:

a) Wordsfile

A file with TODO

b) Docsfile

A file with TODO

5.2 Supported Queries

Typical SPARQL queries can then be augmented. The features are best explained using examples:

A query for plants with edible leaves:

SELECT ?plant WHERE { 
    ?plant <is-a> <Plant> . 
    ?plant <in-context> ?c . 
    ?c <in-context> edible leaves
} 

The special relation <in-context> states that results for ?plant have to occur in a context ?c. Contexts matching ?c also have to contain both words, edible and leaves.

A query for astronauts who walked on the moon:

SELECT ?a TEXT(?c) SCORE(?c) WHERE {
    ?a <is-a> <Astronaut> .
    ?a <in-context> ?c .
    ?c <in-context> walk* moon
} ORDER BY DESC(SCORE(?c))
TEXTLIMIT 2

Note the following features:

  • A star * can be used to search for a prefix, as done in the keyword walk*. Note that there is a minimum prefix size that depends on settings at index build time.
  • SCORE can be used to obtain the score of a text match. This is important to achieve a good ordering in the result. The typical way would be to ORDER BY DESC(SCORE(?c)).
  • While ?c just matches a context ID, TEXT(?c) can be used to extract a snippet.
  • TEXTLIMIT can be used to control the number of result lines per text match. The default is 1.
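To illustrate what a prefix keyword such as walk* means, here is a minimal sketch. It is not the engine's implementation: the function name matches_keyword is hypothetical, the default min_prefix_size of 3 is an arbitrary placeholder for the index build-time setting, and rejecting shorter prefixes with an error is an ASSUMPTION.

```python
def matches_keyword(keyword, word, min_prefix_size=3):
    """Return True if `word` satisfies `keyword`.

    A keyword ending in * matches every word that starts with the part
    before the star; any other keyword has to match the word exactly.
    """
    if keyword.endswith("*"):
        prefix = keyword[:-1]
        # ASSUMPTION: prefixes shorter than the minimum size are rejected.
        if len(prefix) < min_prefix_size:
            raise ValueError("prefix shorter than the minimum prefix size")
        return word.startswith(prefix)
    return word == keyword
```

With this reading, walk* matches walk, walked and walking, while the plain keyword moon only matches moon.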

An alternative query for astronauts who walked on the moon:

SELECT ?a TEXT(?c) SCORE(?c) WHERE {
    ?a <is-a> <Astronaut> .
    ?a <in-context> ?c .
    ?c <in-context> walk* <Moon>
} ORDER BY DESC(SCORE(?c))

This query doesn't search for an occurrence of the word moon but for places where the entity <Moon> has been linked.

Text and knowledge-base data can be nested in queries. This allows queries like one for politicians that were friends with a scientist associated with the Manhattan Project:

SELECT ?p TEXT(?c) ?s TEXT(?c2) WHERE {
    ?p <is-a> <Politician> .
    ?p <in-context> ?c .
    ?c <in-context> friend* .
    ?c <in-context> ?s .
    ?s <is-a> <Scientist> .
    ?s <in-context> ?c2 .
    ?c2 <in-context> manhattan project
} ORDER BY DESC(SCORE(?c))

In addition to <in-context> there is another special relation <has-context> that is useful when searching for documents.

A query for documents that state that a plant has edible leaves:

SELECT ?doc ?plant WHERE {
    ?doc <has-context> ?c .
    ?plant <is-a> <Plant> .
    ?plant <in-context> ?c .
    ?c <in-context> edible leaves
}

Again, features can be nested.

A query for books with descriptions that contain the word drug:

SELECT ?book TEXT(?c) WHERE {
    ?book <is-a> <Book> .
    ?book <description> ?d .
    ?d <has-context> ?c .
    ?c <in-context> drug
}

Note the use of the relation <has-context>, which links the context to a text source (in this case the description) that may be an entity itself.

For now, each context is required to have a triple <in-context> ENTITY/WORD. Pure connections to variables (e.g. "Books with a description that mentions a plant.") are planned for the future.

How to obtain data to play around with

