====================================================
DBCacheBenchmarking
====================================================
Using the code:


The directory already contains the files and partitions used by our experiments.

To compile, simply type:
$> make

This will generate 3 executables:
dblite - interactive SM database application
genqueries - generates synthetic query workload
benchmark - simulates a multi-user environment on workload queries

====================================================
Running dblite 
====================================================

To execute dblite type:
$> dblite  

This will take you into interactive mode.  Here you can create the database files, initialize the database with various configurations, and query the data.

Creating the Database (at the dblite prompt):
dblite> create createdb

To load an existing database (at dblite prompt):
dblite> load config db.xml

There are three configurations available:
Filename	Description
========	============
config		n-ary 
configP		SM
configPAX	PAX

To Execute a Query:

dblite> query [query]

query syntax
=====================================================
projection(child; projection-list)
merge-join(left-child, right-child; join-clause; )    // left-child and right-child must be sscans
sscan(table[:alias]; projection-list; [where-clause])

*projection-list ::= comma-separated qualified attribute list.
*example(s):

dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.a < 5), sscan(T2:S; S.a, S.b; ); S.b = R.a; ); R.a, R.b, S.b, R.g, R.c)
dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.g = 'Z'), sscan(T2:S; S.a, S.b; S.b > 350 & S.b <= 1232); S.b = R.a; ); R.a, R.b, S.b, R.g, R.c)
dblite> query projection(merge-join(sscan(T1:R; R.a, R.b, R.g, R.c; R.g = 'Z' & R.a > 200), sscan(T2:S; S.a, S.b; S.b > 850 & S.b <= 1232); S.b = R.a; ); R.a, R.c)
                                                                                                                                                       
In the above examples, the table and attribute names come from the schema defined in the db.xml file; they can be listed with the "describe" command.
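As a rough illustration of the grammar, a query string can be assembled programmatically.  The helper functions below are our own (not part of dblite); only the syntax they emit comes from the grammar above:

```python
# Hypothetical helpers for composing dblite query strings.  The grammar
# is from this README; the function names themselves are ours.

def sscan(table, projection, where=""):
    # sscan(table[:alias]; projection-list; [where-clause])
    return "sscan(%s; %s; %s)" % (table, ", ".join(projection), where)

def merge_join(left, right, join_clause):
    # merge-join(left-child, right-child; join-clause; )
    return "merge-join(%s, %s; %s; )" % (left, right, join_clause)

def projection(child, projection_list):
    # projection(child; projection-list)
    return "projection(%s; %s)" % (child, ", ".join(projection_list))

q = projection(
    merge_join(
        sscan("T1:R", ["R.a", "R.b"], "R.a < 5"),
        sscan("T2:S", ["S.a", "S.b"]),
        "S.b = R.a"),
    ["R.a", "S.b"])
print(q)
```

The printed string matches the shape of the first example above and could be pasted after `dblite> query`.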

Type help for a full list of commands:

command       options                      description
=======       =======                      ===========
help                                       display usage/help
query         <query>                      execute query, results are returned to stdout
profile       [count] <query>              profiles the query
load          <partition-config> <schema>  loads the database
create        <synthetic-info>             creates a populated synthetic database
tables                                     list the tables in the database
describe      <table-name>                 list the table schema for the selected table
layout        ?|f|p                        gets/sets the current materialization
              ? - get current layout
              f - single partition
              p - 2 partitions
quit                                       exits the program but why would you?

Known Issues
=====================================================
The parser is very brittle; if incorrect syntax is encountered, it segfaults.


Running with Cachegrind
=====================================================

In order to run with the simulated cache, run the following:

valgrind --tool=cachegrind dblite

After loading the data and running a query, quit out of dblite, and the results will be shown. You can also simply load the data and not run a query before quitting to see the total overhead.


====================================================
Running genqueries 
====================================================
To execute genqueries type:
$> genqueries num-non-optimized nQ0 [nQ1 [nQ2 [nQ3 [nQ4 [nQ5 [nQ6 [nQ7 [nQ8 [nQ9 [nQ10 [nQ11 [nQ12 [nQ13 [nQ14]]]]]]]]]]]]]]

parameters		description
num-non-optimized	number of non cache optimized queries to generate
nQX			number of queries to generate for query template X

====================================================
Running benchmark 
====================================================
To execute benchmark type:
$> benchmark config-file is-enabled nthreads < query-workload

parameters	description
config-file	same as dblite config file (specifies partition initialization)
is-enabled	enable or disable partition propagation (0 or 1)
nthreads	number of threads
query-workload	synthetic workload generated by genqueries

====================================================
Creating the data and the partitions
=====================================================

In order to create an auto-generated file, make sure there is a folder called "Data" within the DBCacheBenchmarking directory.

Open a file in the DBCacheBenchmarking folder called "createdb".

Then, for each relation you want generated, put the following
<filename>
<#fields>|<#records>|<#bytes per record>

Then for each field put one of the following types:

int|incr|<byte offset>
int|randIncr|<byte offset>
int|range|<lower bound>|<upper bound>|<lower missing bound>|<upper missing bound>|<byte offset>
int|oddRange|<lower bound>|<upper bound>|<byte offset>
int|evenRange|<lower bound>|<upper bound>|<byte offset>
string|<length>|<byte offset>
fK|<lower bound>|<upper bound>|<lower missing bound>|<upper missing bound>|<byte offset>

int|incr means that it is an integer type that increments for each record created
int|randIncr creates an integer field that increments itself by a small random value for each record
int|range creates a random integer that is in the range of <lower bound> to <lower missing bound> and <upper missing bound> to <upper bound>
int|oddRange creates a random odd integer in the range of <lower bound> to <upper bound>
int|evenRange creates a random even integer in the range of <lower bound> to <upper bound>
string creates a string with <length> characters
fK gives unique random values in the given range in sorted order.
<byte offset> is the location of the field in the tuple
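The generator types above can be sketched in a few lines of Python.  This is our own approximation for illustration; the actual generator is the C/C++ code in this repo and may differ in details (for example, the exact step size used by randIncr):

```python
import random

def gen_incr(n, start=0):
    # int|incr: an integer that increments by one for each record
    return list(range(start, start + n))

def gen_rand_incr(n, max_step=5):
    # int|randIncr: increments by a small random value per record
    # (max_step is our assumption, not a documented parameter)
    vals, cur = [], 0
    for _ in range(n):
        cur += random.randint(1, max_step)
        vals.append(cur)
    return vals

def gen_range(n, lo, hi, miss_lo, miss_hi):
    # int|range: values in [lo, miss_lo] or [miss_hi, hi],
    # skipping the "missing" band in between
    allowed = [v for v in range(lo, hi + 1) if v <= miss_lo or v >= miss_hi]
    return [random.choice(allowed) for _ in range(n)]

def gen_odd_range(n, lo, hi):
    # int|oddRange: a random odd integer in [lo, hi]
    odds = [v for v in range(lo, hi + 1) if v % 2 == 1]
    return [random.choice(odds) for _ in range(n)]
```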

An example file is:

testTable1.tab
3|200|34
int|incr|0
string|26|4
int|range|0|55|0|0|30
testTable2.tab
4|1000|13
int|randIncr|0
int|evenRange|0|50|4
string|1|8
fK|55|100|70|76|9

This creates two binary data files "testTable1.tab" and "testTable2.tab" located in the "Data" folder.  The first table has 200 records, 3 fields, and a record is 34 bytes long.  The first field is an incremented integer, the second a string of 26 characters, and the third a random integer from 0 to 55.  The second table has 1000 records, 4 fields, and 13 bytes per record.  The first field is a randomly incremented integer, the second a random even integer from 0 to 50, the third a single character, the fourth a unique sorted random integer from 55 to 70 and 76 to 100.
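Given that layout (int at offset 0, 26-byte string at offset 4, int at offset 30, 34 bytes total), a testTable1.tab record can be packed and unpacked with Python's struct module.  This is a sketch; we assume 4-byte little-endian integers with no padding, which may not match the actual binary format:

```python
import struct

# "<" = little-endian, no padding (our assumption about the file format)
RECORD_FMT = "<i26si"  # int @0, 26-byte string @4, int @30 -> 34 bytes
assert struct.calcsize(RECORD_FMT) == 34

# Build one sample record, then decode it back.
raw = struct.pack(RECORD_FMT, 0, b"A" * 26, 42)
a, b, c = struct.unpack(RECORD_FMT, raw)
print(a, b.decode(), c)
```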

Another file must also be modified, called db.xml.  db.xml for the tables above looks like the following:

<database>
  <tables>
    <table id = "0" name = "T1" path="Data/testTable1.tab">
      <schema>
        <attribute id="0" name="a" type="INTEGER" length="4" />
        <attribute id="1" name="b" type="STRING" length="26" />
        <attribute id="2" name="c" type="INTEGER" length="4" />
      </schema>
    </table>
    <table id = "1" name = "T2" path="Data/testTable2.tab">
      <schema>
        <attribute id="0" name="a" type="INTEGER" length="4" />
        <attribute id="1" name="b" type="INTEGER" length="4" />
        <attribute id="2" name="c" type="CHAR" length="1" />
        <attribute id="3" name="d" type="INTEGER" length="4" />
      </schema>
    </table>
  </tables>
</database>
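Since db.xml is plain XML, it can be inspected with any XML library.  For example, this standalone Python sketch (not part of the tools here) parses the first table above and recomputes its record size from the attribute lengths, which should match the 34 bytes declared in createdb:

```python
import xml.etree.ElementTree as ET

# The first table from the db.xml example above, inlined for a
# self-contained demo.
DB_XML = """
<database>
  <tables>
    <table id="0" name="T1" path="Data/testTable1.tab">
      <schema>
        <attribute id="0" name="a" type="INTEGER" length="4" />
        <attribute id="1" name="b" type="STRING" length="26" />
        <attribute id="2" name="c" type="INTEGER" length="4" />
      </schema>
    </table>
  </tables>
</database>
"""

root = ET.fromstring(DB_XML)
for table in root.iter("table"):
    attrs = table.findall("./schema/attribute")
    # Record size is the sum of the declared attribute lengths.
    size = sum(int(a.get("length")) for a in attrs)
    print(table.get("name"), table.get("path"), size)
```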


Now, when running dblite, call create to create the new tables.  The program will end after the tables are created and will need to be restarted, but the new tables will be in place.


Creating Partitions:
In order to partition the tables, a config file must be created (though the name can be anything).

The first line must be 4096 (this is the page size; it should be variable, but 4096 is hard-coded in other places, so it needs to be that value).

For each relation, do the following:
  <fileLocation>
  <#Partitions>|<#Fields>|<#records>|<#bytesPerRecord>
  
  for each partition, do the following:
    <#fieldsInPartition>|<#sizeOfPartitionTuple>

    for each field in the partition, do the following:
      <#fieldNum>|<fieldSize>


So, one config file for the above relations could be:
4096
Data/testTable1.tab
3|3|200|34
1|4
0|4
1|26
1|26
1|4
2|4
Data/testTable2.tab
2|4|1000|13
2|8
0|4
3|4
2|5
1|4
2|1

This splits the first relation into three partitions, each containing one field.  The first partition has size 4, the second size 26, and the third size 4.  The second relation is in 2 partitions, the first of which has two fields, field 0 and field 3, and a total tuple size of 8.  The second contains fields 1 and 2.  Note that within a partition, the fields should be ordered from lowest to highest field id.
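A config like the one above can be sanity-checked mechanically: each partition's tuple size should equal the sum of its field sizes, and the partitions together should cover the full record.  A small Python sketch (our own, checking only the first relation from the example above):

```python
# First relation from the partition-config example above.
CONFIG = """\
4096
Data/testTable1.tab
3|3|200|34
1|4
0|4
1|26
1|26
1|4
2|4
"""

lines = CONFIG.strip().split("\n")
page_size = int(lines[0])
assert page_size == 4096  # page size is hard-coded elsewhere

i = 1
path = lines[i]; i += 1
nparts, nfields, nrecords, rec_bytes = map(int, lines[i].split("|")); i += 1

total = 0
for _ in range(nparts):
    pfields, psize = map(int, lines[i].split("|")); i += 1
    fsum = 0
    for _ in range(pfields):
        fid, fsize = map(int, lines[i].split("|")); i += 1
        fsum += fsize
    # partition tuple size must equal the sum of its field sizes
    assert fsum == psize
    total += psize

# partitions together must cover the full record
print(path, total, rec_bytes)
```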

Another might be:
4096
Data/testTable1.tab
1|3|200|34
3|34
0|4
1|26
2|4
Data/testTable2.tab
2|4|1000|13
3|12
0|4
1|4
3|4
1|1
2|1

This puts the first relation in one partition (i.e., NSM format).  The second relation has two partitions, with 3 fields in one and 1 field in the other.

About
=====================================================
A project for cache layout benchmarking.
