GitHub - songchao43088/DBI

Branches Tags

Name		Name	Last commit message	Last commit date
Latest commit History 15 Commits
Makefile		Makefile
OurDB.c		OurDB.c
README		README
example.acsv		example.acsv
hf.c		hf.c
hf.h		hf.h
lh.c		lh.c
lh.h		lh.h
parse.c		parse.c
parse.h		parse.h
test.acsv		test.acsv
test1.acsv		test1.acsv

Repository files navigation

README of project 2.1
W4112 Database Implementation
authors: Chao Song (UNI: cs2994) Rongxin Du (UNI: rd2537)
********************************************************************************
Files: 
"hf.h"	
	the header given in project2.1 website, we modify and fill in the blank   
"hf.c"	
	corresponding .c file that defines functions declared in "hf.h"
"OurDB.c"
	contains int main(int argc, char** argv), parse user input command lines and generate results
"parse.h"
	declare some useful functions that are called in OurDB.c
"parse.c"
	corresponding .c file that defines functions declared in "parse.h"
"Makefile"
	SOURCES=OurDB.c hf.c parse.c
	EXECUTABLE=OurDB
	using gcc to compile
	make clean: clean *.o *.hf, assume that the heap file is named as *.hf
"example.acsv"
	the testing example provided in project2.1 website
"test.acsv"
	the testing example written by us, testing fields:c1,c9,c100,i1,i2,i4,i8,r4,r8
****************************************************************************************************************************
Instructions on how to use our code in Linux:
Let's take the example.acsv provided on project2.1 website as an example.
Step 1: Make
$ make clean
$ make
Step 2: import the CSV file to a heap file
$ ./OurDB example.hf -i < example.acsv
Step 3: show the content of the heap file
$ ./OurDB example.hf
Step 4: writing queries
You can try to write queries like the following:
	Query(1) ./OurDB example.hf -s1 "=" Orson -p1 -p3
You can re-arrange the selection and projection order, the result will be the same, and by default we print out the results in the same column order as given by the schema. Note that the following query (2) gives the same result as query(1).
	Query(2) ./OurDB example.hf -p3 -s1 "=" Orson -p1
	Query(3) ./OurDB example.hf -s1 ">" H -s3 ">" 1957 -s4 "<=" 1.83 -p1 -p3 -p4
Note that you must use space to separate adjacent tokens.
You should input like the above queries, you can't input like this ./OurDB example.hf -s1">"H -p1-p2
Step 5: export query results to CSV file
$ ./OurDB example.hf -s3 ">=" 1950 -s4 ">=" 1.75 -p1 -p2 > example_result.acsv
***********************************************************************************************************************
With regard to the 7 read/write/compare field processing functions for different data types:
We declare the following 3 funtion pointers to 7 encode functions (write to heapfile), to 7 decode functions (read from heapfile),
and 7 compare functions (for field comparison), respectively. Please refer to "hf.c" to see their definitions, refer to "OurDB.c"
to see function mapping.
int (*encode[7])(void *record, char *line, int *offset, int length);
int (*decode[7])(void *record, char *line, int *offset, int length);
int (*compare[7])(void *record, int offset, void *value, int length);
*********************************************************************************************************************
Have a good day.  1:42 am, April.4    Signature: Chao, Rongxin
	

Change log of project 2.2
********************************************************************************
Files: 
"lh.h"	
	the header of all linear hashing related functions
"lh.c"	
	corresponding .c file that defines functions declared in "lh.h"
****************************************************************************************************************************
new commands supported by our program:
-b[n]	will build a hash index on nth column.
usage: 	./OurDB heapfile.hf -b1
		This will build a hash index on 1st column of heapfile named "heapfile.hf"
	./OurDB heapfile.hf -b1 -b2
		This will build two hash indices on 1sh and 2nd column within one scan of the heapfile.
	./OurDB heapfile.hf -i < example.csv
		This will insert new records from example.csv to heapfile.hf. This cammand is also supported in version 2.1. But now
		when inserting records, it will also updates all the hash indices associated with that heapfile. 
	./OurDB heapfile.hf -i -b1 < example.csv
		This command has the same meaning as first run "./OurDB heapfile.hf -b1" followed by "./OurDB heapfile.hf -i < example.csv"
		Note: -i must comes before any -b[n]
	./OurDB heapfile.hf -s1 "=" 1 -p1
		This selection and projection commands is also supported in version 2.1. But now when we are doing a equality comparison,
		and we have a hash index built on that column, then we will utilize that hash index to speed up the select.
	./OurDB heapfile.hf -b1 -s1 "=" 1 -p1
		This combination of building index and selection commands is interpreted as running "./OurDB heapfile.hf -b1" followed by
		"./OurDB heapfile.hf -s1 "=" 1 -p1"
****************************************************************************************************************************
Some implementation details of our hash index component.
Bucket size:
	Bucket size is defined using a preprocessing "#define" command. We set the size to be 120 bytes because the data type
	with maximum bytes is char100, which will need 100 bytes(for data) + 8(for RID) for one single key-rid pair. 
Hash functions:
	For real number, we cast its binary representation to a integer and then calculate its hash index.
	For string, we sum up its characters and use the sum to calculate its hash index.
****************************************************************************************************************************
Performance Discussion
We created a heapfile with 250,000 records. Then we tried to do a equality selection which is expected to return 50 records
before we build any index. This operation takes approximately 3.7 seconds. Then we build a hash index on the column to which 
we are applying this equality comparison. Then we run the same selection again. This time the result shows up instantly. This
shows that the hash index we build do help improve the performance.