+-------------+ | How to Make | +-------------+
- Unpack the tarball Project2.tar: $ tar xvf Project2.tar
- Enter into the project directory: $ cd Project2/
- Build the project: $ make
Three executable files for socket version are generated after making: Project2/port-mapper/port-mapper Project2/server/server Project2/client/minigoogle
+----------+ | Register | +----------+
Server register services into prot-mapper table on Port-Mapper.
= Steps =
-
Start port mapper daemon on Port Mapper machine: $ cd port-mapper/ $ ./port-mapper
-
Start server daemon on Server machine(s): $ cd server/ $ ./server
-
Run register command on Server terminal (3 servers). It will print feedback message from Port-Mapper: (server)# register MapReduceLibrary 1 Congratulations, 2 services rigistered successfully!
-
List registerd services on Port-Mapper: (port-mapper)# list
+---------+ | Request | +---------+
Client request server info (IP and port) from Port Mapper. Client should provide program name, version number, and procedure name.
= Pre-steps =
-
Server has registered services into port mapper table, as Register Services step 2 or step 4 shows.
-
On client machine, enter into client directory: $ cd client/
= Steps =
- When the Client requesting, Port-Mapper will send back all the matched Server ip and port:
$ ./minigoogle request MapReduceLibrary 1 Index
$ ./minigoogle request MapReduceLibrary 1 Search
+----------+ | Indexing | +----------+
Minigoogle execute the services provided by Server. Client should provide program name, version number, procedure name, input directory, and output directory.
= Pre-steps =
-
Server has registered services into port mapper table, as Register Services step 2 or step 4 shows.
-
On client machine, enter into client directory: $ cd client/
-
There are some text file to be indexing in the input directory (../data): $ ls ../data/ -lsh
= Steps =
-
When the Client executing, it print out the file operation information of each steps: $ ./minigoogle execute MapReduceLibrary 1 Index ../data ../index
-
The index task will be delivered to three servers separately. 2.1 Check the message on server1. 2.2 Check the message on server2. 2.3 Check the message on server3.
-
Check the output directory (../index) on client side: $ ls -lsh ../index/
+-----------+ | Searching | +-----------+
Minigoogle execute the services provided by Server. Client should provide program name, version number, procedure name, input directory, output directory, and terms to search.
= Pre-steps =
-
Index has done.
-
On client machine, enter into client directory: $ cd client/
= Steps =
-
When the Client searching, it print out the file operation information of each steps: $ ./minigoogle execute MapReduceLibrary 1 Search ../index/ ../output "how are you doing"
-
The searching task distributed to different servers: 2.1 Check the message on server1. 2.2 Check the message on server2. 2.3 Check the message on server3.
-
Check the output on client: $ cat ../output/*
+-------------+ | How to Make | +-------------+
It's necessary to build on a Hadoop environment.
- Logon the Hadoop machine
- Enter into the project directory: $ cd Project2/Hadoop_version
- Build the project: $ make
A jar packet for Hadoop version is generated after making: Project2/Hadoop_version/MiniGoogle.jar
+----------+ | Indexing | +----------+
= Pre-steps = A Hadoop environment is setup and ready to use.
= Steps =
- Copy the data from local to hdfs: $ hadoop dfs -mkdir data $ hadoop dfs -copyFromLocal ../data/*txt data
- index the data: $ hadoop jar MiniGoogle.jar Indexing data index
- Check the output result: $ hadoop dfs -cat data/part-m-00000
+-----------+ | Searching | +-----------+
= Pre-steps = A Hadoop environment is setup and ready to use.
= Steps =
- Search the keyword "what a nice day" in the index directory: $ hadoop jar MiniGoogle.jar MiniGoogle/MiniGoogle index result what a nice day
- Check the search result: $ hadoop dfs -cat result/part-r-00000
This section compares the performance between the Socket version and the Hadoop version.
= Steps =
Socket version:
-
Defing TIMESTAMP in client/socket_execute.c and rebuild the project: #define TIMESTAMP 1
-
Run Indexing step 1 for 10 times, and record the detailed time (take real time for reference). $ for i in 0 1 2 3 4 5 6 7 8 9; do time ./minigoogle execute MapReduceLibrary 1 Index ../data ../index; done
-
Run Search step 1 for 10 times, and record the detailed time (take real time for reference). $ for i in 0 1 2 3 4 5 6 7 8 9; do time ./minigoogle execute MapReduceLibrary 1 Search ../index/ ../output "how are you doing"; done
Hadoop version:
-
Run Indexing step 2 for 10 times, and record the Map-Reduce Framework/CPU time spent (ms) (take real time for reference): $ for i in 0 1 2 3 4 5 6 7 8 9; do time hadoop jar MiniGoogle.jar Indexing data index; done
-
Run Searching step 1 for 10 times, and record the Map-Reduce Framework/CPU time spent (ms) (take real time for reference): $ for i in 0 1 2 3 4 5 6 7 8 9; do time hadoop jar MiniGoogle.jar MiniGoogle/MiniGoogle index result what a nice day; done
= Results =
The time spend result (in second) recorded as below:
Wordcount Sort(W+S) Index(W+S+I) Search
Socket 1.146928 2.305397 4.541908 0.164810 Socket 1.202855 2.336820 4.632236 0.165075 Socket 1.307164 2.435692 4.738004 0.170763 Socket 1.438589 2.605878 5.074086 0.168736 Socket 1.470344 2.612354 5.026402 0.165218 Socket 1.544247 2.677010 5.157250 0.165665 Socket 1.585199 2.707569 5.225632 0.166475 Socket 1.694932 2.848935 5.386795 0.171973 Socket 1.739059 2.924582 5.588107 0.168603 Socket 1.242837 2.380279 4.675156 0.175365 Hadoop (CPU time spent (s)) 18.840 4.120 Hadoop 19.690 4.340 Hadoop 18.910 3.770 Hadoop 19.540 3.830 Hadoop 18.010 3.790 Hadoop 19.530 3.720 Hadoop 17.340 3.730 Hadoop 17.150 3.630 Hadoop 19.610 3.490 Hadoop 21.460 6.540
http://wiki.apache.org/hadoop/ http://hadoop.apache.org/docs/r1.0.4/commands_manual.html