#import the source code of hadoop to eclipse
1.install maven
use "mvn -v" to check
2.install protobuf
use "protoc --version" to check
3.a few things need to be done first
add to hadoop-common/src/test/java/org/apache/hadoop/io/serializer/avro/
add, to hadoop-common/src/test/java/org/apache/hadoop/ipc/protobuf/
cd hadoop-2.6.0-src/hadoop-maven-plugins/
mvn install
cd hadoop-2.6.0-src/
mvn eclipse:eclipse -DskipTests
in the build path of the "hadoop-streaming" project, re-create the source link to "hadoop-2.6.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf" and remove the original one.

#import the source code of tachyon to eclipse
1.tachyon-0.6.4
2.mvn eclipse:eclipse -DskipTests
3.import and fix problems.

#use maven to compile
java 1.7
mvn package -Pdist -DskipTests -Dtar
java 1.8
mvn package -Pdist -DskipTests -Dtar -Dadditionalparam=-Xdoclint:none
copy the *.jar files from target/ in the source tree to the share/ directory used by hadoop.

#what we have done now
1.put the data submitted by the client to HDFS on a specific datanode.
related class and method:
NameNodeRpcServer:addBlock()------return type:LocatedBlock
FSNamesystem:getAdditionalBlock()------return type:LocatedBlock
BlockManager:chooseTarget4NewBlock()------return type:DatanodeStorageInfo[]
BlockPlacementPolicyDefault:chooseTarget()------return type:DatanodeStorageInfo[]
Host2NodeMap:getDatanodeByHost()------return type:DatanodeDescriptor
Host2NodeMap:getDatanodeByXferAddr()------return type:DatanodeDescriptor
DatanodeDescriptor:getStorageInfos()------return type:DatanodeStorageInfo[]
2.use SSD, MEMORY as a datanode.
3.some tachyon experiments.
4.put the data submitted by the client to HDFS on several datanodes according to a proportion.
1>make a proportion
static int count[] = { 8, 1, 1 };
2>modify the chooseTarget method
public DatanodeStorageInfo[] inmemTarget(String srcPath, int numOfReplicas,
		Node writer, List<DatanodeStorageInfo> chosenNodes,
		boolean returnChosenNodes, Set<Node> excludedNodes, long blocksize,
		final BlockStoragePolicy storagePolicy) {

	if (count[0] != 0) {
		count[0]--; // use up one block of the first share
		// get datanode by ip address (address left empty here)
		DatanodeDescriptor Test1 = this.host2datanodeMap.getDatanodeByHost("");
		DatanodeStorageInfo[] testTarget1 = Test1.getStorageInfos();
		DatanodeStorageInfo[] testTarget = new DatanodeStorageInfo[1];
		testTarget[0] = testTarget1[0];
		return testTarget;
	} else if (count[1] != 0) {
		count[1]--; // use up one block of the second share
		DatanodeDescriptor Test1 = this.host2datanodeMap.getDatanodeByHost("");
		DatanodeStorageInfo[] testTarget1 = Test1.getStorageInfos();
		DatanodeStorageInfo[] testTarget = new DatanodeStorageInfo[1];
		testTarget[0] = testTarget1[0];
		return testTarget;
	} else if (count[2] != 0) {
		count[2]--; // use up one block of the third share
		DatanodeDescriptor Test1 = this.host2datanodeMap.getDatanodeByHost("");
		DatanodeStorageInfo[] testTarget1 = Test1.getStorageInfos();
		DatanodeStorageInfo[] testTarget = new DatanodeStorageInfo[1];
		testTarget[0] = testTarget1[0];
		return testTarget;
	} else {
		// all shares are used up: start a new round
		count[0] = 8;
		count[1] = 1;
		count[2] = 1;
		// get datanode by ip address
		DatanodeDescriptor Test1 = this.host2datanodeMap.getDatanodeByHost("");
		// get datanode by (ip, port)
		// this.host2datanodeMap.getDatanodeByXferAddr("",59010);
		DatanodeStorageInfo[] testTarget1 = Test1.getStorageInfos();
		DatanodeStorageInfo[] testTarget = new DatanodeStorageInfo[1];
		testTarget[0] = testTarget1[0];
		return testTarget;
	}
}
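The proportional idea can be tried outside of Hadoop. The following is a minimal sketch, assuming the targets are identified only by their index in the proportion array; the class name ProportionalChooser and its method next() are hypothetical and are not part of the Hadoop source. It hands out indices so that, over one full round, each target receives as many blocks as its share says.

```java
import java.util.Arrays;

/** Hypothetical helper: picks target indices in proportion to fixed shares. */
public class ProportionalChooser {
    private final int[] shares;    // e.g. {8, 1, 1}
    private final int[] remaining; // picks left for each target in the current round

    public ProportionalChooser(int[] shares) {
        for (int s : shares) {
            if (s < 0) {
                throw new IllegalArgumentException("shares must not be negative");
            }
        }
        this.shares = shares.clone();
        this.remaining = shares.clone();
    }

    /** Returns the index of the target that should receive the next block. */
    public synchronized int next() {
        for (int pass = 0; pass < 2; pass++) {
            for (int i = 0; i < remaining.length; i++) {
                if (remaining[i] > 0) {
                    remaining[i]--; // use up one pick of this target's share
                    return i;
                }
            }
            // round finished: refill the shares and scan once more
            System.arraycopy(shares, 0, remaining, 0, shares.length);
        }
        throw new IllegalStateException("all shares are zero: " + Arrays.toString(shares));
    }

    public static void main(String[] args) {
        ProportionalChooser chooser = new ProportionalChooser(new int[] {8, 1, 1});
        int[] hits = new int[3];
        for (int i = 0; i < 20; i++) {
            hits[chooser.next()]++;
        }
        System.out.println(Arrays.toString(hits)); // prints [16, 2, 2]
    }
}
```

In this sketch a round always fills the first target before moving to the next one, so blocks arrive in bursts rather than interleaved; whether that matters depends on the experiment.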

5.modify relax_locality to false in yarn_protos.proto to implement data locality.
6.set rack parameters "" in by using "".
7.two corresponding relations: containers and maptasks, containers and datanodes.
related class and method:
RMContainerAllocator:addMap()------return type:void
RMContainerAllocator:assignMapsWithLocality()------return type:void
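The two relations (map task to host, container to host) can be modeled in a few lines. The following is a simplified sketch under stated assumptions, not the RMContainerAllocator code: tasks and hosts are plain strings, and the class name LocalityAssigner and the method assign() are hypothetical. It mirrors strict locality (relax_locality set to false): a container is only given a map task whose input data is on the container's own host.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

/** Hypothetical model of host-local map task assignment. */
public class LocalityAssigner {
    // host name -> map tasks whose input split has a replica on that host
    private final Map<String, Deque<String>> tasksByHost = new HashMap<>();
    // map tasks that are still waiting for a container
    private final Set<String> pending = new LinkedHashSet<>();

    /** Registers a map task together with the hosts that store its input. */
    public void addMap(String taskId, String... hosts) {
        pending.add(taskId);
        for (String host : hosts) {
            tasksByHost.computeIfAbsent(host, h -> new ArrayDeque<>()).add(taskId);
        }
    }

    /**
     * Picks a pending map task for a container on the given host.
     * Returns null when no pending task has its data on that host
     * (strict locality: there is no rack-local or off-switch fallback).
     */
    public String assign(String containerHost) {
        Deque<String> local = tasksByHost.get(containerHost);
        while (local != null && !local.isEmpty()) {
            String taskId = local.poll();
            if (pending.remove(taskId)) {
                return taskId;
            }
            // otherwise the task was already assigned through another host
        }
        return null;
    }

    public static void main(String[] args) {
        LocalityAssigner assigner = new LocalityAssigner();
        assigner.addMap("m1", "node1", "node2");
        assigner.addMap("m2", "node2");
        System.out.println(assigner.assign("node2")); // m1
        System.out.println(assigner.assign("node1")); // null, m1 is already assigned
        System.out.println(assigner.assign("node2")); // m2
    }
}
```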

#some commands in hadoop
hadoop namenode -format
hadoop fs -put file(s) file(d)
hadoop fs -rm [-r] file
hadoop fs -rm hdfs://node1(which namenode is located on):9000/*
hadoop dfsadmin -safemode leave
hadoop dfsadmin -report

#compile a mapreduce program
bin/hadoop com.sun.tools.javac.Main *.java
jar cf classname.jar *.class

#what we are going to do
1.the impact of vcore and vmemory.
2.some benchmarks to show the performance difference between memory and disk.

3.put all containers in the same node.