
#HDFS_CQU

...

#import the source code of hadoop to eclipse
1.install maven
use "mvn -v" to check
2.install protobuf
use "protoc --version" to check
3.prepare the source tree before generating the eclipse projects
add TestAvroSerialization.java to hadoop-common/src/test/java/org/apache/hadoop/io/serializer/avro/
add TestProtos.java, TestRpcServiceProtos.java to hadoop-common/src/test/java/org/apache/hadoop/ipc/protobuf/
cd hadoop-2.6.0-src/hadoop-maven-plugins/
mvn install
cd hadoop-2.6.0-src/
mvn eclipse:eclipse -DskipTests
4.import the projects
in the "hadoop-streaming" project's build path, rebuild the source link "hadoop-2.6.0-src/hadoop-yarn-project/hadoop-yarn/hadoop-yarn-server/hadoop-yarn-server-resourcemanager/conf" and remove the original one.

#import the source code of tachyon to eclipse
1.cd tachyon-0.6.4
2.mvn eclipse:eclipse -DskipTests
3.import the projects and fix any remaining build-path problems.

#use maven to compile
java 1.7:
mvn package -Pdist -DskipTests -Dtar
java 1.8 (javadoc lint must be disabled):
mvn package -Pdist -DskipTests -Dtar -Dadditionalparam=-Xdoclint:none
copy the *.jar files from target/ in the source tree into the share/ directory used by the hadoop deployment.

#what we have done now
1.put the data submitted by the client to HDFS on a specific datanode.
related classes and methods:
NameNodeRpcServer:addBlock()------return type:LocatedBlock
FSNamesystem:getAdditionalBlock()------return type:LocatedBlock
BlockManager:chooseTarget4NewBlock()------return type:DatanodeStorageInfo[]
BlockPlacementPolicyDefault:chooseTarget()------return type:DatanodeStorageInfo[]
Host2NodesMap:getDatanodeByHost()------return type:DatanodeDescriptor
Host2NodesMap:getDatanodeByXferAddr()------return type:DatanodeDescriptor
DatanodeDescriptor:getStorageInfos()------return type:DatanodeStorageInfo[]
2.use SSD and MEMORY as datanode storage.
3.do some tachyon experiments.
4.put the data submitted by the client to HDFS across several datanodes according to a fixed proportion.
1>define the proportion:

    static int count[] = { 8, 1, 1 };

2>modify the chooseTarget method:
    public DatanodeStorageInfo[] inmemTarget(String srcPath, int numOfReplicas,
            Node writer, List<DatanodeStorageInfo> chosenNodes,
            boolean returnChosenNodes, Set<Node> excludedNodes, long blocksize,
            final BlockStoragePolicy storagePolicy) {

        if (count[0] != 0) {
            // get datanode by ip address
            DatanodeDescriptor node = this.host2datanodeMap
                    .getDatanodeByHost("222.198.132.207");
            DatanodeStorageInfo[] storages = node.getStorageInfos();
            count[0]--;
            return new DatanodeStorageInfo[] { storages[0] };
        } else if (count[1] != 0) {
            DatanodeDescriptor node = this.host2datanodeMap
                    .getDatanodeByHost("222.198.132.210");
            DatanodeStorageInfo[] storages = node.getStorageInfos();
            count[1]--;
            return new DatanodeStorageInfo[] { storages[0] };
        } else if (count[2] != 0) {
            DatanodeDescriptor node = this.host2datanodeMap
                    .getDatanodeByHost("222.198.132.208");
            DatanodeStorageInfo[] storages = node.getStorageInfos();
            count[2]--;
            return new DatanodeStorageInfo[] { storages[0] };
        } else {
            // all counters used up: reset the 8:1:1 proportion and start over
            count[0] = 8;
            count[1] = 1;
            count[2] = 1;
            DatanodeDescriptor node = this.host2datanodeMap
                    .getDatanodeByHost("222.198.132.207");
            // to pick a datanode by (ip, port) instead:
            // this.host2datanodeMap.getDatanodeByXferAddr("172.31.8.147", 59010);
            DatanodeStorageInfo[] storages = node.getStorageInfos();
            count[0]--;
            return new DatanodeStorageInfo[] { storages[0] };
        }
    }
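The counters implement a weighted round-robin: of every ten blocks, eight go to .207 and one each to .210 and .208, after which the counters reset. Note that this sketch returns a single DatanodeStorageInfo regardless of numOfReplicas, always uses the first storage reported by each node, and relies on static mutable counters, so it is only suitable for single-writer experiments.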

5.modify relax_locality to false in yarn_protos.proto to enforce strict data locality (see the sketch after this list).
6.set the rack topology parameter "net.topology.script.file.name" in core-site.xml, pointing it at "RackAware.py" (see the check after this list).
7.two corresponding relations: containers to map tasks, and containers to datanodes.
related classes and methods:
RMContainerAllocator:addMap()------return type:void
RMContainerAllocator:assignMapsWithLocality()------return type:void
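A minimal sketch of the per-request equivalent of the relax_locality change in item 5, assuming an application master that uses AMRMClient on hadoop 2.6 (the class name, host, and resource sizes are our placeholders): constructing a ContainerRequest with relaxLocality=false tells the scheduler it may not fall back to rack-local or off-switch placement.

    import org.apache.hadoop.yarn.api.records.Priority;
    import org.apache.hadoop.yarn.api.records.Resource;
    import org.apache.hadoop.yarn.client.api.AMRMClient.ContainerRequest;

    public class StrictLocalityRequest {
        // Build a request that may only be satisfied on the given host.
        public static ContainerRequest onNode(String host) {
            Resource capability = Resource.newInstance(1024, 1); // 1 GB, 1 vcore (placeholders)
            Priority priority = Priority.newInstance(0);
            // relaxLocality=false: no fallback to rack-local or off-switch containers
            return new ContainerRequest(capability, new String[] { host },
                    null /* racks */, priority, false /* relaxLocality */);
        }
    }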
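For item 6, a small standalone check (our own test idea, not part of hadoop; the script path is a placeholder) that the configured topology script resolves a datanode IP to the expected rack:

    import java.util.Arrays;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.net.ScriptBasedMapping;

    public class RackCheck {
        public static void main(String[] args) {
            Configuration conf = new Configuration(); // picks up core-site.xml from the classpath
            // same effect as setting this property in core-site.xml; the path is a placeholder
            conf.set("net.topology.script.file.name", "/opt/hadoop/RackAware.py");
            ScriptBasedMapping mapping = new ScriptBasedMapping(conf);
            // prints the rack the script resolves for this datanode, e.g. [/rack1]
            System.out.println(mapping.resolve(Arrays.asList("222.198.132.207")));
        }
    }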

#some commands in hadoop
hadoop namenode -format
hadoop fs -put <local-src> <hdfs-dst>
hadoop fs -rm [-r] <file>
hadoop fs -rm hdfs://node1:9000/* (node1 is the host the namenode runs on)
hadoop dfsadmin -safemode leave
hadoop dfsadmin -report
(on hadoop 2.x these still work but are deprecated; "hdfs namenode -format" and "hdfs dfsadmin" are the current forms.)

#compile a mapreduce program
bin/hadoop com.sun.tools.javac.Main *.java
jar cf classname.jar *.class

#what we are going to do
1.study the impact of the vcore and vmemory settings.
2.run benchmarks to show the performance difference between memory and disk.
3.put all containers on the same node.