Monday, September 9, 2013

hadoop on cloudera quickstart vm test example 01 wordcount

It's not easy to install hadoop and related items like hdfs hive and so on of your own, and it is more difficult to config them after installation.

Thanks to cloudera, we can test hadoop with its integrated tool kit (Cloudera QuickStart VM). it provides vmware, kvm and virtualbox edition to download. Everything is configured and you can test without any difficulty.

in this video, I show a example hot to do wordcount in hadoop. The youtube link is:

Steps not included in the video:
1: download vmware player or virtualbox and install;
2: download Cloudera QuickStart VM from cloudera(the link may change all the time, so you can google keyword "Cloudera QuickStart VM" to download).

The example of the wordcount test is:

1: install wget on centos server
sudo yum -y install wget
2: create test dir in /home/cloudera
mkdir /home/cloudera/test
cd /home/cloudera/test
3: create test txt file
echo "what can I do with hadoop on hadoop server or hive server" > test1.txt
4: put the txt file to hdfs
hdfs dfs -mkdir /user/cloudera/input
hdfs dfs -put /home/cloudera/test/test1.txt /user/cloudera/input/
5: go to  /usr/lib/hadoop-mapreduce/
cd /usr/lib/hadoop-mapreduce/
6: run the job
hadoop jar hadoop-mapreduce-examples.jar wordcount /user/cloudera/input/test1.txt /user/cloudera/output
7: check what are there in the output
hdfs dfs -ls /user/cloudera/output/
8: reat the output file
hdfs dfs -cat /user/cloudera/output/part-r-00000