Monday, September 9, 2013

hadoop on cloudera quickstart vm test example 01 wordcount

It's not easy to install hadoop and related items like hdfs hive and so on of your own, and it is more difficult to config them after installation.

Thanks to cloudera, we can test hadoop with its integrated tool kit (Cloudera QuickStart VM). it provides vmware, kvm and virtualbox edition to download. Everything is configured and you can test without any difficulty.

in this video, I show a example hot to do wordcount in hadoop. The youtube link is:

Steps not included in the video:
1: download vmware player or virtualbox and install;
2: download Cloudera QuickStart VM from cloudera(the link may change all the time, so you can google keyword "Cloudera QuickStart VM" to download).

The example of the wordcount test is:

1: install wget on centos server
sudo yum -y install wget
2: create test dir in /home/cloudera
mkdir /home/cloudera/test
cd /home/cloudera/test
3: create test txt file
echo "what can I do with hadoop on hadoop server or hive server" > test1.txt
4: put the txt file to hdfs
hdfs dfs -mkdir /user/cloudera/input
hdfs dfs -put /home/cloudera/test/test1.txt /user/cloudera/input/
5: go to  /usr/lib/hadoop-mapreduce/
cd /usr/lib/hadoop-mapreduce/
6: run the job
hadoop jar hadoop-mapreduce-examples.jar wordcount /user/cloudera/input/test1.txt /user/cloudera/output
7: check what are there in the output
hdfs dfs -ls /user/cloudera/output/
8: reat the output file
hdfs dfs -cat /user/cloudera/output/part-r-00000


  1. This is one of the most incredible blogs on hadoop. Ive read in a very long time. The amount of information in here is stunning, like you practically wrote the book on the subject. Your blog is great for anyone who wants to understand this subject more. Great stuff; please keep it up!
    Hadoop Training in hyderabad

  2. Actually, you have explained the technology to the fullest. Thanks for sharing the information you have got. It helped me a lot. I experimented your thoughts in my training program.

    Hadoop Training Chennai
    Hadoop Training in Chennai
    Big Data Training in Chennai

  3. Cloud is one of the tremendous technology that any company in this world would rely on(cloud computing training chennai). Using this technology many tough tasks can be accomplished easily in no time. Your content are also explaining the same(Cloud computing training centers in chennai). Thanks for sharing this in here. You are running a great blog, keep up this good work.

  4. Well Said. The content provided is true up to my knowledge. This made me to understand the concepts very clear. Thanks for sharing this wonderful information in here. Keep blogging article like this. I have bookmarked this page for future reference as well.

    Hadoop Training Chennai | Big Data Training | JAVA Course in Chennai

  5. Thank you a lot for providing individuals with a very spectacular possibility to read critical reviews from this site.

    Best Java Training Institute Chennai