Udacity-Intro to Hadoop and MapReduce-Part 1
Class website: https://classroom.udacity.com/courses/ud617. Quoted from the homepage:
The Apache™ Hadoop® project develops open-source software for reliable, scalable, distributed computing. Learn the fundamental principles behind it, and how you can use its power to make sense of your Big Data.
Experiment results:
| Quiz | Results |
|------|---------|
| Sales per Category | Toys: 57463477.11, Consumer Electronics: 57452374.13 |
| Highest Sale | Reno: 499.99, Toledo: 499.98, Chandler: 499.98 |
| Total Sales | Number of Sales: 4138476, Total Value of Sales: 1034457953.26 |
Experiment results:
| Quiz | Results |
|------|---------|
| Hits to Page | /assets/js/the-associates.js: 2456 |
| Hits from IP | 10.99.99.186: 6 |
| Most Popular | File path: /assets/css/combined.css, Number of occurrences: 117352|
You can read instructions on how to download and run the virtual machines here https://docs.google.com/document/d/1v0zGBZ6EHap-Smsr3x3sGGpDW-54m82kDpPKC2M6uiY/pub.
Information on how to transfer files back and forth to the virtual machine can be found here https://docs.google.com/document/d/1MZ_rNxJhR4HCU1qJ2-w7xlk2MTHVqa9lnl_uj-zRkzk/pub.
For step-by-step instructions on how to load data into HDFS, please re-watch HDFS Demo https://classroom.udacity.com/courses/ud617/lessons/308873795/concepts/3095085570923. For a reminder of how to run a MapReduce job, please re-watch Simplifying Things https://classroom.udacity.com/courses/ud617/lessons/308873795/concepts/3093825960923.
hadoop fs -put access_log myinput
Then use a pipeline to test the mapper and reducer locally on a small sample before running the full job:
head -100 ../data/access_log > testfile
cat testfile | ./mapper.py | sort | ./reducer2.py
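The local pipeline above (map, sort, reduce) can be mimicked in a single Python process, which may help when debugging the logic for the "Hits to Page" quiz. This is only a minimal sketch, not the mapper.py and reducer.py used for the results in this post: it assumes access_log follows the Common Log Format, the names `map_line`, `reduce_sorted`, and `run_local` are hypothetical, and the sample lines are made up for illustration.

```python
import re
from itertools import groupby

# Assumption: the request field looks like "GET /path HTTP/1.1" (Common Log Format).
REQUEST = re.compile(r'"(?:\S+) (\S+)[^"]*"')

def map_line(line):
    """Mapper step: return (path, 1) for a parsable log line, else None."""
    match = REQUEST.search(line)
    return (match.group(1), 1) if match else None

def reduce_sorted(pairs):
    """Reducer step: sum the counts over runs of equal keys in sorted input."""
    for path, group in groupby(pairs, key=lambda kv: kv[0]):
        yield path, sum(count for _, count in group)

def run_local(lines):
    """Mimic `cat file | mapper | sort | reducer` in one process."""
    mapped = [kv for kv in map(map_line, lines) if kv is not None]
    return dict(reduce_sorted(sorted(mapped)))

# Made-up sample lines, not taken from the real access_log.
sample = [
    '10.0.0.1 - - [01/Jan/2010:00:00:00 -0400] "GET /index.html HTTP/1.1" 200 100',
    '10.0.0.2 - - [01/Jan/2010:00:00:01 -0400] "GET /a.css HTTP/1.1" 200 50',
    '10.0.0.1 - - [01/Jan/2010:00:00:02 -0400] "GET /index.html HTTP/1.1" 200 100',
]
print(run_local(sample))  # {'/a.css': 1, '/index.html': 2}
```

Sorting between the two steps matters: the reducer only sums consecutive records with the same key, which is the same guarantee Hadoop gives a reducer after the shuffle phase.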
hs mapper.py reducer.py myinput myoutput
hadoop fs -cat myoutput/part-00000
Then download the output and search it for the answers to the quiz questions using the “grep” command:
hadoop fs -get myoutput/part-00000 mylocalfile.txt
grep "/assets/js/the-associates.js" mylocalfile.txt
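grep works well when the key is already known, as in the "Hits to Page" quiz. For a question like "Most Popular", where the key itself is the answer, the downloaded output can be scanned for the largest count instead. The following is a rough sketch under the assumption that each output line has the form key, tab, integer count (the default separator in Hadoop Streaming); `most_popular` is a hypothetical helper and the sample lines are invented.

```python
def most_popular(lines):
    """Return the (key, count) pair with the largest count from 'key<TAB>count' lines."""
    best = None
    for line in lines:
        key, sep, value = line.rstrip("\n").rpartition("\t")
        if not sep:
            continue  # skip lines that have no tab separator
        count = int(value)  # assumption: the value is always an integer count
        if best is None or count > best[1]:
            best = (key, count)
    return best

# Invented sample in the assumed output format, not real job output.
sample_output = ["/a.css\t3\n", "/index.html\t7\n", "/b.js\t2\n"]
print(most_popular(sample_output))  # ('/index.html', 7)
```

To apply it to the file fetched above, the same function can be given an open file handle, for example `with open("mylocalfile.txt") as f: print(most_popular(f))`.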