Project author: lukaselmer

Project description:
ETH Data Mining Class
Language: Python
Repository: git://github.com/lukaselmer/ethz-data-mining.git
Created: 2014-02-28T19:51:14Z
Project community: https://github.com/lukaselmer/ethz-data-mining

License: MIT License


Data Mining

Important URLs

Usage

    # Start and stop Hadoop
    /usr/local/Cellar/hadoop121/1.2.1/bin/start-all.sh
    /usr/local/Cellar/hadoop121/1.2.1/bin/stop-all.sh

    # Hadoop dir
    /usr/local/Cellar/hadoop121/1.2.1

    # Copy data to HDFS
    hadoop dfs -copyFromLocal /Users/lukas/data-mining/example/input /user/hduser/example

    # Run the job
    # Mapper and reducer paths are local, input and output paths are HDFS
    hadoop jar ~/.bin/hadoop-streaming-1.2.1.jar \
        -mapper /Users/lukas/data-mining/example/mapper.py \
        -reducer /Users/lukas/data-mining/example/reducer.py \
        -input "/user/hduser/example/*" \
        -output /user/hduser/example-output

    # List and output the results
    hadoop dfs -ls /user/hduser/example-output
    hadoop dfs -cat /user/hduser/example-output/part-00000

    # Copy data to local dir
    hadoop dfs -copyToLocal /user/hduser/example-output /Users/lukas/data-mining/example/output

    # Delete dir
    hadoop dfs -rmr /user/hduser/example-output
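
The commands above pass a mapper.py and a reducer.py to Hadoop Streaming, but the scripts themselves are not shown here. As a rough illustration only, a streaming mapper and reducer for a word count might look like the sketch below. The word-count task, the function names `map_lines` and `reduce_lines`, and the `map`/`reduce` command-line argument are all assumptions made for this example and are not taken from the repository. Both roles sit in one file for compactness; in the layout used above they would be split into two scripts.

```python
#!/usr/bin/env python
"""Word-count sketch for Hadoop Streaming (hypothetical, not the repository's scripts)."""
import sys
from itertools import groupby


def map_lines(lines):
    """Mapper: emit one 'word<TAB>1' record per whitespace-separated token."""
    for line in lines:
        for word in line.split():
            yield "%s\t1" % word


def reduce_lines(lines):
    """Reducer: sum the counts per key. Input must already be sorted by key,
    which Hadoop Streaming does between the map and reduce phases."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines)
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield "%s\t%d" % (word, sum(int(count) for _, count in group))


# Assumed convention for this sketch: the role is given as the first argument.
if __name__ == "__main__" and sys.argv[1:2] in (["map"], ["reduce"]):
    step = map_lines if sys.argv[1] == "map" else reduce_lines
    for record in step(sys.stdin):
        print(record)
```

Because both functions only read lines and yield lines, the job can be simulated without Hadoop by chaining them with a sort in between: `list(reduce_lines(sorted(map_lines(["a b a"]))))` mirrors the map, shuffle, and reduce phases on a single input line.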