Project author: SarahAyaz

Project description:
Analysis of YouTube data using the Hadoop MapReduce framework in Java.
Language: Java
Repository: git://github.com/SarahAyaz/YouTube-Data-Analysis.git
Created: 2019-02-10T15:04:39Z
Project community: https://github.com/SarahAyaz/YouTube-Data-Analysis

License:


YouTube Data Analysis

It analyses YouTube data and reports the most popular genres on YouTube, ranked by views and by uploads.

Structure

  1. GBvideos.csv (Dataset)

  2. YouTube Data Analysis (MapReduce implementation that finds the most popular genre on YouTube based on uploads)

  3. Top Viewed Categories (MapReduce implementation that finds the most popular genre on YouTube based on views)

  4. Top Categories Output (Output files)
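
As a rough illustration of what the two jobs compute, the sketch below imitates the map and reduce steps in plain Java, without the Hadoop API. It is not the project's code: the class name, the `totalsByCategory` helper, and the three-column `videoId,categoryId,views` record layout are assumptions made for the example. The real GBvideos.csv has more columns and quoted fields that a plain `split(",")` would not handle.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class CategoryTotalsSketch {

    // Hypothetical simplified record layout: videoId,categoryId,views
    static final int CATEGORY_COLUMN = 1;
    static final int VIEWS_COLUMN = 2;

    /**
     * Imitates map + reduce in one pass.
     * Map step: emit (categoryId, 1) for uploads or (categoryId, views) for views.
     * Reduce step: sum the emitted values per categoryId.
     */
    static Map<String, Long> totalsByCategory(List<String> lines, boolean sumViews) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            String[] fields = line.split(",");
            if (fields.length <= VIEWS_COLUMN) {
                continue; // skip malformed records
            }
            long views;
            try {
                views = Long.parseLong(fields[VIEWS_COLUMN].trim());
            } catch (NumberFormatException e) {
                continue; // skip the header row and non-numeric records
            }
            String category = fields[CATEGORY_COLUMN].trim();
            long emitted = sumViews ? views : 1L;
            totals.merge(category, emitted, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample records, not taken from the dataset.
        List<String> sample = List.of(
                "video_id,category_id,views",
                "v1,10,100",
                "v2,10,250",
                "v3,24,900",
                "v4,1,50");
        System.out.println("Uploads per category: " + totalsByCategory(sample, false));
        System.out.println("Views per category:   " + totalsByCategory(sample, true));
    }
}
```

In a real Hadoop job the map and reduce steps run as separate classes across the cluster; merging them into one loop here only keeps the example short.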

Reading the Output File

The output is produced by building a .jar file and running it with the following commands in a Linux terminal.

Steps

  1. Make an input directory in the Hadoop filesystem:

    hdfs dfs -mkdir /YouTubeInput
  2. Copy the input data from the Linux filesystem to HDFS:

    hdfs dfs -put /Downloads/YouTubeDataAnalysis/GBvideos.csv /YouTubeInput
  3. Run the jar file and save the results in an output directory in HDFS:

    hadoop jar /home/hadoop/TopViewedCategories.jar TopCategoryDriver /YouTubeInput /YouTubeOutput
  4. View the results:

    hdfs dfs -cat /YouTubeOutput/*
  5. Copy the results from HDFS to the Linux filesystem:

    hdfs dfs -get /YouTubeOutput/* /Downloads/YouTubeAnalysis/TopCategoryOutput
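
Once the results are on the local filesystem, they can be ranked with a few lines of Java. The sketch below assumes the job writes Hadoop's default text output, one `key<TAB>value` pair per line, with the category as the key and the total as the value. Whether this project's jobs use that layout is an assumption, and the class name `TopCategoryReaderSketch` and the `rank` helper are hypothetical.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class TopCategoryReaderSketch {

    /** Parses "category<TAB>total" lines and sorts them by total, largest first. */
    static List<Map.Entry<String, Long>> rank(List<String> outputLines) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (String line : outputLines) {
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                continue; // skip lines that are not a key/value pair
            }
            try {
                entries.add(Map.entry(parts[0], Long.parseLong(parts[1].trim())));
            } catch (NumberFormatException e) {
                // skip lines whose value is not a whole number
            }
        }
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return entries;
    }

    public static void main(String[] args) throws IOException {
        // Pass the path of a downloaded output file as the first argument;
        // without one, made-up sample lines are used instead.
        List<String> lines = args.length > 0
                ? Files.readAllLines(Paths.get(args[0]))
                : List.of("1\t50", "10\t350", "24\t900");
        for (Map.Entry<String, Long> entry : rank(lines)) {
            System.out.println(entry.getKey() + " -> " + entry.getValue());
        }
    }
}
```

Sorting on the client side like this is only practical because the output has one line per category; a much larger result set would be better ranked inside the job itself.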