Analysis of YouTube data using the Hadoop MapReduce framework in Java.
It analyses YouTube data and reports the most popular genres on YouTube, ranked by views and by uploads.
GBvideos.csv (Dataset)
YouTube Data Analysis (Implementation of a MapReduce model that finds the most popular genre on YouTube based on uploads)
Top Viewed Categories (Implementation of a MapReduce model that finds the most popular genre on YouTube based on views)
Top Categories Output (Output files)
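The two jobs listed above differ only in what they aggregate per category: a count of videos (uploads) or a sum of view counts. The following is a minimal plain-Java sketch of that map-then-reduce logic, written without Hadoop dependencies so it can be run on its own. It is not the repository's actual mapper or reducer. The class name, the column positions (category at index 4, views at index 7), and the naive comma split are all assumptions; check them against the header of GBvideos.csv, and note that titles containing commas would need a proper CSV parser.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class CategoryAggregationSketch {
    // Assumed column positions in GBvideos.csv; adjust to the real header.
    static final int CATEGORY_COL = 4;
    static final int VIEWS_COL = 7;

    /**
     * Aggregates one value per category.
     * sumViews = false: counts rows per category (uploads).
     * sumViews = true:  sums the views column per category.
     */
    static Map<String, Long> aggregate(List<String> lines, boolean sumViews) {
        Map<String, Long> totals = new TreeMap<>();
        for (String line : lines) {
            // "Map" step: extract (category, value) from one record.
            String[] cols = line.split(",");
            if (cols.length <= VIEWS_COL) {
                continue; // too few columns, skip the record
            }
            long views;
            try {
                views = Long.parseLong(cols[VIEWS_COL].trim());
            } catch (NumberFormatException e) {
                continue; // header line or malformed views field
            }
            String category = cols[CATEGORY_COL].trim();
            long value = sumViews ? views : 1L;
            // "Reduce" step: combine all values that share a category key.
            totals.merge(category, value, Long::sum);
        }
        return totals;
    }

    public static void main(String[] args) {
        // Made-up sample rows for illustration only, not taken from the dataset.
        List<String> rows = List.of(
            "video_id,trending_date,title,channel_title,category_id,publish_time,tags,views,likes",
            "id1,17.14.11,Title A,Chan A,10,2017-11-10,tag,100,5",
            "id2,17.14.11,Title B,Chan B,10,2017-11-11,tag,250,7",
            "id3,17.14.11,Title C,Chan C,24,2017-11-12,tag,900,9");
        System.out.println("Uploads per category: " + aggregate(rows, false));
        System.out.println("Views per category:   " + aggregate(rows, true));
    }
}
```

In a real Hadoop job the map step would emit each pair through the mapper's context and the framework would group the values by key before the reducer sums them; the sketch collapses both phases into one loop.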
The output is obtained by building a .jar file and running the following commands in a Linux terminal.
Make an input directory in the Hadoop filesystem:
hdfs dfs -mkdir /YouTubeInput
Copy the input data from the Linux filesystem to HDFS:
hdfs dfs -put /Downloads/YouTubeDataAnalysis/GBvideos.csv /YouTubeInput
Run the jar file and save the results in an output directory in HDFS:
hadoop jar /home/hadoop/TopViewedCategories.jar TopCategoryDriver /YouTubeInput /YouTubeOutput
To view results:
hdfs dfs -cat /YouTubeOutput/*
Copy the results from HDFS to the Linux filesystem:
hdfs dfs -get /YouTubeOutput/* /Downloads/YouTubeAnalysis/TopCategoryOutput
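Once the output files are on the local filesystem, the categories can be ranked to find the most popular ones. The sketch below assumes each output line has the form key, tab, value, which is what Hadoop's default text output produces; whether this project's output uses that layout or is already sorted is not confirmed here. The class and method names are hypothetical.

```java
import java.util.AbstractMap;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class TopCategoriesSketch {
    /** Parses "category<TAB>value" lines and returns the n largest by value. */
    static List<Map.Entry<String, Long>> topN(List<String> outputLines, int n) {
        List<Map.Entry<String, Long>> entries = new ArrayList<>();
        for (String line : outputLines) {
            String[] parts = line.split("\t");
            if (parts.length != 2) {
                continue; // not a key/value line, skip it
            }
            try {
                Long value = Long.valueOf(parts[1].trim());
                entries.add(new AbstractMap.SimpleEntry<>(parts[0].trim(), value));
            } catch (NumberFormatException e) {
                // value is not a number, skip the line
            }
        }
        // Sort by value, largest first.
        entries.sort(Map.Entry.<String, Long>comparingByValue().reversed());
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        // Made-up lines for illustration only, not real job output.
        List<String> lines = List.of("10\t350", "24\t900", "1\t120");
        for (Map.Entry<String, Long> e : topN(lines, 2)) {
            System.out.println(e.getKey() + "\t" + e.getValue());
        }
    }
}
```

To rank real results, the lines could be read with `java.nio.file.Files.readAllLines` from one of the files copied by the `hdfs dfs -get` command above.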