Assignment Detail: ICT303 Big Data - Crown Institute of Higher Education
Assessment - Using MapReduce for processing big data
LO 1: Design an appropriate repository structure for storing big data.
LO 2: Design big data solutions using MapReduce techniques.
Instructions
The following file is from the MovieLens dataset, which contains user ratings for movies:
You can find more about this attached dataset in file
u.data is the full u data set, with 100000 ratings by 943 users on 1682 items. Each user has rated at least 20 movies. Users and items are numbered consecutively from 1. The data is randomly ordered. This is a tab-separated list of user id | item id | rating | timestamp. The timestamps are Unix seconds since 1/1/1970 UTC. For example, the following line of the file
95 546 2 879196566
is interpreted as follows: user 95 rated movie 546 with a rating of 2/5 (ratings are in the range 1-5) at time 879196566 (Monday, November 10, 1997 9:16:06 PM GMT).
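As a minimal sketch of this interpretation, the snippet below splits the example line on tabs and converts the timestamp to a UTC date. The names `Rating` and `parse_rating` are hypothetical helpers introduced only for illustration; they are not part of the dataset or of any MapReduce library.

```python
from datetime import datetime, timezone
from typing import NamedTuple


class Rating(NamedTuple):
    """One record of u.data (hypothetical helper type for this sketch)."""
    user_id: int
    movie_id: int
    rating: int
    timestamp: int


def parse_rating(line: str) -> Rating:
    """Split one tab-separated u.data line into its four integer fields."""
    user_id, movie_id, rating, timestamp = line.rstrip("\n").split("\t")
    return Rating(int(user_id), int(movie_id), int(rating), int(timestamp))


# The example line from the brief, written with explicit tab characters.
record = parse_rating("95\t546\t2\t879196566\n")

# Unix seconds are counted from 1970-01-01 UTC, so convert with an explicit UTC zone.
when = datetime.fromtimestamp(record.timestamp, tz=timezone.utc)

print(record.movie_id, record.rating, when.isoformat())
# prints: 546 2 1997-11-10T21:16:06+00:00
```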
Your task is to use MapReduce programming to find the following information for each movie: the average rating and the number of users who rated the movie. Here is an example of the output:
Movie ID | Average Rating | Number of Users Rated
340 | 3.78 | 298
499 | 4.02 | 532
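One possible shape for the map and reduce logic is sketched below as a plain Python simulation that runs in memory, not on a Hadoop cluster. The function names `map_rating`, `reduce_ratings` and `run_job` are hypothetical, and the sketch assumes that each user rates a given movie at most once, so the number of ratings equals the number of users who rated it. The sample lines are illustrative only, so the printed values do not match the example output above.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_rating(line: str) -> Iterator[Tuple[int, int]]:
    """Map step: emit (movie_id, rating); the user id and timestamp are dropped."""
    fields = line.split("\t")
    if len(fields) == 4:  # skip blank or malformed lines
        yield int(fields[1]), int(fields[2])


def reduce_ratings(movie_id: int, ratings: List[int]) -> Tuple[int, float, int]:
    """Reduce step: average rating and number of ratings for one movie."""
    return movie_id, sum(ratings) / len(ratings), len(ratings)


def run_job(lines: Iterable[str]) -> List[Tuple[int, float, int]]:
    """Simulate map -> shuffle (group by movie id) -> reduce in memory."""
    grouped: Dict[int, List[int]] = defaultdict(list)
    for line in lines:
        for movie_id, rating in map_rating(line):
            grouped[movie_id].append(rating)
    return [reduce_ratings(m, grouped[m]) for m in sorted(grouped)]


# Illustrative input only: the first line is the example from the brief,
# the other three are made up for this sketch.
sample = [
    "95\t546\t2\t879196566",
    "12\t546\t4\t879196600",
    "7\t340\t5\t879196700",
    "95\t340\t3\t879196800",
]

results = run_job(sample)
for movie_id, average, count in results:
    print(f"{movie_id}\t{average:.2f}\t{count}")
```

In a real job the framework performs the grouping between the two steps; here a dictionary stands in for that shuffle so the logic can be checked locally before it is ported.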
You can choose the output format. However, the required information must be included in the output.
Hint: You can change the WordCount program so that it ignores all tokens in a line except the third one (the rating value is in the third column of the file).
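One design point to watch when adapting WordCount: the usual Hadoop WordCount example can reuse its summing reducer as a combiner because sums can be merged in any order, but averages cannot. The sketch below, with hypothetical names `combine` and `reduce_partials` and made-up ratings, shows one common workaround under that assumption: carry (sum, count) pairs through the combiner and divide only in the final reduce.

```python
from typing import Iterable, Tuple

Partial = Tuple[int, int]  # (sum of ratings, number of ratings)


def combine(ratings: Iterable[int]) -> Partial:
    """Combiner: pre-aggregate one mapper's ratings for a single movie."""
    values = list(ratings)
    return sum(values), len(values)


def reduce_partials(partials: Iterable[Partial]) -> Tuple[float, int]:
    """Reducer: merge partial (sum, count) pairs into (average, count)."""
    total = 0
    count = 0
    for partial_sum, partial_count in partials:
        total += partial_sum
        count += partial_count
    return total / count, count


# Made-up ratings for one movie, split across two hypothetical mappers.
mapper_a = [5, 5, 5, 5]
mapper_b = [1]

correct = reduce_partials([combine(mapper_a), combine(mapper_b)])

# Averaging the per-mapper averages instead weights both mappers equally,
# even though one of them saw four ratings and the other only one.
naive = (sum(mapper_a) / len(mapper_a) + sum(mapper_b) / len(mapper_b)) / 2

print(correct)  # (4.2, 5)
print(naive)    # 3.0
```

If no combiner is used, the simpler approach of emitting the raw rating and averaging in the reducer is sufficient.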
The program must also print the names of the group members on the screen.
Deliverable
You need to submit an MS Word or a PDF file which includes the following items:
- The source code for the map and reduce functions (copied and pasted into the MS Word or PDF file; no separate file is needed).
- Enough screenshots of the steps taken to get the program running.
- Screenshots of the output generated by the program. The names of the group members must also be part of the printed information. Annotate all screenshots with brief descriptions (one or two lines is enough).
- A discussion section on your experience with MapReduce programming. What other tools and techniques are available to solve the given problem? Compare MapReduce programming with the tools and techniques you mention. You can consider several factors, such as simplicity, scalability, and reliability.
Attachment: MapReduce Programming.rar