Assignment Detail: BUS5DWR Data Wrangling and R - La Trobe University
Overview
Assignment Requirements
Part 1
The given data files Movie.csv, Rating.csv and Continent.csv record information about IMDB movie ratings.
Write R code in an Rmd file to answer the following questions. Each question should be presented in one code chunk:
Load the dataset from the given files into three data frames called Movie, Rating, and Continent. Rename columns to remove spaces if any exist. (Hint: use str_replace_all to do this automatically for all columns.) Remove the column Writer from the Movie dataframe. Display the summary of each dataframe.
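The renaming hint can be sketched as follows. This is only an illustration on a toy data frame: the column `Release Year` is a hypothetical stand-in, since the brief names only Writer and Title, and the real file would be loaded with something like `read.csv("Movie.csv", check.names = FALSE)`.

```r
library(stringr)

# Toy stand-in for Movie.csv (assumed columns; only Writer and Title come from the brief).
# check.names = FALSE keeps the space in the column name, as a raw CSV header would.
Movie <- data.frame(
  Title = c("Movie One", "Movie Two"),
  `Release Year` = c(2001, 2005),
  Writer = c("Writer X", "Writer Y"),
  check.names = FALSE
)

# Remove every space from every column name in one step.
names(Movie) <- str_replace_all(names(Movie), " ", "")

# Drop the Writer column.
Movie$Writer <- NULL

# Summary of the cleaned data frame.
summary(Movie)
```

The same two renaming lines would apply unchanged to Rating and Continent.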
How many movies produced by 'Universal Pictures' feature the actor 'Arnold Schwarzenegger'?
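One possible approach is a filter on two conditions followed by a row count. The column names `Production` and `Actors` (a comma-separated string) are assumptions, as the brief does not list the columns of Movie.

```r
library(dplyr)
library(stringr)

# Toy data; Production and Actors are hypothetical column names.
Movie <- tibble(
  Title      = c("M1", "M2", "M3"),
  Production = c("Universal Pictures", "Universal Pictures", "Other Studio"),
  Actors     = c("Arnold Schwarzenegger, Someone Else",
                 "Someone Else",
                 "Arnold Schwarzenegger")
)

# Keep rows matching both the studio and the actor, then count them.
n_movies <- Movie %>%
  filter(Production == "Universal Pictures",
         str_detect(Actors, fixed("Arnold Schwarzenegger"))) %>%
  nrow()

n_movies
```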
Display the five most-reviewed movies that belong to both the Action and Drama genres. Display only the Title and the number of reviews.
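A minimal sketch of this kind of query, assuming a comma-separated `Genre` column and a numeric `Reviews` column (both names are hypothetical; the review count may live in a different data frame in the real files):

```r
library(dplyr)
library(stringr)

# Toy data with assumed column names.
Movie <- tibble(
  Title   = c("M1", "M2", "M3", "M4"),
  Genre   = c("Action, Drama", "Drama", "Action, Drama, Thriller", "Action"),
  Reviews = c(120, 900, 450, 300)
)

top_reviewed <- Movie %>%
  # A movie must list both genres to qualify.
  filter(str_detect(Genre, "Action"), str_detect(Genre, "Drama")) %>%
  # Take at most five rows, highest review count first.
  slice_max(Reviews, n = 5, with_ties = FALSE) %>%
  select(Title, Reviews)

top_reviewed
```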
Display movie rating information including Title, average rating and two new columns: (1) 'TotalVote', showing the total votes from both males and females, and (2) 'Popular', showing 'Male' for movies where MalesTotalVotes is greater than FemalesTotalVotes and 'Female' otherwise. (Hint: see the Workshop 9 exercise.) Show only the TEN movies with the highest average rating.
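The two derived columns could be built with `mutate` and `if_else`, roughly as below. `MalesTotalVotes` and `FemalesTotalVotes` come from the brief; `AverageRating`, and the presence of `Title` in the same data frame, are assumptions (a join with Movie may be needed in practice).

```r
library(dplyr)

# Toy data; AverageRating is a hypothetical column name.
Rating <- tibble(
  Title             = c("M1", "M2", "M3"),
  AverageRating     = c(7.5, 8.9, 6.1),
  MalesTotalVotes   = c(1000, 200, 50),
  FemalesTotalVotes = c(400, 900, 50)
)

rating_info <- Rating %>%
  mutate(
    TotalVote = MalesTotalVotes + FemalesTotalVotes,
    # 'Male' only when male votes strictly exceed female votes; ties fall to 'Female'.
    Popular = if_else(MalesTotalVotes > FemalesTotalVotes, "Male", "Female")
  ) %>%
  select(Title, AverageRating, TotalVote, Popular) %>%
  # Keep at most ten rows, highest average rating first.
  slice_max(AverageRating, n = 10, with_ties = FALSE)

rating_info
```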
Display the number of Comedy movies and their average rating for each continent.
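A grouped summary after a join is one way to approach this. The sketch assumes that Movie carries `Country`, `Genre` and `AverageRating` columns and that Continent maps `Country` to `Continent`; none of these names are confirmed by the brief.

```r
library(dplyr)
library(stringr)

# Toy data with assumed column names.
Movie <- tibble(
  Title         = c("M1", "M2", "M3", "M4"),
  Genre         = c("Comedy", "Comedy, Romance", "Drama", "Comedy"),
  Country       = c("USA", "France", "USA", "USA"),
  AverageRating = c(6, 7, 8, 8)
)
Continent <- tibble(
  Country   = c("USA", "France"),
  Continent = c("North America", "Europe")
)

comedy_by_continent <- Movie %>%
  filter(str_detect(Genre, "Comedy")) %>%
  # Attach the continent of each movie's country.
  inner_join(Continent, by = "Country") %>%
  group_by(Continent) %>%
  summarise(Movies = n(),
            MeanRating = mean(AverageRating),
            .groups = "drop")

comedy_by_continent
```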
Analyse the distribution of the average rating of all the movies after the year 2000. (Hint: draw a boxplot and a histogram, and write a short paragraph (fewer than 100 words) to describe your insights.)
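The two plots from the hint might be drawn with ggplot2 as sketched here. `Year` and `AverageRating` are assumed column names, and the bin width of 0.5 is an arbitrary choice for the toy data.

```r
library(dplyr)
library(ggplot2)

# Toy data with assumed column names.
Movie <- tibble(
  Year          = c(1995, 2003, 2010, 2015, 2018),
  AverageRating = c(8.1, 6.4, 7.2, 5.9, 7.8)
)

# Keep only movies released after 2000.
recent <- Movie %>% filter(Year > 2000)

# Boxplot: median, quartiles and potential outliers.
box_plot <- ggplot(recent, aes(y = AverageRating)) +
  geom_boxplot() +
  labs(y = "Average rating")

# Histogram: overall shape of the distribution.
hist_plot <- ggplot(recent, aes(x = AverageRating)) +
  geom_histogram(binwidth = 0.5) +
  labs(x = "Average rating", y = "Number of movies")

print(box_plot)
print(hist_plot)
```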
Part 2
The given Spotify.xlsx file records a summary of Australia's top 200 daily-streamed songs (or tracks) in the first three months of 2017 and 2018. The Data worksheet records the total streams and the highest position of each song in each month. You will see that the data is far from ready for analysis and needs to be 'wrangled'. The given Artist.csv file records the artists who perform the songs. You are required to write R code to perform the following steps.
Load the data from the Spotify worksheet into a dataframe named Spotify. Replace the spaces in the column names with an underscore ("_"). Show the structure of Spotify.
You can see that most column names contain the month information, which should be placed as row values. Let's:
• Use pivot_longer to transform the dataframe into four columns, namely Artist_ID, Track_Name, Month, and Value.
• Drop all rows having NA in Value.
• Split the Month column into Month and Year.
• Display the number of columns and rows.
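The reshaping steps above can be sketched on a toy wide table. The month column names (`Jan_2017`, `Feb_2017`) and the `"streams;position"` cell format are assumptions made for illustration; the actual worksheet layout is not shown in the brief.

```r
library(dplyr)
library(tidyr)

# Toy wide table; month column names and cell format are hypothetical.
Spotify <- tibble(
  Artist_ID  = c(1, 2),
  Track_Name = c("Song A", "Song B"),
  Jan_2017   = c("250000;3", NA),
  Feb_2017   = c("180000;10", "90000;150")
)

spotify_long <- Spotify %>%
  # Every column except the two identifiers becomes a (Month, Value) pair.
  pivot_longer(cols = -c(Artist_ID, Track_Name),
               names_to = "Month", values_to = "Value") %>%
  # Remove month entries where the track did not chart.
  drop_na(Value) %>%
  # "Jan_2017" -> Month = "Jan", Year = "2017".
  separate(Month, into = c("Month", "Year"), sep = "_")

# Number of rows and columns.
dim(spotify_long)
```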
You can see that the data in column Value contains both the total streams and the highest position of the song in the corresponding month. Note that the smaller the position value, the higher the song ranked.
• Split the Value column into two columns with appropriate names.
• For each month-year, show the total streams and the number of songs appearing in the daily top 200.
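A rough sketch of the split and the monthly summary follows. The `";"` separator and the new column names `Total_Streams` and `Highest_Position` are assumptions; the real Value format should be inspected first and the `sep` argument adjusted to match.

```r
library(dplyr)
library(tidyr)

# Toy long table; the "streams;position" format is hypothetical.
spotify_long <- tibble(
  Track_Name = c("Song A", "Song A", "Song B"),
  Month      = c("Jan", "Feb", "Feb"),
  Year       = c("2017", "2017", "2017"),
  Value      = c("250000;3", "180000;10", "90000;150")
)

# Split Value into two columns; convert = TRUE turns the pieces into numbers.
spotify_split <- spotify_long %>%
  separate(Value, into = c("Total_Streams", "Highest_Position"),
           sep = ";", convert = TRUE)

# Total streams and number of distinct songs for each month-year.
monthly <- spotify_split %>%
  group_by(Year, Month) %>%
  summarise(Streams = sum(Total_Streams),
            Songs   = n_distinct(Track_Name),
            .groups = "drop")

monthly
```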
Find all tracks that appeared in all six months with more than 100,000 streams in each month. Display their name, total streams and highest position. Export the result to a CSV file.
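One way to express "all six months, each above 100,000" is a grouped filter, sketched below on toy data. The column names carry over from the assumed split step, the output file name is hypothetical, and grouping by `Track_Name` alone is a simplification (grouping by `Artist_ID` as well may be safer if track names repeat).

```r
library(dplyr)

# Toy data: Song A qualifies, Song B dips below 100,000 once, Song C charts in 3 months only.
spotify_split <- tibble(
  Track_Name = c(rep("Song A", 6), rep("Song B", 6), rep("Song C", 3)),
  Month = c(rep(c("Jan", "Feb", "Mar"), 2), rep(c("Jan", "Feb", "Mar"), 2),
            c("Jan", "Feb", "Mar")),
  Year = c(rep(c("2017", "2018"), each = 3), rep(c("2017", "2018"), each = 3),
           rep("2017", 3)),
  Total_Streams = c(rep(200000, 6), rep(150000, 5), 50000, rep(300000, 3)),
  Highest_Position = c(5, 3, 8, 2, 4, 6, rep(20, 6), rep(1, 3))
)

consistent_tracks <- spotify_split %>%
  group_by(Track_Name) %>%
  # Keep tracks present in six distinct month-year pairs, all above 100,000 streams.
  filter(n_distinct(Month, Year) == 6, all(Total_Streams > 100000)) %>%
  # The best position is the smallest number.
  summarise(Total_Stream     = sum(Total_Streams),
            Highest_Position = min(Highest_Position),
            .groups = "drop")

# Export; the file name is an arbitrary choice for this sketch.
out_file <- file.path(tempdir(), "consistent_tracks.csv")
write.csv(consistent_tracks, out_file, row.names = FALSE)
```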
Load the data from the Artist.csv file into a new dataframe. Rename the columns to remove spaces. How many artists do not have songs listed in the Spotify dataframe?
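An anti-join is a natural fit for "artists with no songs". In this sketch the Artist column names (`Artist ID`, `Artist Name`) are hypothetical, and the join key on the Spotify side is assumed to be `Artist_ID`.

```r
library(dplyr)
library(stringr)

# Toy stand-in for Artist.csv with spaces in the (assumed) column names.
Artist <- data.frame(
  `Artist ID`   = c(1, 2, 3),
  `Artist Name` = c("Artist A", "Artist B", "Artist C"),
  check.names = FALSE
)
names(Artist) <- str_replace_all(names(Artist), " ", "")

# Toy Spotify table: artist 2 has no tracks.
Spotify <- tibble(
  Artist_ID  = c(1, 1, 3),
  Track_Name = c("Song A", "Song B", "Song C")
)

# Keep artists whose ID never appears in Spotify.
artists_without_songs <- Artist %>%
  anti_join(Spotify, by = c("ArtistID" = "Artist_ID"))

nrow(artists_without_songs)
```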
Draw a bar chart to compare the artists of the songs/tracks returned in Q2.4 based on their total streams. Order the bars from the highest to the lowest total streams. Write a short paragraph describing the insights you gained from this chart.
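The ordering requirement can be handled with `reorder()` inside the aesthetic mapping, as in this sketch. The summary table and its column names (`Artist_Name`, `Total_Stream`) are hypothetical; in practice it would come from joining the earlier result with the artist data.

```r
library(ggplot2)

# Toy per-artist totals; names and values are made up for illustration.
artist_totals <- data.frame(
  Artist_Name  = c("Artist A", "Artist B", "Artist C"),
  Total_Stream = c(1200000, 3400000, 800000)
)

# reorder() with a negated value sorts bars from highest to lowest.
bar_chart <- ggplot(artist_totals,
                    aes(x = reorder(Artist_Name, -Total_Stream),
                        y = Total_Stream)) +
  geom_col() +
  labs(x = "Artist", y = "Total streams")

print(bar_chart)
```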
Attachment: Data Wrangling and R Assignment.rar