Assignment Detail:
BUS5WB Data Warehousing and Big Data Assignment

The third assignment focuses on Big Data analytics on unstructured text data using Microsoft Azure. You are required to derive insights by applying big data distributed processing and machine learning techniques.

Dataset 1 - Amazon Reviews
The dataset contains ~10000 reviews of Amazon products. The fields are:

What you are required to do

1 HDInsight to Analyse Reviews
Develop an aggregate of these reviews using your knowledge of Hadoop and MapReduce in Microsoft HDInsight.
a. Follow the same approach as the Big Data Analytics Workshop (using the wordcount method in HDInsight) to determine the contributory words for each level of rating.
b. Present the workflow of using HDInsight (you may use screen captures) along with a summary of findings and any insights for each level of rating.
MapReduce documentation for HDInsight is available here. You may either create your own Hadoop cluster or make use of the one provided to run your analysis. The details of the cluster will be provided on the LMS under the section for Assignment 3.

2 Azure Databricks for Big Data Processing
Use the period of data allocated to you (it will be a single year) from the New York City Taxi & Limousine Commission dataset on Azure Databricks to answer the questions below:
a. Plot a visual to show, by month, the total fare amount generated by taxi trips with 4 or fewer passengers that were paid for by credit card. (You will have 12 records.)
b. Plot a visual to show the average cost per mile of a taxi ride in each month of the year assigned to you, for trips that travelled more than 5 miles but less than 20 miles, grouped by whether the trip was to the airport. (You will have 24 records.)
c. Plot a visual to show the average number of taxi trips with a single passenger for each day of the week. (You will have 7 records.)
d. What are the top 10 most profitable routes (in terms of source and destination) for a taxi?
(You will have 10 records.)
For each of the questions above provide:
• A screenshot of the visual
• A table of the values
• The code that you used to generate it
You will make use of the Azure Databricks cluster which is allocated to you. The details of the cluster will be provided on the LMS under the section Assignment 3. The year allocated to you for analysis will also be shared with you on the LMS.

3 Azure Machine Learning for Prediction
Based on the year assigned to you in the New York City Taxi Dataset (as given in question 2 above), use Azure ML Studio to build a model that predicts the total ride duration of taxi trips in New York City. Provide the following:
a. A screen capture of the completed model diagram and any decisions you made in training the model. For example, the rationale for some of the components used, and how many records were used for training and how many for testing.
b. A set of metrics which presents how effective your model is.
c. Which features were most influential in driving your model?
d. Using your model, predict the total trip duration for the trips given below.
You will make use of the Azure Machine Learning Studio that has been allocated to you. Information regarding accessing the application can be found in the LMS under the section Assignment 3. The datasets required for training and testing are available in Azure Machine Learning Studio; further information has been provided in the LMS under the section Assignment 3.
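As a point of orientation for part 1, the wordcount-per-rating idea can be sketched locally in plain Python before running anything on a cluster. This is only an illustrative sketch, not the workshop's method or the HDInsight job itself: the field names `rating` and `text` and the three sample reviews are hypothetical (the dataset's actual fields are not listed here), and a real analysis would probably also filter out common stop words.

```python
from collections import Counter, defaultdict
import re

# Hypothetical records for illustration only. The field names "rating" and
# "text" are assumptions; they are not taken from the actual dataset.
SAMPLE_REVIEWS = [
    {"rating": 5, "text": "Great product, great price"},
    {"rating": 5, "text": "Works great"},
    {"rating": 1, "text": "Broke after one day, poor quality"},
]


def map_review(review):
    """Map step: emit ((rating, word), 1) for every word in one review."""
    for word in re.findall(r"[a-z']+", review["text"].lower()):
        yield (review["rating"], word), 1


def reduce_counts(pairs):
    """Reduce step: sum the counts for each (rating, word) key."""
    totals = defaultdict(Counter)
    for (rating, word), count in pairs:
        totals[rating][word] += count
    return totals


def top_words(reviews, n=3):
    """Run map then reduce, and keep the n most frequent words per rating."""
    pairs = (pair for review in reviews for pair in map_review(review))
    totals = reduce_counts(pairs)
    return {rating: counter.most_common(n) for rating, counter in totals.items()}


if __name__ == "__main__":
    for rating, words in sorted(top_words(SAMPLE_REVIEWS).items()):
        print(rating, words)
```

Keying the map output on the (rating, word) pair rather than on the word alone is what lets a single counting pass produce a separate word ranking for each level of rating.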

