COSC 2670 Practical Data Science with Python Assignment

COSC 2670 - Practical Data Science with Python - RMIT University

Introduction

In this assignment, you will examine a data file and carry out the first steps of the data science process, including the cleaning and exploring of data. You will need to develop and implement appropriate steps, in IPython, to load a data file into memory, clean, process, and analyse it. This assignment is intended to give you practical experience with the typical first steps of the data science process.

The "Practical Data Science" Canvas contains further announcements and a discussion board for this assignment. Please be sure to check these on a regular basis; it is your responsibility to stay informed with regard to any announcements or changes.

Part 1: Data Preparation

Have a look at the file StarWars.csv, which is available under Assignments -> Assignment 1 in the course Canvas. This file contains the data behind the story "America's Favorite 'Star Wars' Movies (And Least Favorite Characters)"¹. The author collected the data by running a poll through SurveyMonkey Audience, surveying 1,186 respondents. The questions asked in the survey are described below.

• Have you seen any of the 6 films in the Star Wars franchise?
• Do you consider yourself to be a fan of the Star Wars film franchise?
• Which of the following Star Wars films have you seen?
  Please select all that apply. (Star Wars: Episode I The Phantom Menace; Star Wars: Episode II Attack of the Clones; Star Wars: Episode III Revenge of the Sith; Star Wars: Episode IV A New Hope; Star Wars: Episode V The Empire Strikes Back; Star Wars: Episode VI Return of the Jedi)
• Please rank the Star Wars films in order of preference with 1 being your favorite film in the franchise and 6 being your least favorite film. (Star Wars: Episode I The Phantom Menace; Star Wars: Episode II Attack of the Clones; Star Wars: Episode III Revenge of the Sith; Star Wars: Episode IV A New Hope; Star Wars: Episode V The Empire Strikes Back; Star Wars: Episode VI Return of the Jedi)
• Please state whether you view the following characters favorably, unfavorably, or are unfamiliar with him/her. (Han Solo, Luke Skywalker, Princess Leia Organa, Anakin Skywalker, Obi Wan Kenobi, Emperor Palpatine, Darth Vader, Lando Calrissian, Boba Fett, C-3P0, R2-D2, Jar Jar Binks, Padme Amidala, Yoda)
• Which character shot first?
• Are you familiar with the Expanded Universe?
• Do you consider yourself to be a fan of the Expanded Universe?
• Do you consider yourself to be a fan of the Star Trek franchise?
• Gender
• Age
• Household Income
• Education
• Location (Census Region)

Being a careful data scientist, you know that it is vital to carefully check any available data before starting to analyse it. Your task is to prepare the provided data for analysis. You will start by loading the CSV data from the file (using appropriate pandas functions) and checking whether the loaded data is equivalent to the data in the source CSV file. Then, you need to clean the data using the techniques taught in the lectures. You need to deal with all the potential issues/errors in the data appropriately.

Part 2: Data Exploration

Explore the provided data based on the following steps:

1. Explore the survey question: Please rank the Star Wars films in order of preference with 1 being your favorite film in the
   franchise and 6 being your least favorite film (Star Wars: Episode I The Phantom Menace; Star Wars: Episode II Attack of the Clones; Star Wars: Episode III Revenge of the Sith; Star Wars: Episode IV A New Hope; Star Wars: Episode V The Empire Strikes Back; Star Wars: Episode VI Return of the Jedi), then analyse how people rate the Star Wars movies.
2. Explore the relationships between columns. You need to choose 3 pairs of columns to focus on, and you need to generate 1 visualisation for each pair. Each pair of columns that you choose should address a plausible hypothesis for the data concerned.
3. Explore whether there are relationships between people's demographics (Gender, Age, Household Income, Education, Location) and their attitude to Star Wars characters.

Note: each visualisation (graph) should be complete and informative in itself, and should be clear enough for readers to read and obtain information from.

Part 3: Report

Write your report and save it in a file called report.pdf. It must be in PDF format, must be at most 6 pages in single-column format (including figures and references), and must use a font size between 10 and 12 points. Penalties will apply if the report does not satisfy these requirements. Moreover, the quality of the report will be considered, e.g. clarity, grammar mistakes, and the flow of the presentation.

Remember to clearly cite any sources (including books, research papers, course notes, etc.) that you referred to while designing aspects of your programs.

• Create a heading called "Data Preparation" in your report. Provide a brief explanation of how you addressed the task. For the steps of dealing with the potential issues/errors, please create a sub-section for each type of error you dealt with (e.g. typos, extra whitespace, sanity checks for impossible values, and missing values), and also explain and justify how you dealt with each kind of error.
• Create a heading called "Data Exploration" in your report. For each numbered step in Part 2 above,
  create a sub-section with corresponding numbering.
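The Part 1 steps (load with pandas, check the result against the source file, then clean) could be sketched as follows. This is a minimal illustration, not a reference solution: it assumes pandas and NumPy are installed, and RAW_CSV, SEEN and VALID_AGES are hypothetical stand-ins (a tiny invented extract with deliberately planted errors, and an assumed set of valid age bands), not the real contents of StarWars.csv. The equivalence check compares the DataFrame with an independent parse by Python's csv module.

```python
import csv
import io

import numpy as np
import pandas as pd

# Hypothetical miniature of StarWars.csv with planted errors
# (extra whitespace, inconsistent case, a typo, a blank cell, an impossible age).
SEEN = "Have you seen any of the 6 films in the Star Wars franchise?"
RAW_CSV = (
    f"RespondentID,{SEEN},Gender,Age\n"
    "1,Yes,Male,18-29\n"
    "2, yes ,female,30-44\n"
    "3,No,,> 60\n"
    "4,Yess,Female,500\n"
)

# Assumed set of valid age bands; anything else counts as an impossible value.
VALID_AGES = {"18-29", "30-44", "45-60", "> 60"}


def load_and_verify(text):
    """Load CSV text with pandas and confirm it matches a plain csv.reader parse."""
    # dtype=str and keep_default_na=False keep every cell exactly as written.
    df = pd.read_csv(io.StringIO(text), dtype=str, keep_default_na=False)
    header, *body = list(csv.reader(io.StringIO(text)))
    assert list(df.columns) == header, "column names differ"
    assert df.shape == (len(body), len(header)), "row/column counts differ"
    assert df.values.tolist() == body, "cell values differ"
    return df


def clean(df):
    """Return a cleaned copy; the original DataFrame is left untouched."""
    out = df.copy()
    # Extra whitespace: strip leading/trailing spaces in every text column.
    out = out.apply(lambda col: col.str.strip())
    # Typos and inconsistent case: normalise case, then map known misspellings.
    out[SEEN] = out[SEEN].str.capitalize().replace({"Yess": "Yes"})
    out["Gender"] = out["Gender"].str.capitalize()
    # Missing values: empty cells become NaN so pandas treats them as missing.
    out = out.replace("", np.nan)
    # Sanity check: ages outside the assumed bands are impossible, so blank them.
    out.loc[~out["Age"].isin(VALID_AGES), "Age"] = np.nan
    # Identifiers should be integers; pd.to_numeric raises if any are not numeric.
    out["RespondentID"] = pd.to_numeric(out["RespondentID"])
    return out


raw = load_and_verify(RAW_CSV)
cleaned = clean(raw)
print(cleaned)
```

For the real file, the text could be read with open("StarWars.csv", newline="", encoding=...) using the file's actual encoding and passed to load_and_verify. If the file has blank or duplicated header cells, pandas renames them ("Unnamed: N", or a ".1" suffix), so a header mismatch may reflect that renaming rather than lost data.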
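For step 1 of Part 2, one way to summarise the ranking question is the mean rank of each film (lower means more preferred) together with the share of respondents who ranked it first. The sketch below assumes the six ranking answers have already been extracted into one numeric column per film. The column names in FILMS and the rankings themselves are hypothetical. Rows whose ranks are not a permutation of 1 to 6 are treated as invalid and excluded before averaging.

```python
import pandas as pd

# Hypothetical column names, one per film.
FILMS = [
    "Episode I The Phantom Menace",
    "Episode II Attack of the Clones",
    "Episode III Revenge of the Sith",
    "Episode IV A New Hope",
    "Episode V The Empire Strikes Back",
    "Episode VI Return of the Jedi",
]

# Hypothetical rankings (1 = favourite, 6 = least favourite); the last row
# repeats rank 1 and is therefore invalid.
ranks = pd.DataFrame(
    [
        [6, 5, 4, 2, 1, 3],
        [5, 6, 4, 1, 2, 3],
        [1, 2, 3, 6, 5, 4],
        [5, 6, 4, 2, 1, 3],
        [1, 1, 3, 4, 5, 6],
    ],
    columns=FILMS,
)


def valid_rankings(df):
    """True for rows whose six ranks are exactly 1..6 with no repeats or gaps."""
    return df.apply(lambda row: sorted(row.tolist()) == list(range(1, 7)), axis=1)


def summarise_rankings(df):
    """Mean rank and share of first places per film, best-rated film first."""
    valid = df[valid_rankings(df)]
    summary = pd.DataFrame(
        {
            "mean_rank": valid.mean(),
            "share_ranked_first": (valid == 1).mean(),
        }
    )
    return summary.sort_values("mean_rank")


print(summarise_rankings(ranks))
```

The mean rank treats ranks as if they were evenly spaced; reporting the median rank, or the full distribution of ranks per film (for example with value_counts on each column), alongside it avoids relying on that assumption alone.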
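For step 3 of Part 2, a cross-tabulation of a demographic column against a character's favourability, normalised within each demographic group, shows whether attitudes differ between groups. In the sketch below, the responses, the answer wording and the grouping of the favourability scale into broader attitudes (ATTITUDE) are all hypothetical assumptions. Adjust them to the categories actually present after cleaning.

```python
import pandas as pd

# Hypothetical cleaned responses; real column names and wording may differ.
survey = pd.DataFrame(
    {
        "Gender": ["Male", "Female", "Female", "Male", "Female", "Male"],
        "Jar Jar Binks": [
            "Very unfavorably",
            "Somewhat favorably",
            "Unfamiliar (N/A)",
            "Very unfavorably",
            "Somewhat unfavorably",
            "Very favorably",
        ],
    }
)

# Assumed mapping that collapses the favourability scale into broader attitudes.
ATTITUDE = {
    "Very favorably": "Favourable",
    "Somewhat favorably": "Favourable",
    "Neither favorably nor unfavorably (neutral)": "Neutral",
    "Somewhat unfavorably": "Unfavourable",
    "Very unfavorably": "Unfavourable",
    "Unfamiliar (N/A)": "Unfamiliar",
}


def attitude_by_group(df, group_col, character):
    """Share of each attitude within every demographic group (rows sum to 1)."""
    attitude = df[character].map(ATTITUDE)
    # normalize="index" divides each row by its total; unmapped answers
    # (NaN after .map) are dropped by crosstab.
    return pd.crosstab(df[group_col], attitude, normalize="index")


print(attitude_by_group(survey, "Gender", "Jar Jar Binks"))
```

The same function could be looped over Gender, Age, Household Income, Education and Location and over each character. Each resulting table can be drawn as a stacked bar chart with table.plot(kind="bar", stacked=True), which requires matplotlib. With small groups, differences in proportions should be read cautiously.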