Corey wk 3
January 26, 2023
Executive Summary
January 26, 2023

Predicting the Oscars

Description:Please download the Oscar_2000_2018.csv dataset provided.This dataset amounts to a total of 1,235 movies from 2000 to 2018, where each film has 100+ features including:It sports 20 categorical, 56 numeric, 42 items, and 1 DateTime field totaling 119 fields giving you plenty of details about various aspects of the past nominees and winners.The dataset is organized such that each record represents a unique movie identified by the field movie_id.The first 17 fields have to do with the metadata associated with each movie e.g., release_date, genre, synopsis, duration, metascore.Tasks:Part 1: EDA1. Using a scatterplot or a pair plot show the relationship between features “user_reviews” and “critic_reviews”. Find the Pearson’s correlation coefficient(r) between the 2 features.2. Plot the average “duration” per “certificate” feature. In other words, x-axis would be “certificate” and the y-axes would be the average duration.3. Plot a histogram for the “genre” feature. Note that the field “genre” needs to be split first to find the frequency for each individual genre type; “Comedy”, “Romance”, “Action” etc. (Hint: Functions like “strsplit” in R or “split” in Python can be used)Part 2: Model Building1. You are going to predict “Oscar_Best_Picture_won” feature; this will be your target variable. Remove all of the features which has the convention “Oscar_Best_XXX_won” except for the target variable “Oscar_Best_Picture_won”.2. Convert the target variable’s type to a numerical type by doing the transformation, “Yes” = 1, “No” = 0.3. Remove columns with high cardinality, i.e., for every column that has a unique value frequency of 70% or higher, remove them from the dataset.4. Perform a time split and create a training dataset spanning the period 2000-2017 and a test dataset for the movies released in 2018 – use “year” feature for the data split5. Create a tree-based model to predict the target “Oscar_Best_Picture_won”6. Use the model to predict the test dataset and find the maximum predicted valueOptional: Go back to the initial dataset and find the movie in 2018 that is associated with the maximum predicted value.

 
Do you need a similar assignment done for you from scratch? We have qualified writers to help you. We assure you an A+ quality paper that is free from plagiarism. Order now for an Amazing Discount!
Use Discount Code "Newclient" for a 15% Discount!

NB: We do not resell papers. Upon ordering, we do an original paper exclusively for you.