Films Project

This project built an algorithm that predicts movie scores from the biases and rating behavior of users. It also showed how a linear model can be applied to everyday problems.

TARGETS

Score forecasts in recommendation systems are widely used today to suggest products and services to buy. With an appropriate selection of predictors, we want to understand users' rating habits and their sentiment towards a product, in this case movies.

OUTCOMES

Two models capable of predicting scores were obtained. The algorithm includes regularization terms, which are important for large datasets with millions of rows and dozens of columns.
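The project does not spell out its equations, so the formulation below is an assumption: the common baseline model with a global mean μ, a movie effect b_i and a user effect b_u, fitted by penalized least squares with a regularization strength λ. Here y_{u,i} is the rating user u gave movie i, U_i is the set of users who rated movie i (n_i = |U_i|) and I_u is the set of movies rated by user u (n_u = |I_u|). The last line gives the closed-form estimates when the effects are fitted one after the other, a common simplification rather than the exact joint minimizer.

```latex
\hat{y}_{u,i} = \mu + b_i + b_u

\min_{b}\; \sum_{(u,i)} \left( y_{u,i} - \mu - b_i - b_u \right)^2
  + \lambda \left( \sum_i b_i^2 + \sum_u b_u^2 \right)

\hat{b}_i(\lambda) = \frac{1}{\lambda + n_i} \sum_{u \in U_i} \left( y_{u,i} - \hat{\mu} \right),
\qquad
\hat{b}_u(\lambda) = \frac{1}{\lambda + n_u} \sum_{i \in I_u} \left( y_{u,i} - \hat{\mu} - \hat{b}_i \right)
```

Because λ is added to the count in each denominator, the effect of a movie or user with few ratings is pulled toward zero, so its prediction stays close to the global mean.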

PROJECT DURATION

The project lasted three weeks. It started with scraping the dataset from a public source, removing unnecessary information, and cleaning the data with regular expressions. The data was then analyzed, and the best model was selected according to its prediction error.
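The selection step can be sketched as follows, assuming the error metric is the root mean squared error (RMSE), a common choice for rating prediction that the project does not name explicitly. The function names, model names and ratings are hypothetical and only illustrate the comparison.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences of ratings."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def select_best_model(actual, candidates):
    """Return (name, error) of the candidate whose predictions have the lowest RMSE.

    `candidates` maps a (hypothetical) model name to its list of predictions.
    """
    scores = {name: rmse(actual, preds) for name, preds in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


if __name__ == "__main__":
    actual = [4.0, 3.0, 5.0, 2.0]
    candidates = {
        "global_mean": [3.5, 3.5, 3.5, 3.5],
        "movie_effect": [4.0, 3.0, 4.5, 2.5],
    }
    print(select_best_model(actual, candidates))
```

In this toy data the "movie_effect" predictions have an RMSE of about 0.35 against about 1.12 for the constant "global_mean" predictions, so they would be selected.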

Clustering by genre revealed a distinct group that is more sensitive to the model than the rest. This suggests that certain characteristics of the model affect this group more strongly. Further analysis of this group's preferences and behavior could show how to improve the model's overall accuracy for a wider audience. More generally, it highlights the importance of accounting for the preferences of diverse user groups when developing and refining any model or product.
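One way to look for such a group is sketched below. It assumes that "sensitivity" can be measured as the prediction error within each genre, and it uses a plain group-by on genre labels rather than a clustering algorithm such as k-means. The input format (pipe-separated genre strings) and the function names are hypothetical.

```python
import math
from collections import defaultdict


def error_by_genre(ratings):
    """Group prediction errors by genre and return the RMSE for each genre.

    `ratings` is an iterable of (genres, actual, predicted) tuples, where
    `genres` is a pipe-separated string such as "Comedy|Drama" (assumed format).
    A movie listed under several genres contributes to each of them.
    """
    squared = defaultdict(list)
    for genres, actual, predicted in ratings:
        for genre in genres.split("|"):
            squared[genre].append((actual - predicted) ** 2)
    return {g: math.sqrt(sum(errs) / len(errs)) for g, errs in squared.items()}


def most_sensitive_genre(ratings):
    """Return the genre whose predictions have the largest RMSE."""
    errors = error_by_genre(ratings)
    return max(errors, key=errors.get)


if __name__ == "__main__":
    sample = [
        ("Comedy|Drama", 4.0, 3.5),
        ("Horror", 2.0, 3.5),
        ("Comedy", 3.0, 3.0),
        ("Horror", 1.0, 3.0),
    ]
    print(error_by_genre(sample))
    print(most_sensitive_genre(sample))  # → Horror
```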

The model proved accurate for a significant proportion of movies, predicting their scores to within 1 point of the rating they actually received; for the remaining movies, the difference between the predicted score and the users' ratings was larger.
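A minimal way to measure that proportion, assuming one actual and one predicted score per movie, is to count the predictions whose absolute error is below the tolerance. The function name and the sample scores are hypothetical.

```python
def share_within(actual, predicted, tolerance=1.0):
    """Fraction of predictions whose absolute error is below `tolerance` rating points."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    hits = sum(1 for a, p in zip(actual, predicted) if abs(a - p) < tolerance)
    return hits / len(actual)


if __name__ == "__main__":
    actual = [4.0, 3.0, 5.0, 1.0]
    predicted = [3.6, 3.2, 3.8, 2.5]
    print(share_within(actual, predicted))  # → 0.5
```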

One part of the process is shown below:

Refinement of the model yielded:


By adding a regularization term to our mathematical model, we “penalize” movies with few votes and, in the same way, users who rarely rate movies.
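A minimal sketch of that penalty, assuming the common baseline model (global mean plus movie and user effects) with the effects estimated one after the other. The function names, the tuple format of `ratings` and the value of `lam` are hypothetical, not taken from the project.

```python
from collections import defaultdict


def fit_regularized_biases(ratings, lam=5.0):
    """Fit a global mean plus regularized movie and user effects.

    `ratings` is a list of (user_id, movie_id, rating) tuples. Each effect is
    a sum of residuals divided by (count + lam), so movies and users with few
    ratings are shrunk toward zero, i.e. toward the global mean.
    `lam` is an illustrative value, not a tuned one.
    """
    mu = sum(r for _, _, r in ratings) / len(ratings)

    # Movie effect: residuals from the global mean, over (number of votes + lam).
    movie_sum, movie_n = defaultdict(float), defaultdict(int)
    for _, movie, r in ratings:
        movie_sum[movie] += r - mu
        movie_n[movie] += 1
    b_movie = {m: movie_sum[m] / (movie_n[m] + lam) for m in movie_sum}

    # User effect: residuals after removing mu and the movie effect.
    user_sum, user_n = defaultdict(float), defaultdict(int)
    for user, movie, r in ratings:
        user_sum[user] += r - mu - b_movie[movie]
        user_n[user] += 1
    b_user = {u: user_sum[u] / (user_n[u] + lam) for u in user_sum}
    return mu, b_movie, b_user


def predict(model, user, movie):
    """Predict mu + movie effect + user effect; unseen ids get an effect of 0."""
    mu, b_movie, b_user = model
    return mu + b_movie.get(movie, 0.0) + b_user.get(user, 0.0)


if __name__ == "__main__":
    data = [("u1", "m1", 5.0), ("u2", "m1", 4.0), ("u3", "m1", 5.0), ("u1", "m2", 1.0)]
    for lam in (0.0, 5.0):
        mu, b_movie, _ = fit_regularized_biases(data, lam)
        print(lam, b_movie)
```

With `lam=0` the single-vote movie "m2" gets the full effect of -2.75; with `lam=5` it drops to about -0.46, while the three-vote movie "m1" keeps a larger share of its effect. In practice `lam` would typically be chosen by comparing the prediction error on held-out ratings.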