
Predicting AirBnB Review Scores: Report

Names:

  • Artur Rodrigues, arodrigues (at) ucsd (dot) edu
  • Doanh Nguyen, don012 (at) ucsd (dot) edu
  • Ryan Batubara, rbatubara (at) ucsd (dot) edu

A website version of this file is available here

Go back to Readme.md as a file

Go back to Readme.md as a website

Table of Contents:

  • Introduction
  • Methods
  • Results
  • Discussion
  • Conclusion
  • Statement of Collaboration

Introduction

Back to table of contents

Since the end of COVID-19, people have been eager to get out of their homes after the long lockdown. Often hesitant to stay in more public places like hotels, many think of AirBnB when seeking more unique, personalized, and private getaway options. As AirBnB users, we wanted to take a deeper look into what makes an AirBnB listing special - in other words, why are some listings rated higher than others? By training a model that can predict a listing’s review scores, we as customers will better understand what to look out for in our next booking. For hosts, such a model may help them improve existing listings or predict the popularity of new ones.

Thus, as our model serves a real purpose to both the customer and host, it is important that we make our predictions as accurate as possible while ensuring a large variety of listings and reviews are well-represented. In order to do this, we must first explore the data at hand.


Methods

Back to table of contents

Data Exploration:

Back to table of contents

This project is based on data gathered by Inside AirBnb between May and June 2024. To keep our analysis focused, we only analyze AirBnB listings from the United States. Since Inside AirBnb only offers datasets per city, we downloaded all US cities with AirBnB listings and combined them into one csv file, as sketched below.
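As a rough sketch (the file paths and layout here are our own assumptions; pandas reads the gzipped csv files directly):

import glob
import pandas as pd

# Read every downloaded Inside AirBnb city file and stack them into one frame.
frames = [pd.read_csv(path, low_memory=False) for path in glob.glob('data/*.csv.gz')]
df = pd.concat(frames, ignore_index=True)
df.to_csv('listings_us.csv', index=False)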

We now had around 250,000 rows of data and 80 features.

Due to the size of this file, Inside AirBnB reposting policies, and GitHub data storage policies, we will not be uploading this combined file to the repository. That said, the combined dataset is available here, but requires a UCSD account.

Data Cleaning

Back to table of contents

After fixing some of the data’s datatypes, we dropped all listings with 0 reviews, since it is not well defined how a NaN review score should behave as training data. We then dropped some features that were irrelevant to our project (a code sketch follows the list). Our reasons for dropping them are below:

  • All URL: Unique elements for each listing. Does not contribute anything when predicting the review score.

  • All ID: Unique elements for each listing. Does not contribute anything when predicting the review score.

  • host_name: Individually unique elements for each listing. Does not contribute anything when predicting the review score.

  • license: Unique elements for each listing. Does not contribute anything when predicting the review score.

  • source: Holds whether or not the listing was found via searching by city or if the listing was seen in a previous scrape. There is no logical connection between this and the target variable, which is review score.

  • host_location: Private information.

  • host_total_listings_count: Duplicate of the existing host_listings_count feature.

  • calendar_last_scraped: Holds the date the data was last scraped; there is no logical connection between this and predicting review_scores_rating.

  • first_review & last_review: Provide the first and last review dates. The last review date can be misleading, as an unpopular listing may go without reviews for an extended time and then suddenly receive one.

  • minimum_minimum_nights, maximum_minimum_nights, minimum_maximum_nights, maximum_maximum_nights: The all-time minimum and maximum of a listing’s minimum and maximum nights requirement for booking. These have no bearing on review score, because you cannot write a review if you have not stayed at the listing.
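A minimal pandas sketch of this cleaning step (the column lists are abbreviated here; the full set lives in the notebook linked below):

# Drop listings with no reviews, then the irrelevant columns listed above.
drop_cols = [
    'listing_url', 'picture_url', 'host_url',        # URL columns (abbreviated)
    'id', 'scrape_id', 'host_id',                    # ID columns (abbreviated)
    'host_name', 'license', 'source', 'host_location',
    'host_total_listings_count', 'calendar_last_scraped',
    'first_review', 'last_review',
    'minimum_minimum_nights', 'maximum_minimum_nights',
    'minimum_maximum_nights', 'maximum_maximum_nights',
]
df = df[df['number_of_reviews'] > 0]
df = df.drop(columns=drop_cols, errors='ignore')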

Further details can be found in this file.

Whole Dataset Visualizations

Back to table of contents

We first show some visuals to demonstrate the sheer size and complexity of our data, starting with our pairplot:

Pairplot

There are many trivial relationships that can be seen in the pairplot, e.g. higher number of listings tends to mean higher number of ratings, or number of beds tends to increase with number of bedrooms, etc.

Also, near the middle of the pairplot we can see a 4x4 square of the availability_365, availability_90, availability_60, and availability_30 features. These plots look very linear, which makes sense given that they are all intimately related as subsets and supersets of one another.

We can see this in our correlation heatmap below:

Correlation

Now, our data appears more tangible, and relationships become clearer. For instance, closer to the middle of the heatmap, we see a 3x3 and a 4x4 square of highly correlated features. The 3x3 square shows the correlation between the number_of_reviews, number_of_reviews_ltm (ltm stands for Last Twelve Months), and number_of_reviews_l30d (l30d stands for Last 30 Days) features. It makes sense why these would be highly correlated, as number_of_reviews is a superset of number_of_reviews_ltm, which is in turn a superset of number_of_reviews_l30d.

Now, in the top left quadrant of the heatmap, we see two 2x2 squares and one 4x4 square of highly correlated features. The 2x2 square containing the latitude and longitude features makes sense: we are only looking at listings in the United States, and some areas of the US are far more densely populated than others. The other 2x2 square contains the host_listings_count and host_total_listings_count features, which makes sense since the data dictionary gives both features the exact same description.

The high correlation between latitude and longitude is interesting, so we decided to plot the locations of all the listings:

Map

Indeed, listings are prevalent around populous areas - such as New York or Los Angeles - and tourist areas - such as Hawaii.

This high level overview helps us appreciate the scale and complexity of our data, but we need to go deeper to find what features might be useful for the model.

Numeric Feature Visualizations

Back to table of contents

As there are too many features, we only discuss the most interesting ones here. For a more detailed description, see here.

We start by looking at the distribution of all the review scores:

Review Scores Histogram

The values actually range from 0 to 5, but we zoom in on this region as >90% of the values lie here. Clearly, all of the review scores are very similar. Thus, moving forward, we will use review_scores_rating, the average of all other review features, as our model’s target variable.

One feature we might think is correlated to rating is price:

Price vs Rating

It appears that price tends to increase as ratings go up, though with many exceptions, and there is no strong general trend. The vertical lines at integer ratings are probably from listings with very few reviews, and even within these lines there is no apparent relationship.

In brief, there does not appear to be a single numerical feature with a clear-cut linear or polynomial relationship with review scores. So we turn our EDA to exploring non-numerical columns.

Non-Numeric Feature Visualizations

Back to table of contents

One idea is that the longer a host has been on the platform, the higher their ratings should be (thanks to accumulated experience):

Host Since vs Rating

In the above graph, the number of weeks a host has been on the platform is on the x-axis, versus the review_scores_rating feature on the y-axis. We can see a small, but not insignificant, positive correlation between the two, indicating that the longer a host has been on the platform, the higher their review rating tends to be - though the correlation is far from strong.

As an aside, it is interesting to see when US hosts signed up for AirBnb:

HostHist

Indeed, our initial guess that AirBnb spiked in popularity due to the pandemic appears to be correct. That is, the platform started to see a surge in the number of hosts signing up at the time, though not as great as the company’s boom in 2015.

Going back to the model, we seek to explore how other non-numerical data relate to review scores. Since many of these features ended up in our final model, we discuss them in the preprocessing section below.

Preprocessing

Back to table of contents

Our dataset contains many text columns. We approached these by counting their most prevalent words, and comparing each word’s prevalence among all listings versus among listings with a 4.9+ rating and at least 100 reviews, which we arbitrarily define as popular listings.
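A hedged sketch of this comparison (the tokenization here is a simplification of what the notebook actually does):

from collections import Counter

def word_counts(series):
    """Count lowercase whitespace-separated words across a text column."""
    counts = Counter()
    for text in series.dropna():
        counts.update(text.lower().split())
    return counts

all_counts = word_counts(df['description'])
# "Popular" listings: 4.9+ rating with at least 100 reviews.
popular = df[(df['review_scores_rating'] >= 4.9) & (df['number_of_reviews'] >= 100)]
popular_counts = word_counts(popular['description'])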

For example, this approach yields us the following visualization for words in the listing description:

Description Words

We decided to take all the non-filler words and encode a feature whose value is the number of times these keywords appear in a listing:

Description vs. Rating

Indeed, this looks like a promising feature: there is a clear positive slope, and listings with many keyword occurrences are exceedingly rare. Continuing our feature engineering in this direction, we found that this sort of “word counting” trick works for all the text-based features.
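The pipeline below relies on a custom SubstringsTransformer for this trick. The real implementation lives in model.ipynb; a minimal sketch, assuming the keywords are plain words, might look like:

from sklearn.base import BaseEstimator, TransformerMixin

class SubstringsTransformer(BaseEstimator, TransformerMixin):
    """Encodes a text column as the number of keyword occurrences per row."""

    def __init__(self, substrings):
        self.substrings = substrings

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        # X is a single text column; count occurrences of every keyword.
        texts = X.fillna('').str.lower()
        counts = sum(texts.str.count(word) for word in self.substrings)
        return counts.to_numpy().reshape(-1, 1)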

There is one last feature worth mentioning in our feature engineering process: amenities. amenities is a list of host-reported amenities from ‘Jacuzzi’ to ‘Suave Shampoo’. Interestingly, taking the length of this list makes for a pretty good feature:

Amenities vs Rating

Missingness and Imputation

Back to table of contents

The last step of the preprocessing was to determine what exactly we should do with observations that had missing data. This is actually quite the concern:

NaN

For numerical features, we took the mean of that feature across all observations and used it to replace missing values. Categorical features were a bit trickier. We initially thought we would impute the most common observation, but decided against it, since we might introduce something a listing did not originally have.

For example, the amenities feature is a list of all the amenities a listing provides. If we took the most common amenity and used it to fill a NaN for a listing that may not even have that amenity, we would essentially be adding bias. So we decided to simply ignore this feature when a listing is missing it.

Thus, our preprocessing can be summarized by the following sklearn column transformer:

from sklearn.compose import make_column_transformer
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# SubstringsTransformer, WeekTransformer, and LengthTransformer are custom
# transformers defined in model.ipynb (sketched above and below).
preproc = make_column_transformer(
    (SubstringsTransformer(names_meaningful), 'name'),          # keyword counts in the listing name
    (SubstringsTransformer(desc_meaningful), 'description'),    # keyword counts in the description
    (WeekTransformer(), 'host_since'),                          # weeks the host has been on the platform
    (LengthTransformer(), 'host_verifications'),                # number of verification methods
    (OneHotEncoder(handle_unknown='ignore'), ['property_type']),
    (OneHotEncoder(handle_unknown='ignore'), ['room_type']),
    (LengthTransformer(), 'amenities'),                         # number of amenities
    (StandardScaler(), numeric_features)                        # standardize numeric columns
)
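The other custom transformers are also defined in model.ipynb. Minimal sketches, assuming amenities and host_verifications hold list-like strings and host_since holds parseable dates:

import ast
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin

class LengthTransformer(BaseEstimator, TransformerMixin):
    """Encodes a list-like string column (e.g. amenities) as its length."""

    def fit(self, X, y=None):
        return self

    def transform(self, X):
        lengths = X.fillna('[]').apply(lambda s: len(ast.literal_eval(s)))
        return lengths.to_numpy().reshape(-1, 1)

class WeekTransformer(BaseEstimator, TransformerMixin):
    """Encodes host_since as weeks elapsed before the latest sign-up date."""

    def fit(self, X, y=None):
        self.latest_ = pd.to_datetime(X).max()
        return self

    def transform(self, X):
        weeks = (self.latest_ - pd.to_datetime(X)).dt.days / 7
        return weeks.to_numpy().reshape(-1, 1)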

As you can see, amenities is not the only column that gets a special kind of transformer. For more details, see our model.ipynb. We list some of the more interesting ones not mentioned above:

  • Host Verifications: The column host_verifications tells the user which methods of communication the host is verified with. We decided to encode the length of this list, since it would be difficult to determine whether one type of verification is more valuable than another, but it is sensible that more verifications imply a more communicative host. A regression plot of this encoding showed that the relationship between host_verifications and review_scores_rating is pretty flat, indicating a weak correlation and an unreliable feature, hence we dropped it, i.e. did not use it in our model. This makes sense: someone is unlikely to rate a listing higher just because, say, the host is accessible through two email addresses instead of one.

  • Property Type: For the property_type feature, we trained a linear regression model using only this feature and obtained an r^2 score of about 0.0299 (see the sketch after this list). The r^2 score can be read as a value in the interval [0, 1] indicating how well the feature helps our model predict the review rating for a listing, where 0 means not at all and 1 means perfectly. The score we got isn’t good, but it isn’t insignificant either. This matters, as there are ~125 different property types, so a OneHotEncoding() of this feature introduces ~125 new features!

  • Host Response Time and Rate: For host_response_time, as with property_type, we trained a linear regression model on this feature alone and got an r^2 score of about 0.0021. This very low score indicates the feature is not very relevant to predicting a listing’s review rating, so we did not use it. This makes sense - there are only four values in this column (within an hour, within a few hours, within a day, and a few days or more), which tell us almost nothing. For example, a listing that is only popular in the summer may have a host that responds quickly in the summer but does not even open AirBnB in the winter - such hosts may have a response time of a few days or more despite delivering highly rated listings. Thus, this is an example of a feature that did not make the cut for our model.

  • Host Profile Pic: For host_profile_pic, we did the same and got an r^2 score of about 0.00065. Again, this very low score indicates the feature is not very relevant to our model’s prediction, probably because around 90% of hosts have a profile picture. As a remark, this makes hosts without any profile picture particularly shady!
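The single-feature screening used in the last three bullets can be sketched as follows (exact train/test handling in the notebook may differ):

from sklearn.compose import make_column_transformer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import OneHotEncoder

def single_feature_r2(df, feature, target='review_scores_rating'):
    """Train a linear regression on one one-hot-encoded feature, return r^2."""
    screen = make_pipeline(
        make_column_transformer((OneHotEncoder(handle_unknown='ignore'), [feature])),
        LinearRegression()
    )
    rows = df.dropna(subset=[feature, target])
    screen.fit(rows[[feature]], rows[target])
    return screen.score(rows[[feature]], rows[target])

# e.g. single_feature_r2(df, 'property_type') gives roughly 0.03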

Hence, with a ludicrous 217,000 rows and 159 features, we are ready to train some models!

Model 1

Back to table of contents

For our first model, we chose Linear Regression for reasons explained in the Discussion section. Linear Regression has no hyperparameters to tune - its behavior is dictated entirely by the features engineered in our preprocessing section above. Thus, the model implementation is very straightforward:

from sklearn.impute import SimpleImputer
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Full model: preprocessing, mean-imputation of remaining NaNs, regression.
model = make_pipeline(
    preproc,
    SimpleImputer(strategy='mean'),
    LinearRegression()
)

We then do a 10-fold cross validation using the following code:

from sklearn.model_selection import KFold
from sklearn.metrics import mean_squared_error, mean_absolute_error, r2_score

# Features: everything except the review score columns; target: the rating.
X = df.drop(columns=review_cols)
y = df['review_scores_rating']

scores = {}

kf = KFold(n_splits=10, shuffle=True, random_state=42)
for i, (train_index, test_index) in enumerate(kf.split(X)):

    X_train_fold, X_test_fold = X.iloc[train_index], X.iloc[test_index]
    y_train_fold, y_test_fold = y.iloc[train_index], y.iloc[test_index]

    # Refit the whole pipeline on this fold's training split.
    model.fit(X_train_fold, y_train_fold)
    y_train_pred_fold = model.predict(X_train_fold)
    y_test_pred_fold = model.predict(X_test_fold)

    # Record train and test metrics for this fold.
    scores[i] = {}
    scores[i]['train'] = {
        'mean_squared_error': mean_squared_error(y_train_fold, y_train_pred_fold),
        'mean_absolute_error': mean_absolute_error(y_train_fold, y_train_pred_fold),
        'r2_score': r2_score(y_train_fold, y_train_pred_fold)
    }
    scores[i]['test'] = {
        'mean_squared_error': mean_squared_error(y_test_fold, y_test_pred_fold),
        'mean_absolute_error': mean_absolute_error(y_test_fold, y_test_pred_fold),
        'r2_score': r2_score(y_test_fold, y_test_pred_fold)
    }

Model 2

Back to table of contents

For our second model, we opted for a Sequential Neural Network featuring LeakyReLU activation functions. To choose the hyperparameters, we conducted three grid searches, the first two reducing the size of the search space for the third.

First, we searched through every possible keras optimizer and learning rate order of magnitude:

Optimizers Loss

Clearly, Lion, AdamW, and Nadam have the smallest and most consistent losses, so these move on to our hyperparameter tuning stage. We then see how the number of layers and nodes per layer affects our MSE loss:

Nodes Loss

Here, we see how our model’s training loss decreases as we add more layers and more nodes. This is consistent with the fitting graph - as we add layers or nodes, we increase model complexity and run the risk of overfitting the training data. This means we should tune hyperparameters in the region of 8-11 layers and 90-130 nodes per layer, where the loss has started to drop but complexity is not yet so high that we risk overfitting.

Thus, we have the following hyperparameters left to tune:

  • Optimizer: AdamW, Nadam, or Lion
  • Learning Rate: 1e-5 - 1e-3 (log scale, matching the search space in the code below)
  • Layers: 8-11
  • Nodes/layer: 90-130
  • Activation Function: relu, sigmoid, tanh (for hidden layers)

Thus, let’s define a Keras Tuner to automatically report the best combination. We remark that this code will not take an unfathomable amount of resources, since we have already dramatically limited the search space in the previous two stages.

import keras_tuner as kt
from keras.models import Sequential
from keras.layers import Input, Dense, LeakyReLU
from keras.optimizers import Lion, AdamW, Nadam

def build_hp_NN(hp):
    model = Sequential()
    model.add(Input(shape=(X_features.shape[1],)))

    # Tune depth and width within the ranges found above.
    num_layers = hp.Int('num_layers', min_value=8, max_value=11)
    for _ in range(num_layers):
        model.add(Dense(
            hp.Int('num_nodes', min_value=90, max_value=130, step=10),
            activation=hp.Choice('activation_func', ['relu', 'sigmoid', 'tanh'])
        ))
        model.add(LeakyReLU(negative_slope=0.01))

    model.add(Dense(1))  # single regression output

    # Tune the optimizer class and (log-scaled) learning rate.
    optimizer_dict = {'lion': Lion, 'adamw': AdamW, 'nadam': Nadam}
    optimizer_class = optimizer_dict[hp.Choice('optimizer_class', ['lion', 'adamw', 'nadam'])]
    learning_rate = hp.Float('learning_rate', min_value=1e-5, max_value=1e-3, sampling='log')
    model.compile(
        optimizer=optimizer_class(learning_rate=learning_rate),
        loss='mean_squared_error',
        # Keras adds the val_ prefix for validation metrics automatically.
        metrics=['mean_squared_error']
    )
    return model
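To actually run the search (the tuner type, objective, and epoch counts here are our own assumptions; the 100 trials matches what we report in the Discussion):

tuner = kt.RandomSearch(
    build_hp_NN,
    objective='val_loss',
    max_trials=100,
    directory='tuning',
    project_name='airbnb_nn'
)
tuner.search(X_features, y, validation_split=0.2, epochs=20, verbose=0)
best_hp = tuner.get_best_hyperparameters(num_trials=1)[0]
print(best_hp.values)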

The best hyperparameters are as follows:

Hyperparameter Best Value
num_layers 9
num_nodes 110
activation_func sigmoid
optimizer_class lion
learning_rate 5.449514749224714e-05

We thus implement the final tuned NN with the following code:

def make_best_NN():
    """Returns the best NN based on hyperparameter tuning."""
    model = Sequential()
    model.add(Input(shape=(X_features.shape[1],)))

    # 9 hidden layers of 110 sigmoid nodes, each followed by a LeakyReLU.
    for _ in range(9):
        model.add(Dense(110, activation='sigmoid'))
        model.add(LeakyReLU(negative_slope=0.01))

    model.add(Dense(1))

    model.compile(
        optimizer=Lion(learning_rate=5.449514749224714e-05),
        loss='mean_squared_error'
    )
    return model

We then run 10-fold cross validation to evaluate the model:

scores2 = {}
X_copy = X_features  # preprocessed feature matrix (a numpy array)
y_copy = y

kf = KFold(n_splits=10, shuffle=True, random_state=42)
for i, (train_index, test_index) in enumerate(kf.split(X_copy)):

    X_train_fold, X_test_fold = X_copy[train_index], X_copy[test_index]
    y_train_fold, y_test_fold = y_copy.iloc[train_index], y_copy.iloc[test_index]

    # Build a fresh, untrained network for each fold.
    model = make_best_NN()

    model.fit(X_train_fold, y_train_fold, epochs=50, verbose=0)
    y_train_pred_fold = model.predict(X_train_fold, verbose=0)
    y_test_pred_fold = model.predict(X_test_fold, verbose=0)
    
    scores2[i] = {}
    scores2[i]['train'] = {
        'mean_squared_error': mean_squared_error(y_train_fold, y_train_pred_fold),
        'mean_absolute_error': mean_absolute_error(y_train_fold, y_train_pred_fold),
        'r2_score': r2_score(y_train_fold, y_train_pred_fold)
    }
    scores2[i]['test'] = {
        'mean_squared_error': mean_squared_error(y_test_fold, y_test_pred_fold),
        'mean_absolute_error': mean_absolute_error(y_test_fold, y_test_pred_fold),
        'r2_score': r2_score(y_test_fold, y_test_pred_fold)
    }

Now that we have implemented our models, we report the results of our 10-fold cross validations.


Results

Back to table of contents

Note: An in-depth discussion of these results and potential improvements can be found in the Discussion Section.

Model 1

Back to table of contents

We do 10-fold cross validation and report the metrics for each fold on our Linear Regression model.

  mean_squared_error mean_absolute_error r2_score
(0, ‘train’) 0.12613 0.198191 0.0879473
(0, ‘test’) 0.130564 0.20048 0.08552
(1, ‘train’) 0.127414 0.199016 0.0872061
(1, ‘test’) 0.118979 0.195752 0.0924823
(2, ‘train’) 0.126681 0.198671 0.0875633
(2, ‘test’) 0.125549 0.19663 0.0893076
(3, ‘train’) 0.127029 0.198596 0.0875516
(3, ‘test’) 0.123754 0.197797 0.0795305
(4, ‘train’) 0.126899 0.198766 0.0880649
(4, ‘test’) 0.123835 0.196832 0.0828053
(5, ‘train’) 0.12554 0.197758 0.0881043
(5, ‘test’) 0.135944 0.203179 0.0836539
(6, ‘train’) 0.126947 0.198569 0.0878757
(6, ‘test’) 0.123132 0.198147 0.0866351
(7, ‘train’) 0.126227 0.198311 0.0874304
(7, ‘test’) 0.129676 0.198832 0.0900592
(8, ‘train’) 0.126344 0.198277 0.0879102
(8, ‘test’) 0.128622 0.199914 0.0859053
(9, ‘train’) 0.126359 0.198572 0.0884995
(9, ‘test’) 0.128394 0.198563 0.0812679

We summarize it by taking the averages below:

  mean_squared_error mean_absolute_error r2_score
test 0.126845 0.198613 0.0857167
train 0.126557 0.198473 0.0878153

We also attach the average differences between each metric’s train and test:

  mean_squared_error mean_absolute_error r2_score
difference -0.000288 -0.000140 0.002099
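For reference, a sketch of how these summary tables can be computed from the scores dictionary built during cross validation (the same applies to scores2 below):

import pandas as pd

# One row per (fold, split) pair, one column per metric.
per_fold = pd.DataFrame.from_dict(
    {(i, split): m for i, d in scores.items() for split, m in d.items()},
    orient='index'
)
per_fold.index = pd.MultiIndex.from_tuples(per_fold.index)
averages = per_fold.groupby(level=1).mean()   # mean over the 10 folds
difference = averages.loc['train'] - averages.loc['test']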

Model 2

Back to table of contents

We do 10-fold cross validation and report the metrics for each fold on our Sequential Neural Network model.

  mean_squared_error mean_absolute_error r2_score
(0, ‘train’) 0.100324 0.166675 0.187977
(0, ‘test’) 0.130965 0.177124 0.0920074
(1, ‘train’) 0.101551 0.179577 0.196185
(1, ‘test’) 0.101345 0.179156 0.149704
(2, ‘train’) 0.100578 0.175928 0.200441
(2, ‘test’) 0.107871 0.177049 0.130703
(3, ‘train’) 0.102913 0.18821 0.192934
(3, ‘test’) 0.100765 0.191031 0.0704145
(4, ‘train’) 0.103356 0.182372 0.185835
(4, ‘test’) 0.103187 0.184251 0.092417
(5, ‘train’) 0.102092 0.182962 0.207228
(5, ‘test’) 0.0919273 0.18137 0.0514895
(6, ‘train’) 0.0991111 0.170512 0.187944
(6, ‘test’) 0.138846 0.182306 0.119488
(7, ‘train’) 0.100006 0.174586 0.187665
(7, ‘test’) 0.121794 0.185548 0.177905
(8, ‘train’) 0.103255 0.172554 0.187263
(8, ‘test’) 0.0966669 0.172099 0.143063
(9, ‘train’) 0.0995553 0.173294 0.204133
(9, ‘test’) 0.122812 0.188462 0.057875

We summarize it by taking the averages below:

  mean_squared_error mean_absolute_error r2_score
test 0.111618 0.18184 0.108507
train 0.101274 0.176667 0.19376

We also attach the average differences between each metric’s train and test:

  mean_squared_error mean_absolute_error r2_score
difference -0.010344 -0.005172 0.085254

This concludes our results.


Discussion

Back to table of contents

We split the discussion into three subsections for each model:

  1. General Metric Discussion
  2. Model on Fitting Graph
  3. Improvements and Next Steps

Model 1

Back to table of contents

We start by motivating our choice of Linear Regression. Linear regression has no hyperparameters - besides the feature engineering. This gives the model 3 main benefits:

  1. It is relatively simple, and the features almost entirely dictate how well the model performs. This let us focus on engineering good, relevant features for our second, more complicated model.
  2. Linear Regression lets us check the coefficient of every variable, and when the features are standardized or normalized, gives a sense of which features play the largest role in determining the review score.
  3. LinearRegression is extremely fast and easy to implement, allowing us to test many different features and encodings quickly.

General Metric Discussion

Back to table of contents

Our mean absolute error for our linear regression was about 0.2 for both the train and test predictions. The similarity between these values tells us that our model has not overfitted, but the large values (considering review ratings only range from 0 to 5) show that the model is not very good. The same can be said for the mean squared error, which was about 0.13.

The r2_score, however, tells a very interesting story. The r2_score is the coefficient of determination: roughly, the fraction of the variance in the review scores that the model explains, where 0 means none and 1 means all of it. Our low value of 0.08 tells us that despite the large number of features, we still have not effectively represented the data.
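For reference, r2_score computes

R^2 = 1 - \frac{\sum_i (y_i - \hat{y}_i)^2}{\sum_i (y_i - \bar{y})^2}

where \hat{y}_i are the predicted ratings and \bar{y} is the mean rating.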

In conclusion, our first model did not perform very well. Our data is too sporadic, and attempting to fit a simple line to complex data is fruitless no matter how much we optimize the coefficients (the only thing we can optimize in linear regression) - scaling a line still gives a line.

Model on Fitting Graph

Back to table of contents

Since our MSE is high, and since the complexity of linear regression is very low, we are simply underfitting our data. Again, we are attempting to fit a line to data that is sparse and not very linear. Based on our relatively high MSE, and nearly identical train and test MSE, MAE, and r2, it is safe to say that our model is not far along the fitting graph. In other words, our model has not overfitted the data, since the train and test metrics are very similar. This is very good for us, as there is much room to add model complexity - such as by adding more features, or changing to a more complex model - and improve the metrics in our second, better model.

Coefficient Analysis

Back to table of contents

Recall that one of the main reasons we chose Linear Regression for our first model is the ability to compare the significance of various features to the regression. We summarize the absolute values of our coefficients:

  Statistic Value
count 159.000000
mean 0.021966
std 0.028256
min 0.000027
25% 0.004170
50% 0.009904
75% 0.030899
max 0.153812

We notice the average absolute coefficient is 0.021, meaning our columns each affect the predicted review score on the order of 0.01. Then, to identify the features that contributed most to the regression, we sort the features by absolute coefficient, largest to smallest. Here are the top 10 features (as indexes) and their coefficients:

  Feature Index Coefficient
154 0.153812
63 0.130499
86 0.126531
155 0.116383
90 0.112651
93 0.103163
88 0.096587
130 0.076349
87 0.075408
101 0.067546
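A sketch of how this table can be produced (mapping indexes back to column names via get_feature_names_out is an assumption - it only works if the custom transformers implement it):

import pandas as pd

coefs = pd.Series(model.named_steps['linearregression'].coef_)
print(coefs.abs().sort_values(ascending=False).head(10))
# Optionally, map indexes back to names:
# names = model.named_steps['columntransformer'].get_feature_names_out()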

In order to do more in-depth coefficient analysis, we need to look at what each column actually represents (not just its index). Using our pipeline, we list the columns:

  • 1 substring feature (from name)
  • 1 substring feature (from description)
  • 1 week feature (from host_since)
  • 1 length feature (from host_verifications)
  • 125 one-hot encoded features (from property_type)
  • 4 one-hot encoded features (from room_type)
  • 1 length feature (from amenities)
  • 25 numeric features (as listed in numeric_features)

We discuss the top ten features below (citing their index as well):

  • Feature 154 (calculated_host_listings_count): It makes intuitive sense why this feature would be the most impactful: the more listings a host has, the more likely it is that one of their listings is a highly rated one. The longer a host is on the platform, the more time they have to accumulate listings and reviews. This is definitely a feature that we could try to engineer further to give our model more to work with.

  • Feature 155 (number_of_reviews_l30d): The number of reviews a listing has in the last 30 days seems to have played quite a significant role in the model’s predictions. This makes sense as, similar to calculated_host_listings_count, the more reviews a listing has in the last 30 days, the more likely that listing is to have a good rating since people wouldn’t want to keep booking and reviewing a place that is poorly rated.

  • Feature 130 (some room_type): We see that there is a room type that reigns supreme over all of the other room types. This would make sense as AirBnB is a service that offers practical temporary housing, so people who just need a quick place to recharge and rest are in need of a specific kind of room rather than the other possible room types.

  • Features 63, 86, 90, 93, 88, 87, 101 (some property_types): These features are part of our one-hot encoded property_type feature, so we know that specific property types are weighted very heavily. This, once again, makes a lot of sense, since most users are only going to be in search of a handful of different property types, rather than there being a uniform distribution over the 125 property types.

Improvements and Next Steps

Back to table of contents

Here are some ways in which we may be able to improve our model:

  • Increasing the number of features
  • Better feature engineering
  • Trying different complexity models such as polynomial regression or neural networks
  • We could try using coalesced features.

Since the mean squared error of our first model, linear regression, is not exactly ideal, we decided that the next model to train and test would be a neural network. Since the linear regression fit produces a suboptimal mean squared error, we believe a simple best-fit line is simply not enough for our relatively complex data. By using a neural network and experimenting with the number of neurons per hidden layer, the number of hidden layers, and the many activation functions, we add depth and complexity in hopes of finding a fit that captures the relationships between our features while producing fairly accurate predictions.

Model 2

Back to table of contents

Our first model has taught us a few things:

  • We need to dramatically increase model complexity.
  • There is no “magic feature” that is extremely correlated with review scores.
  • Many of our features have only a small number of possible numerical values.

As such, a neural network comes to mind for our second model. Here are three reasons why a neural network appears to be the next best step:

  1. The data and features are extremely complicated, so simple kernels like linear or polynomial transformations are insufficient to describe a feature’s relationship with the review scores. By using a neural network, we gain access to activation functions, allowing us to add non-linearity and complexity to our model to better fit our data.
  2. We have an incredible number of features (159), and NNs excel at finding general patterns from a large number of features and combining them together (this means we will need a lot of layers with a lot of nodes each).
  3. The data is extremely large, meaning that there is much room for our model to explore the data before it overfits. This reduces the high complexity downside of NNs.

Hyperparameter Tuning

Before discussing the results, we remark that 100 trials of hyperparameter tuning, which took hours, concluded that the optimal number of neurons per layer is 110, the optimal number of hidden layers is 9, the optimal activation function is sigmoid, and the optimal learning rate is about 5.45e-05. We ended up with a mean squared error of 0.09 - an improvement over our linear regression model. More details can be found in our Methods section.

General Metric Discussion

Back to table of contents

We start with the mean_absolute_error, which tells us that on average our predicted rating is about 0.18 off in both the train and test cases. The similarity between these values tells us that our model has not overfitted, and though the values are a noticeable improvement over our first model, they are not ideal either. This shows how complex predicting review scores is.

The same can be said for our mean_squared_error, which is on average about 0.10. Note that it is smaller than the mean_absolute_error, which makes sense: the errors are mostly below 1, and squaring a number below 1 makes it smaller.

The r2_score, however, tells a very interesting story. Our value of 0.19 for train and 0.11 for test shows that our model is approaching the edge of the fitting graph, meaning it is on the verge of overfitting. Still, this is a marked improvement over our linear regression r^2 of about 0.08, meaning our second model is a definite improvement.

Model on Fitting Graph

Back to table of contents

Based on the similarity of our MSE and MAE across folds, we know that our model has not overfitted the training data. However, the larger gap between the train and test r2_scores tells us that our model is much further along the fitting graph - that is, more complex - than our linear regression model. In other words, adding more layers or complexity may risk overfitting, though we are clearly not there yet. This is very good for us, since it means our model is quite generalizable (the 10 folds give similar values), but it also makes improvement difficult, as adding a large amount of complexity is likely to overfit our data.

Improvements and Next Steps

Back to table of contents

Compared to our first model, the second model has much higher model complexity, and there is evidence (as discussed in the r2_scores above) that we are approaching the overfitting mark, though we are not there yet. This leaves us at a rather difficult stalemate: we have engineered a lot of features, have a large amount of data and high model complexity, and have optimized hyperparameters with a very small learning rate. Any further improvement probably needs to involve a major change in how we build the model.

Here are some ways in which we may be able to improve our model:

  • Changing the type of NN or NN layers. Right now, we have a “default” NN with backpropagation and only Dense layers with LeakyReLU in between. Trying different, more complicated layers such as convolutional or pooling layers may better bring out the correlations in the data.
  • Changing the type of activation function. Mathematically, each activation function is ideal for a very specific type of identification. Perhaps changing activation functions between layers may help bring out the patterns in the data.
  • Doing more feature engineering. Some features, such as amenities, are only explored by length. Though it can be argued that there isn’t much more to extract here (how much do amenities, for example, really impact the reviews of a listing?), it also means we have not tapped this data’s full potential. However, a full NLP stack would be necessary for this, which comes with its own concerns of overfitting and bias.

Conclusion

Back to table of contents

In this project, we trained two models: first a linear regression model, which prompted us to engineer features for categorical columns, and second a neural network, which gave us the model complexity and hyperparameter tuning needed to improve our metrics. Though our best model (the tuned NN) has a test MSE of about 0.11, MAE of about 0.18, and r2 of about 0.11, which is not ideal, this is already quite good in the grand scheme of things.

In other words, this model largely achieves our purpose: giving AirBnB customers more information about what to look for in a highly rated (i.e. enjoyable) stay, and giving AirBnB hosts a sense of what to work on to deliver better rated (i.e. more profitable) listings. A customer can take a listing they really like, but with zero or very few reviews, and use the model to heuristically predict whether it has the characteristics and potential of a good stay. A host can take a couple of new listings and roughly predict which will do better in terms of review scores. Hosts may also tune their descriptions, pricing, etc. and predict how those changes will impact their listing’s review rating.

Thus, we conclude that, given our metrics, being able to predict a given listing’s review score to within roughly 0.2 stars on average is enough precision to provide meaningful insights to both parties, and thus the project is largely a success. We note, however, based on personal experience, that even a small rating difference can be quite substantial, especially when AirBnB presents the customer with so many choices.

We described potential next steps extensively in the previous section, but the general gist is to try a new type of Neural Network with more deeply engineered features - predicting AirBnB review scores is more complicated than it seems!

Statement of Collaboration

Back to table of contents

Our group collaborated extensively. Rather than having clear-cut roles, we divvied up the work and did what was expected of us while getting constant feedback from the other group members. This means that while general descriptions of what we each did are below (note: this is not an exhaustive list), we were all very involved in each other’s work, which gave each of us holistic, hands-on experience with the entire project and kept participation even throughout. Overall, we were all very involved in the project and communicated with each other constantly.

  • Ryan Batubara: Feature Engineering, Plotting and Graphing, Report Writing, Model tuning, README, Model Design, Exploratory Data Analysis (Plotting), etc.

  • Artur Rodrigues: Feature Engineering, Explanations for Plots and Graphs, Report Writing, Banners, Exploratory Data Analysis (Descriptions), etc.

  • Doanh Nguyen: Feature Engineering, Explanations for Plots and Graphs, Report Writing (Major Role), Exploratory Data Analysis (Descriptions), etc.

Thank you for reading our project report. Back to table of contents