Using TensorFlow to Predict Airbnb Rental Prices in New York City
2020-11-06 01:14:23 【Artificial intelligence meets pioneer】
Author: TIMOTHY102 | Compiled by: VK | Source: Analytics Vidhya
Introduction
Airbnb is an online marketplace that lets people rent out their properties or spare rooms to guests. On each booking it collects a commission: 3% from hosts and between 6% and 12% from guests.
Since its founding in 2009, the company has grown from helping 21,000 guests a year find accommodation to helping 6 million people a year go on holiday, and it currently lists a staggering 800,000 homes in 34,000 cities across 90 countries.
In this article, I use the Kaggle New York City Airbnb Open Data dataset and try to build a neural network model with TensorFlow to predict prices.
The goal is to build a suitable machine learning model that can predict the price of future accommodation listings.
Throughout the article I will walk through the Jupyter Notebook I created. You can find it on GitHub: https://github.com/Timothy102/Tensorflow-for-Airbnb-Prices
Loading the data
First, let's look at how to load the data. We use wget to fetch it directly from the Kaggle website. Note that the -O flag (capital O) sets the output file name; lowercase -o would instead name wget's log file.
The dataset should look like this. It has 48,895 rows and 16 columns.
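As a minimal sketch of the loading step, assuming the CSV has already been downloaded: the file name AB_NYC_2019.csv and the helper load_listings are assumptions for illustration and are not taken from the notebook. The demo reads a tiny in-memory sample so that it runs without the real file.

```python
import io

import pandas as pd

# Hypothetical file name; the article does not state it.
CSV_PATH = "AB_NYC_2019.csv"


def load_listings(source):
    """Read the listings CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Tiny stand-in for the real file so the sketch runs anywhere.
sample_csv = io.StringIO(
    "id,neighbourhood_group,latitude,longitude,room_type,price\n"
    "1,Brooklyn,40.64749,-73.97237,Private room,149\n"
    "2,Manhattan,40.75362,-73.98377,Entire home/apt,225\n"
)
df = load_listings(sample_csv)

print(df.shape)  # (rows, columns) of the sample
print(df.head())
```

With the real file, load_listings(CSV_PATH) should report the 48,895 rows and 16 columns mentioned above.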
Data analysis and preprocessing
Seaborn has a very simple API for drawing all kinds of plots for all kinds of data. If you are not familiar with its syntax, check out this article: https://www.analyticsvidhya.com/blog/2019/09/comprehensive-data-visualization-guide-seaborn-python/
After calling corr on the pandas DataFrame, we pass the result to seaborn's heatmap function. The result is as follows:
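The corr-then-heatmap step can be sketched roughly as follows. The sample values and the styling arguments (annot, cmap) are assumptions chosen for illustration; the notebook's exact call may differ.

```python
import matplotlib

matplotlib.use("Agg")  # render off-screen so the sketch runs without a display

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Made-up sample rows standing in for the real listings.
df = pd.DataFrame({
    "latitude": [40.65, 40.75, 40.81, 40.69, 40.80],
    "longitude": [-73.97, -73.98, -73.94, -73.96, -73.94],
    "minimum_nights": [1, 1, 3, 1, 10],
    "price": [149, 225, 150, 89, 80],
})

# Pairwise Pearson correlation of the numeric columns only.
corr = df.select_dtypes("number").corr()

# Draw the matrix as a heatmap; annot=True prints each coefficient in its cell.
ax = sns.heatmap(corr, annot=True, cmap="coolwarm", vmin=-1, vmax=1)
ax.set_title("Correlation between numeric columns")
plt.tight_layout()
# In a notebook the figure is displayed inline; elsewhere, plt.savefig(...) writes it to disk.
```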
Since we have latitude, longitude and neighbourhood data, let's create a scatter plot:
In addition, I removed duplicate rows and some unnecessary columns, and filled in the missing values of "reviews_per_month", because it had too many of them. The data now looks like this. It has 10 columns and no null values:
Pretty good, right?
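A sketch of that cleaning step is shown below. The list of dropped columns and the fill value of 0 are assumptions, since the article names neither; only "reviews_per_month" comes from the text.

```python
import numpy as np
import pandas as pd

# Made-up rows; the second and third are exact duplicates.
df = pd.DataFrame({
    "id": [1, 2, 2, 3],
    "host_name": ["Ann", "Bob", "Bob", "Cy"],
    "last_review": ["2019-05-21", None, None, "2019-07-05"],
    "reviews_per_month": [0.21, np.nan, np.nan, 4.64],
    "price": [149, 225, 225, 89],
})

# Hypothetical choice of columns to drop; the article does not list them.
unused_columns = ["id", "host_name", "last_review"]

clean = (
    df.drop_duplicates()                      # remove repeated rows
      .drop(columns=unused_columns)           # remove columns the model will not use
      .fillna({"reviews_per_month": 0})       # assumed fill value: 0 reviews per month
      .reset_index(drop=True)
)

print(clean.isnull().sum())  # every count should be 0 after cleaning
```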
First of all, computers work with numbers. That is why we need to convert categorical columns into numeric form. Here this is done with pandas' factorize method, which maps each category to an integer code rather than a true one-hot vector. Many other tools can do this as well:
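To make the difference concrete, here is a small sketch comparing pandas.factorize with pandas.get_dummies. The room_type column and its values are assumed for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    "room_type": ["Private room", "Entire home/apt", "Private room", "Shared room"],
})

# factorize assigns integer codes in order of first appearance.
codes, categories = pd.factorize(df["room_type"])
df["room_type_code"] = codes
print(codes.tolist())       # [0, 1, 0, 2]
print(categories.tolist())  # ['Private room', 'Entire home/apt', 'Shared room']

# get_dummies produces one indicator column per category (a one-hot encoding).
one_hot = pd.get_dummies(df["room_type"], prefix="room")
print(one_hot.columns.tolist())
```

Integer codes are compact, but they impose an arbitrary order on the categories; a one-hot encoding avoids that at the cost of more columns.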
To keep the loss function in a stable range, let's normalize some of the data so that it has a mean of 0 and a standard deviation of 1.
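This normalization is the usual z-score, (x - mean) / std. A minimal sketch follows; the helper name standardize and the choice of columns are assumptions.

```python
import pandas as pd

df = pd.DataFrame({
    "minimum_nights": [1.0, 1.0, 3.0, 10.0],
    "number_of_reviews": [9.0, 45.0, 0.0, 270.0],
})


def standardize(frame, columns):
    """Return a copy with the given columns rescaled to mean 0 and std 1."""
    out = frame.copy()
    for col in columns:
        mean = out[col].mean()
        std = out[col].std(ddof=0)  # population standard deviation
        out[col] = (out[col] - mean) / std
    return out


scaled = standardize(df, ["minimum_nights", "number_of_reviews"])
print(scaled.mean().round(6))       # approximately 0 for each column
print(scaled.std(ddof=0).round(6))  # approximately 1 for each column
```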
Feature crosses
There is one change we have to make, and it is an essential one. To relate longitude and latitude to the model output, we have to create a feature cross. The following links should give you enough background to get a proper feel for feature crosses:
- https://developers.google.com/machine-learning/crash-course/feature-crosses/video-lecture
- https://www.kaggle.com/vikramtiwari/feature-crosses-tensorflow-mlcc
Our goal is to introduce a latitude-longitude cross, one of the oldest tricks in the book. If we simply fed these two columns into the model as raw values, it would assume that the output varies progressively with each of them.
Instead, we use a feature cross, which means dividing the latitude × longitude map into a grid. Fortunately, TensorFlow makes this easy.
I step through each coordinate range in increments of (max - min)/100, which produces evenly spaced bucket boundaries.
I use a 100×100 grid:
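The grid idea can be illustrated without TensorFlow. The sketch below is a NumPy stand-in, not the tf.feature_column code from the notebook: it builds the evenly spaced boundaries and assigns each point an exact grid-cell id, whereas TensorFlow's crossed columns hash the bucket pair. The helper names and sample coordinates are assumptions.

```python
import numpy as np


def grid_boundaries(values, n_buckets=100):
    """Interior bucket boundaries spaced (max - min) / n_buckets apart."""
    lo, hi = float(np.min(values)), float(np.max(values))
    return np.linspace(lo, hi, n_buckets + 1)[1:-1]


def crossed_cell(lat, lon, lat_bounds, lon_bounds):
    """Map each (lat, lon) pair to a single grid-cell id."""
    lat_idx = np.digitize(lat, lat_bounds)  # bucket index 0 .. len(lat_bounds)
    lon_idx = np.digitize(lon, lon_bounds)
    n_lon = len(lon_bounds) + 1             # number of longitude buckets
    return lat_idx * n_lon + lon_idx


# Made-up coordinates roughly spanning New York City.
lat = np.array([40.50, 40.65, 40.75, 40.91])
lon = np.array([-74.24, -73.97, -73.98, -73.71])

lat_bounds = grid_boundaries(lat)
lon_bounds = grid_boundaries(lon)
cells = crossed_cell(lat, lon, lat_bounds, lon_bounds)

print(len(lat_bounds), len(lon_bounds))  # 99 interior boundaries give 100 buckets per axis
print(cells)                             # one id in [0, 9999] per listing
```

Each cell id can then be treated as a categorical feature, so the model learns a separate effect per map square instead of a single trend along each axis.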
Essentially, what we are doing here is defining bucketized columns with the boundaries computed earlier and creating a DenseFeatures layer, which is then passed to the Sequential API.
If you are not familiar with the TensorFlow syntax, please check the documentation: https://www.tensorflow.org/api_docs/python/tf/feature_column/
Now, finally, we are ready to train the model. Apart from splitting the data, that is.
Obviously, we have to create two datasets: one containing the input data and the other containing the values to predict. Because mismatched data sizes could cause problems for our model, I decided to truncate the data that was too long.
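One way to sketch the split is shown below. It assumes the target column is named price and holds out 20% of the rows for testing; both are assumptions, and the notebook's own splitting and truncation code may differ. Features and targets are cut from the same shuffled frame, so their lengths always match.

```python
import numpy as np
import pandas as pd

# Made-up rows standing in for the cleaned listings.
df = pd.DataFrame({
    "latitude": np.linspace(40.5, 40.9, 10),
    "longitude": np.linspace(-74.2, -73.7, 10),
    "price": np.arange(50, 150, 10),
})


def split_features_target(frame, target="price", test_fraction=0.2, seed=0):
    """Shuffle, then return (x_train, y_train, x_test, y_test)."""
    shuffled = frame.sample(frac=1.0, random_state=seed).reset_index(drop=True)
    n_test = int(len(shuffled) * test_fraction)
    test, train = shuffled.iloc[:n_test], shuffled.iloc[n_test:]
    return (
        train.drop(columns=target), train[target],
        test.drop(columns=target), test[target],
    )


x_train, y_train, x_test, y_test = split_features_target(df)
print(x_train.shape, y_train.shape, x_test.shape, y_test.shape)
```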
Creating the model
Finally, we build a Keras Sequential model.
We compile the model with the Adam optimizer, mean squared error loss and two metrics.
In addition, we use two callbacks:
- Early stopping, which is self-explanatory
- Reducing the learning rate on plateau.
After training for 50 epochs with a batch size of 64, our model is quite successful.
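The model, compile step and callbacks can be sketched roughly as follows. The layer sizes, the two metrics (MAE and RMSE), the callback settings and the random training data are all assumptions for illustration; the article only specifies Adam, mean squared error, two metrics, the two callbacks, 50 epochs and a batch size of 64. The demo trains for 2 epochs to keep it quick.

```python
import numpy as np
import tensorflow as tf


def build_model(n_features):
    """Small fully connected regression network (layer sizes are assumed)."""
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(n_features,)),
        tf.keras.layers.Dense(64, activation="relu"),
        tf.keras.layers.Dense(32, activation="relu"),
        tf.keras.layers.Dense(1),  # single output: the predicted price
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(),
        loss="mse",
        metrics=["mae", tf.keras.metrics.RootMeanSquaredError()],  # assumed metrics
    )
    return model


callbacks = [
    # Stop when the validation loss has not improved for 5 epochs.
    tf.keras.callbacks.EarlyStopping(
        monitor="val_loss", patience=5, restore_best_weights=True
    ),
    # Halve the learning rate when the validation loss plateaus for 2 epochs.
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss", factor=0.5, patience=2),
]

# Random stand-in data; the real inputs are the preprocessed listing features.
rng = np.random.default_rng(0)
x = rng.normal(size=(256, 8)).astype("float32")
y = rng.normal(size=(256, 1)).astype("float32")

model = build_model(n_features=8)
history = model.fit(
    x, y,
    validation_split=0.2,
    epochs=2,          # the article trains for 50
    batch_size=64,
    callbacks=callbacks,
    verbose=0,
)
print(model.predict(x[:3], verbose=0).shape)  # (3, 1)
```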
Conclusion
Using the New York City Airbnb data, we built a fully connected neural network to predict future prices. Pandas and seaborn made it very easy to visualize and inspect the data. We introduced the idea of a latitude-longitude cross as a feature of the model. And thanks to Kaggle's open dataset, we have a fully working machine learning model.
Original article: https://www.analyticsvidhya.com/blog/2020/10/predicting-nyc-airbnb-rental-prices-tensorflow/