As part of a research project I am conducting at college, I need to visit various hotel booking sites, extract data from them, and analyze it. This project will be updated regularly over the coming semesters until the research ends.
The general idea of the study is to predict hotel prices by region and hotel name.
The main benefit of predicting hotel prices accurately is lower costs for the sites that offer hotel search services.
These sites query each hotel's own database to serve customers who are looking for rooms, and every such query costs money.
If the sites could predict hotel prices with a high success rate, they would not need to query the hotels' databases immediately; they could rely on forecasts instead, a kind of "smart database".
Only when a customer is clearly interested in booking a room would the search service run a second query, this time against the hotel's database, to get the exact price and availability of the room.
In short, the hotel's database is queried only when the user really wants to buy, which significantly reduces what the search site pays for access to those databases.
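The two-step lookup described above can be sketched as follows. This is only an illustration under stated assumptions, not the project's code: `PriceLookup`, `browse_price`, and `checkout_price` are hypothetical names, the forecasting model and the hotel's database are replaced by plain dictionaries, and the predicted price of 195.0 is made up for the example.

```python
from typing import Callable, Tuple

# Hypothetical lookup key: (hotel name, check-in date).
Key = Tuple[str, str]


class PriceLookup:
    """Serve predicted prices while browsing; query the hotel only on purchase intent."""

    def __init__(self, predict: Callable[[Key], float],
                 fetch_exact: Callable[[Key], float]) -> None:
        self.predict = predict          # cheap: stands in for the forecasting model
        self.fetch_exact = fetch_exact  # costly: stands in for the hotel's database
        self.paid_queries = 0           # how many costly queries were made

    def browse_price(self, key: Key) -> float:
        # Browsing never touches the hotel's database.
        return self.predict(key)

    def checkout_price(self, key: Key) -> float:
        # The customer wants to book, so the exact price is worth paying for.
        self.paid_queries += 1
        return self.fetch_exact(key)


# Stand-in data; the predicted value is invented for illustration.
key = ("Hotel Pennsylvania", "07-05-2019")
predicted = {key: 195.0}
exact = {key: 200.0}

lookup = PriceLookup(predicted.__getitem__, exact.__getitem__)
estimate = lookup.browse_price(key)   # no paid query yet
final = lookup.checkout_price(key)    # one paid query
```

Counting `paid_queries` makes the saving visible: every browsing request that ends without a booking costs nothing against the hotel's database.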
Note: The scrapers have been running since June.
There are two main files, expedia.py and booking.py. Each one extracts hotel data from its website and saves it to a CSV file that is later used for analysis.
An example of the data that can be extracted from these websites:
| Check-In Date | Check-Out Date | Hotel Name | Price($) | Rating | Number of Reviews |
|---|---|---|---|---|---|
| 07-05-2019 | 08-05-2019 | Hotel Pennsylvania | 200 | 5.6/10 | 40,753 |
| 07-05-2019 | 08-05-2019 | The Mansfield Hotel | 215 | 7.6/10 | 765 |
| 07-05-2019 | 08-05-2019 | Days Hotel by Wyndham on Broadway NYC | 194 | 6.6/10 | 4,984 |
| 07-05-2019 | 08-05-2019 | Holiday Inn Manhattan Financial District | 499 | 8.2/10 | 2,441 |
| 07-05-2019 | 08-05-2019 | Millennium Hilton New York One UN Plaza | 459 | 8.4/10 | 3,957 |
| 07-05-2019 | 08-05-2019 | Hyatt Place New York/Midtown-South | 359 | 8.6/10 | 3,639 |
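As a rough sketch of how rows like these could be saved to CSV and read back for analysis, the example below uses Python's standard `csv` module. The column names come from the table above; the helper names `write_hotels` and `read_prices` are assumptions for illustration, and the real expedia.py and booking.py may organize this differently.

```python
import csv
import io

# Column names taken from the example table.
FIELDS = ["Check-In Date", "Check-Out Date", "Hotel Name",
          "Price($)", "Rating", "Number of Reviews"]


def write_hotels(rows, out):
    """Write scraped rows (dicts keyed by FIELDS) as CSV to an open text stream."""
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)


def read_prices(src):
    """Return {hotel name: integer price} from CSV produced by write_hotels."""
    return {row["Hotel Name"]: int(row["Price($)"]) for row in csv.DictReader(src)}


rows = [
    {"Check-In Date": "07-05-2019", "Check-Out Date": "08-05-2019",
     "Hotel Name": "Hotel Pennsylvania", "Price($)": "200",
     "Rating": "5.6/10", "Number of Reviews": "40,753"},
    {"Check-In Date": "07-05-2019", "Check-Out Date": "08-05-2019",
     "Hotel Name": "The Mansfield Hotel", "Price($)": "215",
     "Rating": "7.6/10", "Number of Reviews": "765"},
]

# An in-memory buffer stands in for open("hotels.csv", "w", newline="").
buffer = io.StringIO(newline="")
write_hotels(rows, buffer)
buffer.seek(0)
prices = read_prices(buffer)
```

The `csv` module quotes fields that contain commas, so a review count such as `40,753` survives the round trip as a single value.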
To meet the needs of this research I had to teach myself Python, a friendly language that supports functional programming and is well suited to research work.
These instructions will get you a copy of the project up and running on your local machine for development and testing purposes. See deployment for notes on how to deploy the project on a live system.
What you need to install to run the software, and how to install it:
imports / technologies:
- pip-19.0.3
- selenium
- datetime (standard library)
- csv (standard library)
- re (standard library)
- urllib3
- setuptools
The packages can be installed, for example, in PyCharm (the IDE I use for Python in this repository):
File -> Settings -> Project: *Your Project Name* -> Project Interpreter
Then search for and install: pip, selenium, setuptools, urllib3.
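After installing, one way to confirm that the interpreter can see the packages is a short check like the one below. It is a minimal sketch, not part of the repository: `missing_packages` is a hypothetical helper, and `REQUIRED` lists only the third-party packages named above.

```python
import importlib.util

# Third-party packages from the list above; datetime, csv and re ship with Python.
REQUIRED = ["selenium", "urllib3", "setuptools"]


def missing_packages(names):
    """Return the names that cannot be found by the current interpreter."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All required packages are installed.")
```

Run it with the same interpreter that PyCharm uses for the project, since packages are installed per interpreter.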
- PyCharm - The Python IDE
- Daniel Paz - Part of the research program - Profile
- Aviv Ezer - Part of the research program
- Hadar Shemesh - Part of the research program