The aim of this bachelor thesis is to obtain data from a Czech real estate portal using web scraping and then to analyze this data using exploratory analysis and selected data mining methods in order to find interesting relationships. The thesis is divided into a theoretical part and a practical part.
The theoretical part primarily introduces the field of data mining, including the necessary concepts, selected methods, and the ways in which those methods are evaluated. It also describes popular data mining methodologies, with particular attention paid to CRISP-DM. Finally, it covers web scraping: its principles, existing solutions, and ethical aspects.
The practical part begins by introducing the tools used in the analysis, followed by a brief introduction to the real estate domain. It then describes how the data was collected from the real estate portal's website, including the search for its API and the creation of a scraping script in Python. The obtained dataset is pre-processed in the Jupyter notebook environment. The resulting data is first examined using exploratory analysis and then analyzed using classification, regression, and descriptive data mining methods. The thesis closes with a discussion of the results of the analysis and a summary of the whole work.