Building a dataset from scratch and experimenting with some ideas using regression models

YasserElsedawy/regression-modeling

What Makes Cities Attractive for Moving, Visiting, or Relocating? A Machine Learning Model

Can we predict the number of people visiting a given city using machine learning techniques? Can we predict the reasons behind that decision? According to the World Economic Forum, the rise in the number of megacities is the most visible evidence of the accelerating global trend towards urbanization. In 1950, cities were home to 751 million people, less than one-third of the global population. Just two (New York and Tokyo) had more than 10 million inhabitants. Today, 55% of us live in urban areas, which is 4.2 billion people. In another generation, that proportion is set to grow to 68%, potentially adding another 2.5 billion people to already crowded cities.

But what are people looking for when they move to, visit, or relocate to another city? Is it a good economy? Is it the culture? Maybe social rights, crime rates, real estate affordability, LGBT+ rights, or even the average net salary? The fact is that people can weigh a great many factors when moving to another city for tourism, business, or study.

In this article I will try to build a prediction model in Python to forecast city demand and to look for cause-and-effect relationships between different variables. I will be building a dataset from scratch and experimenting with some ideas using regression models. The dataset includes official data collected from different sources such as Eurostat, Istat, Statista, World Bank, government platforms, Teleport, OECD, and Numbeo. It covers a sample of 42 cities compared across 13 selected variables.

The data will be tested for correlation to measure whether those variables are related to the number of people visiting a city each year, and whether that number can be predicted from them. We will also measure how the variables relate to each other. A regression analysis will then be carried out, using both simple linear regression and multiple linear regression, and a prediction model will be built from the data and evaluated with different methods.
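As a rough illustration of that workflow, here is a minimal sketch using NumPy. It is not the code from this repository: the data is synthetic and made up, only two of the predictors are simulated, and the choice of R² and RMSE as evaluation metrics is an assumption, since the text above does not name the methods used.

```python
import numpy as np

# Synthetic, made-up data standing in for the real dataset (assumption).
rng = np.random.default_rng(0)
n_cities = 42
population = rng.uniform(0.2, 9.0, n_cities)      # millions of residents (hypothetical)
avg_net_salary = rng.uniform(0.9, 4.5, n_cities)  # thousand EUR per month (hypothetical)
noise = rng.normal(0.0, 0.5, n_cities)
# Hypothetical relationship used only to generate the target.
visitors = 1.0 + 2.0 * population + 0.8 * avg_net_salary + noise

# 1) Correlation: Pearson coefficient between one variable and the target.
r = np.corrcoef(population, visitors)[0, 1]

# 2) Simple linear regression: visitors ~ intercept + slope * population.
slope, intercept = np.polyfit(population, visitors, deg=1)

# 3) Multiple linear regression via ordinary least squares.
#    The column of ones lets the model fit an intercept.
X = np.column_stack([np.ones(n_cities), population, avg_net_salary])
coef, *_ = np.linalg.lstsq(X, visitors, rcond=None)

# 4) Evaluation on the same data the model was fitted to (in-sample).
pred = X @ coef
ss_res = np.sum((visitors - pred) ** 2)
ss_tot = np.sum((visitors - visitors.mean()) ** 2)
r2 = 1.0 - ss_res / ss_tot
rmse = np.sqrt(np.mean((visitors - pred) ** 2))

print(f"Pearson r (population vs visitors): {r:.3f}")
print(f"Simple regression: visitors = {intercept:.2f} + {slope:.2f} * population")
print(f"Multiple regression coefficients: {coef}")
print(f"R^2 = {r2:.3f}, RMSE = {rmse:.3f}")
```

With only 42 cities, in-sample scores like these tend to be optimistic, so a held-out split or cross-validation would probably give a fairer picture of predictive performance.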

The variables we selected to compare against the yearly visitor numbers are as follows: Visitors to locals ratio, Employment, GDP, GDP per capita, Population, Foreign Population, Land area (km2), Medium Size City Center Apartment Rent, AVG Net Salary, Air Quality score, Urban Greenery score, Life Expectancy, and number of Startups.
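Two of these variables can be derived from others in the list. The sketch below shows one way that might look in plain Python. The `CityRow` class and its field names are hypothetical, the figures are placeholders rather than real data, and defining the visitors to locals ratio as yearly visitors divided by population is an assumption.

```python
from dataclasses import dataclass


@dataclass
class CityRow:
    """One row of a hypothetical city table (field names are illustrative)."""
    city: str
    visitors: float    # visitors per year
    population: float  # residents
    gdp: float         # total GDP, same currency for every city

    @property
    def gdp_per_capita(self) -> float:
        # GDP per capita is total GDP divided by population.
        return self.gdp / self.population

    @property
    def visitors_to_locals_ratio(self) -> float:
        # Assumed definition: yearly visitors divided by residents.
        return self.visitors / self.population


# Placeholder values only, not real figures for any city.
rows = [
    CityRow("City A", visitors=2_000_000, population=500_000, gdp=30_000_000_000),
    CityRow("City B", visitors=9_000_000, population=3_000_000, gdp=150_000_000_000),
]

for row in rows:
    print(row.city, row.gdp_per_capita, row.visitors_to_locals_ratio)
```

Because derived columns like these are built from other predictors, they can be strongly correlated with them, which is worth keeping in mind when reading the coefficients of a multiple regression.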

The selected cities are all European, most of them within the European Union. They were chosen because of their similar regulations, similar policies, and the ease of travel between them. The selected cities are: Paris, London, Milano, Madrid, Munich, Berlin, Barcelona, Stockholm, Hamburg, Frankfurt, Roma, Amsterdam, Stuttgart, Brussel, Vienna, Dublin, Napoli, Budapest, Lisbon, Oslo, Lyon, Copenhagen, Leeds, Rotterdam, Zürich, Helsinki, Manchester, Prague, Antwerp, Porto, Utrecht, Bordeaux, Bilbao, Eindhoven, Basel, Liverpool, Genève, Nice, Firenze, Riga, Vilnius, Tallinn.

Let's get started.
