
Agatha Mattos
Student
Project Title
Mapping Slums with Machine Learning
Project Description
It is estimated that over one billion people live in slums without access to public services such as water and sanitation. Knowing the location and extent of these settlements is critical for monitoring the United Nations’ Sustainable Development Goal of ensuring adequate housing and basic services for all by 2030. However, detecting slum areas at a global scale remains an open problem. Currently, the main source of information about the percentage of the world’s urban population living in slums is housing census surveys, which are labour-intensive, time-consuming and require substantial financial resources. An alternative is to use passively collected data, such as satellite imagery, together with image processing techniques to map these settlements. In this context, recent reviews show that the use of machine learning techniques for image processing is still very scarce but yields promising results.
The proposed research seeks to advance the development of a global slum inventory. The first objective is to develop a new algorithm for mapping slums. To achieve this, a georeferenced dataset of slum communities, made available by the Brazilian government, will be used. The characteristics of this dataset make it well-suited for this research. Firstly, the slum areas are already georeferenced and labelled, a process that is often very time-consuming in machine learning projects. Secondly, it covers many cities, which allows models to be trained in different urban contexts. The lack of such variety has been identified as a limitation of current studies, so addressing it would contribute to advancing the state of the art on the topic. The second objective is to utilise machine learning techniques to quantify the population living in these settlements. To assess the suitability of the algorithms developed in this research, the results will be compared with official census data. The models will also be evaluated in terms of the cost of acquiring the images, computational complexity and generalisability of the approach to other regions.
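As a rough illustration of how such a comparison might be scored, the sketch below computes two common measures: intersection over union (IoU) between a predicted slum mask and a reference mask, and the relative error of a population estimate against a census count. The project description does not name specific metrics, so the choice of IoU and relative error, the function names and the toy values are all assumptions made for illustration only.

```python
# Hypothetical evaluation sketch: the metrics and names are assumptions,
# not the project's stated method.

def iou(pred, ref):
    """Intersection over union of two binary masks (nested lists of 0/1)."""
    inter = 0
    union = 0
    for pred_row, ref_row in zip(pred, ref):
        for p, r in zip(pred_row, ref_row):
            if p and r:
                inter += 1
            if p or r:
                union += 1
    # Two empty masks agree perfectly, so treat that case as 1.0.
    return inter / union if union else 1.0


def relative_error(estimate, census):
    """Absolute difference between estimate and census, as a fraction of census."""
    return abs(estimate - census) / census


# Toy data: 1 marks a pixel classified as slum, 0 marks everything else.
predicted_mask = [[1, 1, 0],
                  [0, 1, 0]]
reference_mask = [[1, 0, 0],
                  [0, 1, 1]]

mask_score = iou(predicted_mask, reference_mask)
population_error = relative_error(900, 1000)
```

A higher IoU indicates closer agreement with the labelled slum boundaries, while a lower relative error indicates a population estimate closer to the census figure.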