Estimating disaggregated employment size from Points-of-Interest and census data: From mining the web to model implementation and visualization


Filipe Rodrigues (fmpr [at]
Ana O. Alves
Evgheni Polisciuc
Shan Jiang
Joseph Ferreira
Francisco Câmara Pereira


The global spread of internet access and the ubiquity of internet-capable devices have led to an increased online presence of companies and businesses, notably in collaborative platforms called local directories, where Points-of-Interest (POIs) are usually described by a set of categories and tags. Such information can be extremely useful, especially if aggregated under a common (shared) taxonomy. This article proposes a complete framework for the urban planning task of disaggregated employment size estimation based on collaborative online POI data collected using web mining techniques. To make this analysis possible, we present a machine learning approach that automatically classifies POIs into a common taxonomy, the North American Industry Classification System (NAICS). This hierarchical taxonomy is used in many areas, particularly in urban planning, since it allows the data to be analyzed at different levels of detail, depending on the practical application at hand. The classified POIs are then used to estimate disaggregated employment size at a finer level than previously possible, using a maximum likelihood estimator. We empirically show that the automatically classified online POIs are competitive with proprietary gold-standard POI data. This finding is further supported by a set of new visualizations that reveal the spatial distribution of the classification error and its relation to the employment size error.
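Because NAICS is hierarchical and the number of digits in a code fixes its level (a 2-digit sector down to a 6-digit national industry), POIs classified at the finest level can be aggregated to whatever level of detail an application needs simply by truncating their codes. The following is a minimal Python sketch of that roll-up, not code from this work; the function name `rollup` and the sample codes are hypothetical. It also merges the three NAICS sectors that span several two-digit prefixes (31-33, 44-45, 48-49).

```python
from collections import Counter

# NAICS levels are identified by code length.
NAICS_LEVELS = {2: "sector", 3: "subsector", 4: "industry group",
                5: "NAICS industry", 6: "national industry"}

# Three NAICS sectors span several two-digit prefixes; map them to one label.
MULTI_PREFIX_SECTORS = {"31": "31-33", "32": "31-33", "33": "31-33",
                        "44": "44-45", "45": "44-45",
                        "48": "48-49", "49": "48-49"}


def rollup(codes, digits):
    """Count classified POIs per NAICS category at the requested level (hypothetical helper)."""
    if digits not in NAICS_LEVELS:
        raise ValueError(f"digits must be one of {sorted(NAICS_LEVELS)}")
    counts = Counter()
    for code in codes:
        if len(code) < digits:
            continue  # POI was only classified at a coarser level
        prefix = code[:digits]
        if digits == 2:
            prefix = MULTI_PREFIX_SECTORS.get(prefix, prefix)
        counts[prefix] += 1
    return counts


# Hypothetical 6-digit codes assigned to five POIs.
codes = ["722511", "722513", "445110", "451110", "541511"]
print(rollup(codes, 2))  # sector level
print(rollup(codes, 4))  # industry-group level
```

Truncating codes, rather than storing a separate label per level, keeps a single classification per POI and lets every coarser view be derived from it.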
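The abstract names a maximum likelihood estimator without giving its form. As a purely illustrative sketch, and not the formulation used in this work, assume that the census employment total $y_z$ of zone $z$ is Poisson with mean $\mu_z = \sum_c n_{zc}\lambda_c$, where $n_{zc}$ counts the classified POIs of category $c$ in zone $z$ and $\lambda_c$ is an unknown mean employment per POI. The classical EM (multiplicative) update for such Poisson models, $\lambda_c \leftarrow \lambda_c \sum_z n_{zc}(y_z/\mu_z) / \sum_z n_{zc}$, converges to the non-negative MLE of $\lambda$, and the fitted rates can then be rescaled so that each zone reproduces its census total. The function names `fit_rates` and `disaggregate` and the toy data are hypothetical.

```python
import numpy as np


def fit_rates(counts, totals, iters=1000):
    """Poisson MLE of mean employment per POI for each category, via EM updates.

    counts[z, c]: number of classified POIs of category c in zone z.
    totals[z]:    observed (census) employment in zone z.
    Assumed model (illustrative only): totals[z] ~ Poisson(sum_c counts[z, c] * lam[c]).
    """
    counts = np.asarray(counts, dtype=float)
    totals = np.asarray(totals, dtype=float)
    exposure = counts.sum(axis=0)  # number of POIs per category
    if np.any(exposure == 0):
        raise ValueError("every category needs at least one POI")
    lam = np.full(counts.shape[1], totals.sum() / counts.sum())  # flat starting point
    for _ in range(iters):
        mu = counts @ lam  # expected employment per zone
        # Zones with mu == 0 (for instance, zones without POIs) are ignored by the update.
        ratio = np.divide(totals, mu, out=np.zeros_like(mu), where=mu > 0)
        lam = lam * (counts.T @ ratio) / exposure  # multiplicative EM step
    return lam


def disaggregate(counts, totals, lam):
    """Estimated employees per POI, shape (zones, categories).

    Rates are rescaled within each zone so that its POIs add up to the census total.
    """
    counts = np.asarray(counts, dtype=float)
    totals = np.asarray(totals, dtype=float)
    mu = counts @ lam
    scale = np.divide(totals, mu, out=np.zeros_like(mu), where=mu > 0)
    return lam[np.newaxis, :] * scale[:, np.newaxis]


# Hypothetical toy data: 4 zones, 2 POI categories, generated with rates [2, 10].
counts = [[3, 0], [0, 2], [4, 1], [1, 3]]
totals = [6, 20, 18, 32]
lam = fit_rates(counts, totals)
per_poi = disaggregate(counts, totals, lam)
print(lam)
print(per_poi)
```

Because the toy totals are exactly consistent with rates of 2 and 10, the fitted rates come out close to those values. With real census data the totals are not fitted exactly, which is why `disaggregate` rescales the rates within each zone so that zone totals are preserved.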


Machine learning, Spatial analysis, Points-of-interest, Urban planning, GIS


International Journal on Advanced Intelligent Systems, vol. 7, 2013