Assessing the socio-demographic representativeness of mobile phone application data

Michael Sinclair, Saeed Maadi, Qunshan Zhao, Jinhyun Hong, Andrea Ghermandi, Nick Bailey

Research output: Contribution to journal › Article › peer-review

Abstract

Emerging forms of mobile phone data generated from the use of mobile phone applications have the potential to advance scientific research across a range of disciplines. However, there are risks regarding uncertainties in the socio-demographic representativeness of these data, which may introduce bias and mislead policy recommendations. This paper addresses the issue directly by developing a novel approach to assessing socio-demographic representativeness, demonstrating this with two large independent mobile phone application datasets, Huq and Tamoco, each with three years of data for a large and diverse city-region (Glasgow, Scotland) home to over 1.8 million people. We advance methods for detecting home location by including high-resolution land use data in the process and test representativeness across multiple dimensions. Our findings offer greater confidence in using mobile phone app data for research and planning. Both datasets show good representativeness compared to the known population distribution. Indeed, they achieve better population coverage than the ‘gold standard’ random sample survey, which is the alternative source of data on population mobility in this region. More importantly, our approach provides an improved benchmark for assessing the quality of similar data sources in the future.

Original language: English
Article number: 102997
Journal: Applied Geography
Volume: 158
State: Published - Sep 2023

Bibliographical note

Funding Information:
The MP datasets used in this analysis can be accessed for research purposes by application to the Urban Big Data Centre, an Economic and Social Research Council funded research centre and national data service based at the University of Glasgow. Datazone and higher geographic boundaries are available under the Open Government Licence (http://spatialdata.gov.scot/). Postcode boundaries are freely available from the Scottish Postcode Directory (National Records of Scotland, n.d.) under the ‘Public Sector Geospatial Agreement’, which covers non-commercial use of the data. SIMD data are available under the Open Government Licence. CACI data are accessed here under a licence agreed with CACI for this particular study. Geomni's UKBuildings layer (Digital Map Data © The GeoInformation Group Limited (2022), created and maintained by Geomni, a Verisk company) is accessed under a general academic licence via Digimap (https://digimap.edina.ac.uk/). SHS data are accessed through the UK Data Service under their standard End User Licence (Scottish Government & Ipsos MORI, 2021). All analysis is completed using a combination of PostgreSQL and the R programming language (R Core Team, 2022). The code to process the data and estimate home location is openly available on GitHub (https://github.com/sinclairmichael/appliedgeography_representativeness.git).
The work was made possible by ESRC's SDAI funding [ES/W012979/1] and ESRC's on-going support for the Urban Big Data Centre (UBDC) [ES/L011921/1 and ES/S007105/1]. For the use of SIMD data, Copyright Scottish Government, contains Ordnance Survey data © Crown copyright and database right (2022). CACI data, © 1979–2020 CACI Limited. This report shall be used solely for academic, personal and/or non-commercial purposes. UKBuildings data is Digital Map Data © The GeoInformation Group Limited (2022), created and maintained by Geomni, a Verisk company.
The authors want to thank the anonymous reviewers for their insightful comments and suggestions on an earlier version of this manuscript.

Publisher Copyright:
© 2023 The Authors

Keywords

  • Huq
  • Mobile phone data
  • Socio-demographic representativeness
  • Tamoco

ASJC Scopus subject areas

  • Forestry
  • Geography, Planning and Development
  • Environmental Science (all)
  • Tourism, Leisure and Hospitality Management
