Big data, migration and human mobility
The term "big data" includes anonymized data generated by users of mobile devices and internet-based platforms, or by digital sensors and meters (satellite imagery, for example). With about 5.1 billion unique mobile users, and around 4 billion active internet users around the world (We Are Social and Hootsuite, 2018), such "digital traces" present an enormous opportunity to complement traditional sources of migration data and improve knowledge of various aspects of migration. This is all the more relevant in light of the current data gaps and the need to monitor progress towards the migration-related targets in the Sustainable Development Goals (SDGs). The potential of these innovative sources, however, comes with significant challenges.
Big data are usually understood as data generated automatically by users of mobile phones, social media, internet platforms and applications, as well as via digital sensors and meters. Such data are stored in real time in large databases, usually owned by private companies - be it mobile phone operators, providers of social media platforms or other internet-based services. However, big data are not only "big" because of their volume; the speed ("velocity") at which they are generated and the complexity ("variety") of the information are also considered distinguishing features of this kind of data (Hilbert, 2013).
Big data are different from data based on traditional household surveys as they do not refer to a random sample of individuals but to the totality of the population using, for instance, mobile phones or internet-based platforms, and these data are accessible in real time (Hilbert, 2014). Big data also differ from traditional data because of the specific technical and analytical methods required to extract meaningful insights from them and transform these data into "value" (de Mauro, Greco and Grimaldi, 2016). Letouzé (2015) distinguishes between big data as data, or "digital translation of human actions, interactions and transactions picked up by digital devices and services," and big data as "an ecosystem of data, human and technical capacities and communities" producing and using such information for decision-making.
A relatively small but fast-growing body of literature and a number of applications have demonstrated the potential of using various types of big data sources - usually mobile-phone or internet-based sources - or the combination of traditional and new sources, to improve understanding of mobility and migration processes (Global Migration Group, 2017). Noteworthy examples include the following:
- Mobile phone Call Detail Records (CDR) have been used to track internal displacement following natural disasters, such as the Haiti and Nepal earthquakes, or estimate money-transfer patterns in post-disaster situations (Bengtsson et al., 2011; Blumenstock, Eagle and Fafchamps, 2013). Such records contain information on the approximate location of the calling and receiving ends, the time and duration of the call, and the calling and receiving numbers, which are anonymous identifiers of the caller and the receiver. While CDR data are usually more helpful to identify internal migration patterns, they could also be used to measure international migration at the sub-regional level, particularly when combined with other sources. For instance, the combination of CDR with satellite data can help to map movements between cross-border communities (Sorichetta, 2017). International mobile phone calls coupled with census statistics can also contribute to understanding patterns of migrant integration and residential segregation (Natale, 2017). The SoBigData Consortium, led by the University of Pisa, is also analyzing mobile phone data from the Türk Telekom D4R challenge to assess the integration of Syrian refugees in Turkey, and building a "refugee integration index" through the combination of these data with geo-referenced Twitter data and official labour force statistics. CDR can also be used to identify people living and working in more than one country, or "transnationals" (Ahas and Tiru, 2017).
- Geo-located social media activity, such as on Twitter and LinkedIn, has been used to infer international migration flows and stocks, disaggregated by age, sex, skill level or sector of occupation, based on user self-reported information (State et al., 2014; Zagheni, Kiran and State, 2014). The number of active social media users globally in January 2018 reached 3.2 billion (We Are Social and Hootsuite, 2017), of which 2.1 billion were Facebook users alone (Statista, 2018). The popularity of these platforms, together with the geotagged information that can be extracted from them, can be leveraged to study mobility patterns.
- Social media data can also be obtained through marketing platforms offered by companies to advertisers willing to target specific audiences. Data from the Facebook advertising platform can yield information on a number of characteristics of Facebook users, such as their (self-reported) age, sex, their "home country" and country of current residence, their educational background, sector of occupation and personal interests. This means that Facebook can almost be used as a "real-time census" to estimate, among other things, the number of users classified by the social media platform as "expats" (users living in a country other than their 'home country') at the national or global level at a certain point in time (Zagheni, Weber and Gummadi, 2017). In a recent European Commission Joint Research Centre technical report (Spyratos et al., 2018), the authors estimated the number of 'expatriates' in 17 European Union (EU) countries based on the number of Facebook users classified as 'expats,' and were able to identify the increase in the number of Venezuelan migrants in Spain in early 2018, confirmed in official statistics from the Spanish National Statistical Office. One of the issues with social media data is that certain segments of the population may be over- or under-represented (for instance, young people are more likely to use Facebook than older people). The report proposes a methodology for correcting this bias, based on penetration rates by age group and sex in the country of previous residence and country of destination of a Facebook expat. Social media content can also be used to analyse public sentiment on migration, and how opinions on social media can become polarized (Natale, 2017).
- Repeated logins to the same website and IP addresses from e-mail sending activity have been used to estimate international mobility patterns and users' likelihood to move to another country (Zagheni and Weber, 2012). Self-reported information on the sex and age of users also made it possible to estimate migration rates by sex and age group. Online search data may also be helpful to forecast forced migration under certain circumstances, as shown in a paper comparing Google Trends data with numbers of arrivals of asylum-seekers in Europe (Connor, 2017). Similarly, the Google Trends Index (GTI) for migration-related search terms can be exploited to measure migration intentions from a certain country and predict subsequent emigration flows (Böhme, Gröger and Stöhr, 2018). Since the Google search engine is estimated to count more than a billion users worldwide, the GTI data can be highly representative of the global population and can therefore be used as a tool to predict future migratory movements (Ibid.). The European Asylum Support Office and the University of Milan are using a combination of Google Trends data and traditional data sources to detect changes in country of origin contexts and forecast asylum applications in the EU (forthcoming).
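To make the CDR example above more concrete, the following is a minimal sketch of one way displacement could be flagged from call records: each anonymous user is assigned the region where most of their calls were made before and after an event, and users whose most frequent region changes are flagged. The record layout, region labels, dates and function names are all hypothetical, and this is a simplified illustration rather than the method used in the studies cited above.

```python
from collections import Counter, defaultdict
from datetime import datetime

# Hypothetical, simplified CDR rows:
# (anonymous caller ID, region of the serving tower, call start time).
RECORDS = [
    ("u1", "A", datetime(2010, 1, 5, 9)),
    ("u1", "A", datetime(2010, 1, 8, 20)),
    ("u1", "B", datetime(2010, 1, 20, 10)),
    ("u1", "B", datetime(2010, 1, 25, 18)),
    ("u2", "A", datetime(2010, 1, 6, 12)),
    ("u2", "A", datetime(2010, 1, 22, 12)),
    ("u3", "C", datetime(2010, 1, 3, 8)),
]


def modal_region(records, start, end):
    """Return {user: most frequent region} for calls with start <= time < end."""
    counts = defaultdict(Counter)
    for user, region, when in records:
        if start <= when < end:
            counts[user][region] += 1
    return {user: c.most_common(1)[0][0] for user, c in counts.items()}


def detect_moves(records, event, window_start, window_end):
    """Flag users whose most frequent region differs before and after `event`.

    Users who are not observed in both windows are left out, since no
    comparison is possible for them.
    """
    before = modal_region(records, window_start, event)
    after = modal_region(records, event, window_end)
    return {
        user: (before[user], after[user])
        for user in before
        if user in after and before[user] != after[user]
    }


# Hypothetical event date and observation window.
EVENT = datetime(2010, 1, 12)
moves = detect_moves(RECORDS, EVENT, datetime(2010, 1, 1), datetime(2010, 2, 1))
print(moves)
```

In practice, such an approach would also need to handle users who make few calls, tower coverage that does not match administrative boundaries, and the privacy safeguards discussed further below.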
Big data sources that have so far been used in migration-related studies can be grouped under three broad categories (Global Migration Group, 2017):
- Mobile-phone-based - e.g. call records or mobile money transfers.
- Internet-based - e.g. social media or use of search engines.
- Sensor-based - e.g. Earth Observation Data (satellite imagery).
The infographic below shows the specific types of sources.
Data strengths & limitations
The advantages of using new data sources for the analysis of migration-related aspects are linked to their potential to fill some of the gaps in traditional data sources and methods. Although national governments and the international community have made progress on migration statistics, traditional data sources have inherent limitations: national population censuses are costly and infrequent, migrants may be hard to sample in household surveys, and they may be undercounted in administrative records if they are not able to access services in the host country. The increased availability of digital records presents an opportunity to address some of the knowledge gaps around migration and mobility, especially given their timeliness, the frequency at which information can be updated, their wide coverage (of all users of mobile devices and internet-based platforms), and the level of detail they can provide.
Big data may be particularly useful to study patterns of temporary or circular migration, which are hard to measure through traditional sources and methods, or to anticipate migration trends. They can also contribute to more timely monitoring of public opinion or media discourse on migration, compared to public opinion surveys, for instance. Another advantage is that such data are generated at no additional cost and can be obtained at a lower cost compared to data from traditional sources - depending on the willingness of data holders to share data or the insights these can generate. The combination of information that can be extracted from traditional and innovative data sources can provide evidence on aspects of migration we currently have limited knowledge of, such as integration prospects of recently-arrived migrants in a country, fluid forms of migration that fall outside the UN definition of temporary or permanent migrants, or future migration movements.
The opportunities offered by big data are met by some significant challenges:
Ethical and privacy issues: There are confidentiality and ethical issues in using data automatically generated by individuals, often without their informed consent, as well as civil liberties concerns due to the risks of using such data for surveillance purposes, which are particularly serious in contexts of irregular migration and forced displacement. The creation of adequate legislative and regulatory frameworks to safeguard confidentiality of the information and ensure the ethical use of data is necessary. To this end, the EU's Agency for Fundamental Rights (FRA) is currently working on a project titled "Artificial Intelligence, Big Data and Fundamental Rights" which assesses the advantages and disadvantages in terms of fundamental human rights of using artificial intelligence, machine learning and big data for public policy and business purposes. The project aims to produce fundamental rights guidelines and recommendations in using artificial intelligence for policy. IOM was one of the first international organizations to adopt its own Data Protection Principles, and is affiliated to the International Data Responsibility Group (IDRG), a global network of experts and organizations working on principles and standards required for guiding the data revolution in the context of humanitarian action and sustainable development. IOM also supported the Signal Program on Human Security and Technology of the Harvard Humanitarian Initiative, which produced core ethical obligations for information activities in humanitarian contexts.
Big data are inherently biased: As mentioned above, users of social media or mobile phones are not necessarily representative of the population at large. Specifically, differences in internet access or use of mobile devices and social media platforms by level of economic development, sex, age and urban/rural areas are still significant. Researchers are working to address the methodological challenges associated with such ("self-selection") bias and results look promising (Spyratos et al., 2018; Zagheni, Weber, and Gummadi, 2017; Hughes et al., 2016). Understanding the measurement error inherent in big data sources is helpful to increase the predictive capacity of models based on such sources, and facilitate sensible use of big data for decision-making.
Technical, analytical and legal challenges: Some of the challenges are due to difficulties in accessing data - held by private or state actors - or using data for research purposes; inappropriate infrastructure and data management and security systems; and methodological difficulties in extracting meaning from huge, complex and "noisy" volumes of data. There are also issues of continuity of data, considering the rapid pace of technological change and innovation, and difficulties in gaining an overall picture of which big data sources or innovative methods can yield useful insights for policy, due to the proliferation of pilot applications and the absence of systematic services in this area. In this sense, the development of innovative "public-private partnerships" for data exchange and collaborations, such as "Data Collaboratives" (Verhulst, 2015) should be incentivized to make progress in this area.
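The self-selection bias discussed above can be illustrated with a small sketch. It assumes, hypothetically, that the number of platform users classified as "expats" is known for each age and sex group, together with the platform's penetration rate in that group; dividing one by the other scales each group up to the population it is meant to represent. All figures and names below are invented for illustration, and the single rate per group is a simplification of the approach in the report cited above, which draws on penetration rates in both the country of previous residence and the country of destination.

```python
# Hypothetical counts of platform users classified as "expats", by (age group, sex).
platform_expats = {
    ("15-29", "F"): 12000,
    ("15-29", "M"): 15000,
    ("60+", "F"): 1000,
}

# Hypothetical share of each group's population that uses the platform.
penetration = {
    ("15-29", "F"): 0.80,
    ("15-29", "M"): 0.75,
    ("60+", "F"): 0.20,
}


def corrected_estimates(counts, rates):
    """Scale each group's user count by the inverse of its penetration rate."""
    estimates = {}
    for group, users in counts.items():
        rate = rates[group]
        if not 0 < rate <= 1:
            raise ValueError(f"penetration rate for {group} must be in (0, 1]")
        estimates[group] = users / rate
    return estimates


estimates = corrected_estimates(platform_expats, penetration)
raw_total = sum(platform_expats.values())
corrected_total = sum(estimates.values())
print(raw_total, round(corrected_total))
```

Groups with low penetration, such as older users in this invented example, are scaled up the most, which is also where the corrected figures are most sensitive to errors in the assumed rates.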
As a way to concretely explore how to harness new data sources for migration analysis and policymaking, on 25 June 2018, IOM's Global Migration Data Analysis Centre (GMDAC) and the European Commission Knowledge Centre on Migration and Demography (KCMD) launched the Big Data for Migration Alliance (BD4M). While a series of initiatives exist at the UN- and EU-level focused on data innovation for sustainable development, such as the UN Global Pulse, the UN Data Innovation Lab, and the UN Global Working Group (GWG) on Big Data for Official Statistics, there was still no unit specifically tasked with investigating the potential of new data sources in the field of migration and human mobility - hence the idea to create a dedicated Alliance.
The BD4M aims to address the challenges of data innovation for migration by a) facilitating the creation of new forms of partnership between the private and the public sectors; b) demonstrating the potential of new data sources to respond to specific policy needs, particularly when data from traditional sources may not be available; and c) establishing a dialogue between policymakers, scientists, data providers and regulators to actively address trust, privacy and ethical issues. Plans to create the Alliance were announced in the follow-up to the expert workshop Big Data and alternative data sources on migration: From case-studies to policy support, jointly organized by the KCMD and GMDAC in Ispra on 30 November 2017.
Bengtsson, L. et al.
Blumenstock, J., Eagle, N. and Fafchamps, M.
Böhme, J., Gröger, A., and Stöhr, T.
Campo, S. et al.
European Union Agency for Fundamental Rights (EU FRA)
Global Migration Group
Data for Development (Chapter 1c Innovative data sources (mobile phones, social media)).
Hughes, C. et al.
Independent Expert Advisory Group on a Data Revolution for Sustainable Development
Laczko, F. and Rango
State, B. et al.
Spyratos, S. et al.
State, B. et al.
United Nations Global Pulse
Zagheni, E., Kiran, V.R. and State, B.
Zagheni, E. and Weber, I.
Zagheni, E., Weber, I., and Gummadi, K.