The underlying argument is that these new kinds of data, stemming from individuals and communities as they go about their daily lives, contain insights into their experiences that we can mine to help them in return. This idea can be traced back to a much-cited 2009 paper, which found that light emissions picked up by satellites could track GDP growth.
So it seems only logical, and very appealing, to claim that the same data and tools could be deployed to monitor poverty, and may even be conducive to a leap-frogging of statistical systems. Although the term Big Data is absent from the report of the High-Level Panel on the post-2015 framework, it is hard not to read it between the lines of the development data revolution it sketches.
But conceptual clarity, practical guidance, ethical considerations and innovative foresight have too often been lacking, leaving an open field for sceptics who have long stressed the risks and challenges of Big Data or insisted that the real revolution is small data (or long data). Findings that Google got flu wrong this year in the US have cast additional doubt on the reliability, representativeness, and thus relevance of Internet-based data for informing policy decisions, while the revelations about PRISM have raised concerns over privacy to a whole new level. But recent publications and debates have shed direct light on some of the specific promise, challenges and requirements of leveraging Big Data to improve current, and perhaps develop alternative, measures of poverty and welfare.
In particular, a paper showed that cell-phone records from a major city in Latin America could help predict socioeconomic levels, poverty’s first cousins. This was done by matching behavioural data inferred from call detail records (CDRs) with official statistics on socio-economic levels, using supervised machine learning techniques to unveil how differences in socioeconomic levels typically ‘showed’ in cell-phone data, and vice versa. This example illustrates a key and seemingly purpose-defeating requirement for developing models and algorithms able to translate digital data into indicators of the social world: the availability of ‘ground truth’ indicators of the social world (such as survey data) used to build and validate the models.
But this does not mean that Big Data is useless, or rather superfluous, in such contexts: indeed, assuming a sufficiently high and time-resistant level of accuracy (internal validity), CDR data would then provide some sense of changes in socio-economic levels that would not get captured until the next official survey.
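The matching step described above can be illustrated with a minimal sketch. It is not the method of the paper cited: it assumes a simple nearest-centroid classifier, and the feature names (average calls per day, average top-up amount), the area values and the three socioeconomic levels are all invented for illustration. Survey-based labels serve as ‘ground truth’ to fit and validate the model, which is then applied to fresh CDR features between two surveys.

```python
from statistics import mean

# Hypothetical per-area CDR features: (avg calls per day, avg top-up amount).
# Labels stand in for survey-based socioeconomic levels ("ground truth").
labelled_areas = [
    ((2.0, 1.0), "low"),
    ((2.5, 1.5), "low"),
    ((3.0, 1.2), "low"),
    ((6.0, 4.0), "middle"),
    ((6.5, 4.5), "middle"),
    ((7.0, 5.0), "middle"),
    ((11.0, 9.0), "high"),
    ((12.0, 10.0), "high"),
    ((12.5, 9.5), "high"),
]


def fit_centroids(samples):
    """Learn one mean feature vector per socioeconomic level."""
    by_label = {}
    for features, label in samples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Assign the level whose centroid is closest (squared Euclidean distance)."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


def accuracy(centroids, samples):
    """Share of samples whose predicted level matches the survey label."""
    hits = sum(predict(centroids, f) == label for f, label in samples)
    return hits / len(samples)


# Keep every third area out of training so the model can be validated
# against survey labels it has not seen.
train = [s for i, s in enumerate(labelled_areas) if i % 3 != 2]
held_out = [s for i, s in enumerate(labelled_areas) if i % 3 == 2]

centroids = fit_centroids(train)
held_out_accuracy = accuracy(centroids, held_out)

# Between two surveys: estimate the level of an area from new CDR features only.
nowcast = predict(centroids, (6.8, 4.2))
print(held_out_accuracy, nowcast)
```

The held-out check is the point of the sketch: without survey labels there is nothing to fit against and nothing to validate against, which is why ‘ground truth’ data remain necessary even when the CDR stream is plentiful.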
The problem is evidently more acute in places where no such data exist, i.e. precisely where alternative indicators are most needed. One avenue would be to apply ‘matching’ rules developed elsewhere to local CDRs. But the resulting ‘alternative’ indicators will be highly conjectural because the underlying algorithm may not pass the test of external validity: applying a model matching CDRs and socio-economic levels developed using CDRs and Demographic and Health Surveys (DHS) data from Côte d’Ivoire to a neighbouring country may yield misleading values because of cross-country differences. In such a case, the question is: is any data better than no data at all?
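The external validity problem can be made concrete with a toy sketch. Everything in it is assumed for illustration: the two ‘countries’ are synthetic, the single feature (average top-up amount) is hypothetical, and the rule is a plain threshold rather than any published model. A rule fitted where survey labels exist separates poor from non-poor areas perfectly at home, then misclassifies half the areas in a neighbour where tariffs, and hence top-up amounts, are lower.

```python
# Synthetic (average top-up amount, survey-based "poor" flag) pairs.
country_a = [
    (1.0, True), (1.5, True), (2.0, True),
    (4.0, False), (5.0, False), (6.0, False),
]
# Hypothetical neighbour with lower tariffs: everyone tops up less,
# but the ordering of poor and non-poor areas is the same.
country_b = [
    (0.4, True), (0.6, True), (0.8, True),
    (1.6, False), (2.0, False), (2.4, False),
]


def fit_threshold(samples):
    """Midpoint between the highest 'poor' and the lowest 'non-poor' value.

    Assumes the two groups do not overlap in the training data.
    """
    highest_poor = max(x for x, poor in samples if poor)
    lowest_non_poor = min(x for x, poor in samples if not poor)
    return (highest_poor + lowest_non_poor) / 2


def accuracy(threshold, samples):
    """Share of areas where 'below threshold' agrees with the survey flag."""
    hits = sum((x < threshold) == poor for x, poor in samples)
    return hits / len(samples)


threshold_a = fit_threshold(country_a)
in_sample = accuracy(threshold_a, country_a)
transferred = accuracy(threshold_a, country_b)
print(threshold_a, in_sample, transferred)
```

In this toy case the transferred rule labels every area in the second country as poor. Without local survey data there would be no way to notice the error.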
Another recent paper studying the impact of biases in mobile-phone ownership on estimates of human mobility inferred from CDRs is also worth mentioning for two reasons. One is its key finding: that CDR-based estimates of mobility appeared to be surprisingly robust to substantial biases in phone ownership, which may turn out to be equally true for measures of welfare. The other is its research question and method: asking how accurate a picture of the social world some Big Data streams may paint, given, or in spite of, their inherent biases, drawing (again) on survey data as ‘ground truth data’.
Noteworthy investment and progress are also visible in the critical strand of research (and advocacy) on privacy-preserving analysis. In particular, researchers also working with CDRs for mobility analysis have developed an algorithm that uses an emerging technique known as ‘differential privacy’, which injects ‘noise’ into the model at certain points in order to reduce the likelihood of individual re-identification.
Although not directly concerned with poverty, these papers are important because they point specifically to the methodological avenues and leads that need to be explored to develop privacy-preserving Big Data capacities that may, in time, help monitor poverty.
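To give a sense of what injecting ‘noise’ means in practice, here is a minimal sketch of the Laplace mechanism, a standard textbook way of achieving differential privacy for counts. It is not the algorithm developed by the researchers mentioned above. The area names, the counts and the value of epsilon are hypothetical, and the sketch assumes each subscriber is counted in exactly one area, so that adding or removing one person changes a single count by at most 1.

```python
import random


def laplace_noise(scale, rng):
    """Draw Laplace(0, scale) noise as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_counts(counts, epsilon, rng):
    """Release noisy counts using the Laplace mechanism.

    Assumption: each subscriber contributes to exactly one count, so the
    sensitivity is 1 and the noise scale is 1 / epsilon.
    """
    scale = 1.0 / epsilon
    return {area: n + laplace_noise(scale, rng) for area, n in counts.items()}


# Hypothetical number of subscribers whose home tower falls in each area.
home_area_counts = {"area_a": 1200, "area_b": 35, "area_c": 4}

rng = random.Random(0)  # fixed seed only to make the sketch repeatable
released = private_counts(home_area_counts, epsilon=0.5, rng=rng)
for area, value in released.items():
    print(area, round(value, 1))
```

A smaller epsilon means more noise and stronger protection. Large counts barely move in relative terms, while a count as small as that of the third area can be swamped by the noise. That trade-off between privacy and accuracy matters most for the small, sparsely covered populations that poverty monitoring often cares about.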
It is also crucial to note that Big Data is not only about data production (and analysis), but also about data consumption (and exchange). If we care about adequately monitoring human welfare, we should account for the consumption of free data. Think of the hours spent on social media in cyber-cafés, and increasingly on cellphones, around the world, that provide a ‘consumer surplus’ not captured in any official statistics. The caveat may not apply to the poorest of the poor, but there is no reason to consider that a problem receiving increasing attention in developed countries is irrelevant to developing countries where Internet penetration is growing much faster. In other words: Big Data do not stand apart from the quantities and phenomena to be measured but add to the measurement problem.
The related, and perhaps even more critical, point here is that the rise of data-driven activities is deemed to render GDP (and GDP per capita) less and less relevant over time as the measure of human welfare it was never intended to be. The argument that monetary poverty and GDP per capita are very crude indicators of human progress is not new, but Big Data may prove instrumental in devising true alternative measures.
A few take-away messages emerge. First, for the purposes of poverty monitoring or development more broadly, “Big Data” is not about size, but about the qualitative nature of these data trails—what some refer to as “digital breadcrumbs”. Second, Big Data is not even primarily about the data but about the carefulness of their analysis, which requires even more, not less, contextual and ethnographic grounding. Third, Big Data is also about data consumption, not just production. Lastly, much more conceptual, empirical and methodological work is needed before Big Data can be leveraged concretely and safely for poverty monitoring; but Big Data may in time fundamentally change how we measure, and perhaps even fight, poverty.