To overcome these limitations, data practitioners have increasingly turned to new data sources. Cellular networks, in particular, implicitly carry an enormous amount of detail on human activity. The high penetration of mobile devices makes it possible to rapidly scale up the study of large-scale behaviours to the national level, covering millions of customers. For mobility studies, the key idea is to opportunistically treat mobile terminals as moving, ubiquitous sensors and to rely on the traces they leave in the operators’ logs to reconstruct their original trajectories. The enormous volume of cellular data is no longer the show stopper it was in the past: modern Big Data technologies now make it possible to easily deploy data analytics platforms that deal with extremely high data rates, even in real time.
One of the most popular and extensively studied types of cellular data in the scientific literature is the Call Detail Record (CDR). CDRs are tickets summarizing mobile subscribers’ activity and are used by telecom operators to support their billing procedures. CDRs are very popular because they are widely available (they do not require dedicated monitoring platforms and can be considered a ready-made data source). However, they have been criticized in the past. Some data practitioners highlighted the scarcity of information provided by CDRs: the position of a mobile terminal in the cellular network topology is logged only when some kind of activity (calls, SMS, data connections) occurs, which translates into a picture of mobility somewhat biased by how active each user is. In other words, we cannot track users’ trajectories with fine temporal granularity, as they are inactive (and hence “invisible”, from the CDR perspective) most of the time.
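To make this bias concrete, here is a minimal sketch of how a trajectory can be pieced together from CDRs and how long a user can stay “invisible” between two records. The record layout (user, timestamp, cell, event type), the sample rows and the function names are assumptions made for illustration; real CDR schemas vary between operators.

```python
from collections import defaultdict
from datetime import datetime

# Hypothetical, simplified CDR rows: (user_id, timestamp, cell_id, event_type).
# These fields and values are made up for illustration only.
CDRS = [
    ("u1", "2017-03-01 08:05", "cell_A", "call"),
    ("u1", "2017-03-01 08:40", "cell_B", "data"),
    ("u1", "2017-03-01 19:10", "cell_A", "sms"),
    ("u2", "2017-03-01 12:00", "cell_C", "call"),
]

def trajectories(cdrs):
    """Group records by user and sort them by time: the trajectory as seen through CDRs."""
    by_user = defaultdict(list)
    for user, ts, cell, _event in cdrs:
        by_user[user].append((datetime.strptime(ts, "%Y-%m-%d %H:%M"), cell))
    return {user: sorted(points) for user, points in by_user.items()}

def longest_gap_minutes(points):
    """Longest interval between consecutive records, i.e. time with no known position."""
    gaps = [(b[0] - a[0]).total_seconds() / 60 for a, b in zip(points, points[1:])]
    return max(gaps, default=0.0)

traj = trajectories(CDRS)
for user, points in traj.items():
    cells = [cell for _ts, cell in points]
    print(user, cells, longest_gap_minutes(points))
```

In this toy data, user u1 is seen in three cells but leaves a gap of more than ten hours between the morning and the evening records, during which any movement goes unobserved.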
Our claim is that this situation has drastically changed. The usage patterns of mobile devices such as smartphones have been evolving at a rapid pace in recent years. Indeed, typical modern habits call for continuous interaction with our devices (we check our screens quite often, don’t we?), and the background sync activity of many common applications makes smartphones “always connected”. The questions are: how has this shift affected the quality of CDRs? Does the increasing popularity of flat data plans result in a higher and steadier number of reported logs?
At the Big Data & Data Science Unit of Eurecat, in collaboration with Orange Spain, we have answered these questions by observing anonymous CDR datasets over a period of two years. The results were astonishing: not only has the total number of CDR logs per user steeply increased, but the logs are also much more uniformly distributed over time. This suggests that by observing CDRs, we can now track users’ positions with higher temporal granularity, greatly reducing the bias in inferring users’ actual movements from their cellular footprint. For the first time, we are able to reconstruct trajectories with high accuracy, allowing the study of mobility in challenging scenarios such as urban contexts.
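As a rough illustration of what “more records, spread more evenly” can mean, the sketch below computes two simple per-user indicators for one day: the number of records and the fraction of hourly slots containing at least one record. Both the coverage metric and the sample timestamps are assumptions chosen for illustration; they are not the metrics or the data used in the study.

```python
from datetime import datetime

def records_and_coverage(timestamps):
    """Return (number of records, fraction of the 24 hourly slots with at least one record)."""
    hours = {datetime.strptime(ts, "%Y-%m-%d %H:%M").hour for ts in timestamps}
    return len(timestamps), len(hours) / 24

# Made-up timestamps for one user on one day (hypothetical, not from the study).
sparse_day = ["2015-03-01 09:10", "2015-03-01 09:25", "2015-03-01 18:40"]
dense_day = [f"2017-03-01 {h:02d}:15" for h in range(6, 24)]

print(records_and_coverage(sparse_day))
print(records_and_coverage(dense_day))
```

The sparse day has three records covering only two hourly slots, while the dense day has one record in each of 18 slots; higher coverage means fewer and shorter periods in which the user's position is unknown.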
The results of our study are described in the paper “Call Detail Records for Human Mobility Studies: Taking Stock of the Situation in the ‘Always Connected Era’”, presented at the ACM SIGCOMM 2017 Workshop on Big Data Analytics and Machine Learning for Data Communication Networks (Big-DAMA 2017). The outlook for future research on human mobility is exciting, and we expect that more and more research efforts, as well as commercial products, will rely on CDRs as a reliable source of mobility information.