Time Series Analysis

From Linear Regressions to Rome’s Traffic Mysteries

I have always been passionate about linear regressions, perhaps since the days I used to play with the TI SR-51 calculator. My real motivation, though, came with the HP-48SX, which forced you to learn reverse Polish notation. It let you apply different regression methods and even gave you the predicted value for the next step. Fascinating, isn’t it? Almost as much as modern causal inference, popularized by Judea Pearl’s book Causality (2000), which brought causal graphs and do-calculus and made it possible to reason about causal relationships rather than just correlations.

But all this leads to the inverse question: why can the behavior of certain phenomena be predicted within a margin of error? Without going too deep into theory, the ability of machine learning algorithms to predict the future can be put to very interesting uses, especially now that new datasets are emerging on one of the deepest mysteries of human knowledge: “Rome’s traffic.”

Several datasets have been published concerning public transport, accidents, accommodations, and more. A simple yet practical model would estimate ridership, or simply the predicted travel time to reach a destination by public transport, starting for example from standard schedules and weather forecasts (via the Open-Meteo API), and also accounting for events such as concerts and strikes. The experience behind this comes from two projects in which we put into practice several algorithms (a mix of mathematical methods and neural networks). The first, https://energent.it/progetto-laocoonte/, aimed to predict accessibility, which can be seen as the inverse of travel time, as the potential of reaching weighted opportunities, or as the cumulative set of destinations reachable within a reasonable threshold. The second, https://databenc.it/project/paun/, aimed to predict, more than 30 minutes in advance, the crossing of a critical threshold of temperature or U/V values.
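To make the three readings of accessibility concrete, here is a minimal Python sketch. It is only an illustration, not the method used in either project: the function names, the exponential decay with parameter `beta`, the 30-minute default threshold, and the sample destinations are all assumptions chosen for the example.

```python
import math


def inverse_travel_time(minutes):
    """Accessibility as the inverse of travel time (higher means easier to reach)."""
    if minutes <= 0:
        raise ValueError("travel time must be positive")
    return 1.0 / minutes


def weighted_opportunities(destinations, beta=0.05):
    """Potential of reaching weighted opportunities.

    Each destination is a (weight, travel_minutes) pair. The exponential
    decay exp(-beta * minutes) is an assumed impedance function; other
    decay shapes are equally plausible.
    """
    return sum(weight * math.exp(-beta * minutes) for weight, minutes in destinations)


def cumulative_opportunities(destinations, threshold_minutes=30):
    """Total weight of the destinations reachable within the threshold."""
    return sum(weight for weight, minutes in destinations if minutes <= threshold_minutes)


# Hypothetical destinations: (opportunity weight, travel time in minutes).
destinations = [(100, 10), (250, 25), (400, 45)]

print(inverse_travel_time(20))
print(weighted_opportunities(destinations))
print(cumulative_opportunities(destinations))
```

With these sample values, only the first two destinations fall within 30 minutes, so the cumulative measure counts their weights and ignores the third, while the weighted measure still gives the distant one a small contribution.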

Specifically, the figure above shows a forecast of Metro B ridership for the next 48 hours (and no further, since this is also a reasonable window for weather forecasts), based on synthetic data derived from daily average values. Many more analyses are possible now that far more open data is available than in the past.
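As a rough idea of how such a 48-hour forecast could be produced, the sketch below builds an hourly synthetic series from a daily average and forecasts with a simple hour-of-week profile. This is a baseline stand-in, not the model behind the figure: the daily average of 300,000, the two-peak hourly shape, the weekend factor, and the noise level are all hypothetical values picked for illustration.

```python
import math
import random
from collections import defaultdict
from datetime import datetime, timedelta


def synthetic_ridership(start, days, daily_average, seed=0):
    """Spread a daily average over an assumed two-peak hourly shape, plus noise."""
    rng = random.Random(seed)
    # Hypothetical shape: morning peak around 08:00, evening peak around 18:00.
    shape = [
        math.exp(-((h - 8) ** 2) / 8) + math.exp(-((h - 18) ** 2) / 8) + 0.05
        for h in range(24)
    ]
    total = sum(shape)
    series = []
    for i in range(days * 24):
        ts = start + timedelta(hours=i)
        weekend_factor = 0.6 if ts.weekday() >= 5 else 1.0  # assumed weekend drop
        expected = daily_average * weekend_factor * shape[ts.hour] / total
        series.append((ts, max(0.0, expected * rng.gauss(1.0, 0.05))))
    return series


def fit_profile(series):
    """Average the observed values for each (weekday, hour) slot."""
    buckets = defaultdict(list)
    for ts, value in series:
        buckets[(ts.weekday(), ts.hour)].append(value)
    return {slot: sum(values) / len(values) for slot, values in buckets.items()}


def forecast(profile, last_timestamp, horizon_hours=48):
    """Forecast each future hour with the mean of its (weekday, hour) slot."""
    result = []
    for i in range(1, horizon_hours + 1):
        ts = last_timestamp + timedelta(hours=i)
        result.append((ts, profile[(ts.weekday(), ts.hour)]))
    return result


# Four weeks of synthetic history, then a 48-hour forecast.
history = synthetic_ridership(datetime(2024, 1, 1), days=28, daily_average=300_000)
profile = fit_profile(history)
prediction = forecast(profile, history[-1][0])

for ts, value in prediction[:5]:
    print(ts.isoformat(), round(value))
```

A profile like this ignores weather and events entirely; those would enter as extra features in a regression or neural model trained on the same hourly series.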
