Text analysis in incident duration prediction


Francisco Câmara Pereira
Filipe Rodrigues (fmpr [at] dei.uc.pt)
Moshe Ben-Akiva


Due to the heterogeneous, case-by-case nature of traffic incidents, much relevant information is recorded in free-form text fields rather than in constrained value fields. As a result, these text fields hold rich information that is valuable for incident analysis, modeling and prediction. However, the difficulty of formally interpreting such data has meant that previous work has given it little consideration. In this paper, we focus on the task of incident duration prediction, more specifically on predicting clearance time, the period between incident reporting and road clearance. An accurate prediction helps traffic operators implement appropriate mitigation measures and better inform drivers about the expected road blockage time. The key contribution is the introduction of topic modeling, a text analysis technique, as a tool for extracting information from incident reports in real time. We analyze a dataset covering two years of accident cases and develop a machine-learning-based duration prediction model that integrates textual and non-textual features. To demonstrate the value of the approach, we compare predictions with and without text analysis using several different prediction models. Models using textual features outperform the others in nearly all circumstances, with errors up to 28% lower than those of models without such information.
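
As a rough illustration of how a figure such as "errors up to 28% lower" can be computed, the sketch below compares two sets of predictions against observed clearance times. It assumes mean absolute error as the error metric, which the abstract does not specify. The function names and all numbers are hypothetical and chosen only to make the arithmetic easy to follow; they are not taken from the paper's dataset or results.

```python
from statistics import mean


def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted durations."""
    if len(actual) != len(predicted):
        raise ValueError("actual and predicted must have the same length")
    return mean(abs(a - p) for a, p in zip(actual, predicted))


def relative_error_reduction(baseline_error, candidate_error):
    """Fraction by which the candidate's error is lower than the baseline's."""
    if baseline_error <= 0:
        raise ValueError("baseline error must be positive")
    return (baseline_error - candidate_error) / baseline_error


# Made-up clearance times in minutes, for illustration only (not from the paper).
observed = [30, 45, 60, 90]
pred_without_text = [50, 35, 80, 60]  # hypothetical model without text features
pred_with_text = [40, 40, 70, 75]     # hypothetical model with text features

mae_without = mean_absolute_error(observed, pred_without_text)
mae_with = mean_absolute_error(observed, pred_with_text)
reduction = relative_error_reduction(mae_without, mae_with)

print(f"MAE without text features: {mae_without:.1f} min")
print(f"MAE with text features:    {mae_with:.1f} min")
print(f"Relative error reduction:  {reduction:.0%}")
```

The reduction is expressed relative to the baseline (the model without text features), so a value of 0.28 would correspond to an error 28% lower than the baseline's.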


Transportation Research Part C: Emerging Technologies, 2013