Predictive models for COVID-19-related deaths and infections

To ensure that scientific research of immediate concern is shared as rapidly as possible, The Union is fast-tracking accepted articles from the International Journal of Tuberculosis and Lung Disease (IJTLD) and publishing them in full on our website, prior to their publication in the Journal.

This article was submitted on 30 March 2020 and accepted for publication on 7 April. Read the full text for more information, including the acknowledgments and citations.

A.G. Gerli, S. Centanni, M. Miozzo, G. Sotgiu

Dear Editor,

The key role that robust models can play in predicting the incidence of infections and deaths has been highlighted since the beginning of the COVID-19 pandemic. Appropriate and early estimates of the impact of COVID-19 can help each country to implement preventive infection control measures. Furthermore, these models can enable policymakers to plan the national and regional healthcare capacity required (such as building intensive care units [ICUs]) and to design tailored action plans.

An exponential model to predict the burden of infected patients was recently developed for Italy. This computational approach proved reliable for the first 15 days of the epidemic but resulted in a significant overestimation of the level of infection after this period. In the literature, we found several pre- and post-containment R0 estimates for the first COVID-19 outbreak (R0 being the average number of people who will catch the disease from a single infected person). Pedersen and Meneghini predicted pre- and post-containment R0 values of 2.59 and 1.9, respectively, a reduction of 27%.

We attempted to fit this model to the Italian data, assuming the peak infection day to be 21 March, when the estimated R was 1.27; we then applied a 27% reduction to obtain the new R value after containment (0.93). Next, we calculated R values under two other scenarios, the first assuming the peak day to be 1 April and the second 15 April. The resulting values were 1.22 and 0.76 in the first scenario (a 38% reduction) and 1.18 and 0.81 in the second (a 31% reduction). The exponential model was assessed using the incidence of Italian and Chinese infections (Figure A in full text). Although the first part of the curve in this model fits the trend in confirmed cases in Italy, the model overestimates the incidence after 17 days (R² = 0.9991, Akaike’s Information Criterion [AIC] = 6, before the overestimation). When applied to cases in China, the exponential model showed the same problem (i.e., overestimation) after 17 days, although model reliability was acceptable (R² = 0.9924, AIC = 6).
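The reduction rates quoted above follow from simple arithmetic, which can be checked with a short Python sketch. The function names are illustrative and do not come from the letter; only the R values themselves are taken from the text.

```python
def reduction_rate(r_before: float, r_after: float) -> float:
    """Relative drop from r_before to r_after, as a fraction of r_before."""
    return (r_before - r_after) / r_before


def apply_reduction(r_before: float, rate: float) -> float:
    """R value after containment for a given fractional reduction."""
    return r_before * (1.0 - rate)


# (pre-containment, post-containment) values quoted in the text
scenarios = {
    "Pedersen and Meneghini": (2.59, 1.9),
    "peak on 1 April": (1.22, 0.76),
    "peak on 15 April": (1.18, 0.81),
}
rates = {name: reduction_rate(a, b) for name, (a, b) in scenarios.items()}

# 21 March scenario: the 27% reduction is applied to the estimated R of 1.27
r_after_21_march = apply_reduction(1.27, 0.27)

for name, rate in rates.items():
    print(f"{name}: {100 * rate:.0f}% reduction")
print(f"peak on 21 March: R after containment = {r_after_21_march:.2f}")
```

Rounded to whole percentages, the three rates come out at 27%, 38% and 31%, and the 21 March value at 0.93, in line with the figures given in the text.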

We then adopted a more reliable and robust data mining approach: the model was fitted with data retrieved from the Italian and Chinese epidemics, and several curves adapted to the public data sets were explored (e.g., exponential curves, third-degree polynomial curves, 5-parameter logistic [5PL] asymmetrical sigmoidal curves and Gaussian curves). In particular, the Chinese data were divided into two sections, before and after the peak infection day (14 February 2020; 6,464 cases), to better understand the curve trend. Although 15,136 new cases were recorded on 13 February 2020, only 37 had been notified the day before; this was due to the more comprehensive testing approach and to delays in recording and reporting.
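As a rough illustration of how candidate curves can be compared, the Python sketch below fits an exponential and a third-degree polynomial to the first 17 days of a series and reports R² for each. The series is synthetic (generated from a logistic curve with arbitrary parameters), not the Italian or Chinese data, and the helper names are hypothetical. The AIC values quoted in the letter are not reproduced, since the exact form used is not stated.

```python
import numpy as np


def r_squared(y, y_hat):
    """Coefficient of determination between observed and fitted values."""
    y = np.asarray(y, dtype=float)
    y_hat = np.asarray(y_hat, dtype=float)
    ss_res = np.sum((y - y_hat) ** 2)
    ss_tot = np.sum((y - y.mean()) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_exponential(t, y):
    """Fit y = a * exp(b * t) by least squares on log(y)."""
    b, log_a = np.polyfit(t, np.log(y), 1)
    return lambda x: np.exp(log_a + b * np.asarray(x, dtype=float))


def fit_cubic(t, y):
    """Fit a third-degree polynomial by least squares."""
    coeffs = np.polyfit(t, y, 3)
    return lambda x: np.polyval(coeffs, np.asarray(x, dtype=float))


# Synthetic cumulative counts from a logistic curve (NOT real epidemic data)
days = np.arange(1, 41)
cases = 80000.0 / (1.0 + np.exp(-0.25 * (days - 25)))

early = days <= 17  # fit on the first 17 days only
exp_model = fit_exponential(days[early], cases[early])
cubic_model = fit_cubic(days[early], cases[early])

r2_exp_early = r_squared(cases[early], exp_model(days[early]))
r2_cubic_early = r_squared(cases[early], cubic_model(days[early]))
exp_day_40 = float(exp_model(40))

print(f"R^2 on days 1-17: exponential {r2_exp_early:.4f}, cubic {r2_cubic_early:.4f}")
print(f"day 40: exponential forecast {exp_day_40:.0f} vs synthetic value {cases[-1]:.0f}")
```

On this synthetic series both curves fit the early period closely, while the exponential extrapolated to day 40 lies far above the underlying curve. This is the same qualitative behaviour the letter reports for the exponential model after 17 days.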

Data on infection growth up until the peak infection day can be fitted to a third-degree polynomial curve; thereafter, they fit a 5PL asymmetrical sigmoidal curve following parametric growth (80% of aggregated cases at peak day, assumed to be at 20% of estimated aggregated outbreak duration; 90% of total cases at 24.4% of expected duration; 94% of total cases at 30% of expected duration; 97.5% at 40% of expected duration). The 80/20 rule is applicable to this model, i.e., 80% of aggregated cases will occur in 20% of total outbreak time. Data on Chinese deaths could also be fitted to a mixed third-degree polynomial (based on a starting number of 10 aggregated deaths) up to the peak day, and then to a 5PL asymmetrical sigmoidal curve following parametric growth (45% of aggregated cases at peak day, which is at 20% of estimated aggregated outbreak duration; 50% of total cases at 20.8% of expected duration; 63.3% of total cases at 24% of expected duration; 81.4% at 30% of expected duration; 92.8% at 40% of expected duration; 99.5% at 80% of expected duration). Both curves are shown in Figure B (see full text). Using this model, it is possible to predict the trend in infection with accuracy from 17 days after the outbreak has begun. The correlation between expected and observed deaths is high.
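The letter does not give the fitted 5PL parameters, so the Python sketch below only illustrates the ingredients: one common parameterisation of the 5PL curve (an assumption, since the authors' exact form is not stated) and the checkpoints quoted above, with linear interpolation between them as a simplification. The names `five_pl` and `share_at` are hypothetical.

```python
import numpy as np


def five_pl(x, a, b, c, d, g):
    """Common 5PL form: d + (a - d) / (1 + (x / c) ** b) ** g.

    a is the value at x = 0, d the upper asymptote, c a location
    parameter, b the slope, and g the asymmetry (g = 1 is symmetric).
    """
    x = np.asarray(x, dtype=float)
    return d + (a - d) / (1.0 + (x / c) ** b) ** g


# Checkpoints quoted in the text:
# (fraction of expected outbreak duration, share of the aggregated total)
case_checkpoints = [(0.20, 0.80), (0.244, 0.90), (0.30, 0.94), (0.40, 0.975)]
death_checkpoints = [
    (0.20, 0.45), (0.208, 0.50), (0.24, 0.633),
    (0.30, 0.814), (0.40, 0.928), (0.80, 0.995),
]


def share_at(fraction, checkpoints):
    """Linearly interpolate the cumulative share between quoted checkpoints."""
    xs, ys = zip(*checkpoints)
    if not xs[0] <= fraction <= xs[-1]:
        raise ValueError("fraction lies outside the quoted checkpoints")
    return float(np.interp(fraction, xs, ys))


# Share of aggregated cases and deaths a quarter of the way through the outbreak
cases_quarter = share_at(0.25, case_checkpoints)
deaths_quarter = share_at(0.25, death_checkpoints)
print(f"cases: {cases_quarter:.3f}, deaths: {deaths_quarter:.3f}")
```

With actual parameter estimates, `five_pl` could replace the interpolation; the checkpoints would then serve as a check on the fitted curve rather than as its definition.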

Our case scenario for the Italian epidemic, which started on 22 February, used a third-degree polynomial curve extrapolated to the first 17 days. Up to 20 March, the difference between confirmed and expected cases was only 2.7%, and the difference between confirmed and expected deaths was –1.3% (Figure C in full text). We then plotted the two curves of expected cases and deaths according to two different scenarios: peak day on 1 April and on 15 April (Figure D in full text).

We conclude that the proposed predictive model, based on the biological assumption that “herd immunity” can reduce contagiousness in the population after an exponential increase, is valid and can improve strategic decision-making aimed at limiting the spread of the SARS-CoV-2 virus and reducing the mortality rate. Nevertheless, the present health emergency can be adequately resolved only if the healthcare system is ready to adapt to the increased number of cases (mainly patients with severe disease).

Inappropriate estimates and predictions based on an inadequate forecasting model can affect the resilience of the national healthcare system and have undesirable consequences. We have established a new model for predicting COVID-19-related deaths and infections that can be easily applied worldwide to support better strategic choices in any context. The model can be used to estimate the healthcare burden, and so to implement and scale up the most relevant logistical, economic and financial interventions.