In a recent series of articles, The Economist highlights the need for vaccinations around the world, estimating that there have been around ten million excess deaths globally.
It is our long-held position that excess deaths are a key measure of the impact of the pandemic.
In Bulletin 53, we spoke about some of the problems of reported COVID-19 deaths data in England and how excess deaths represent “the most reliable way of gauging the pandemic’s impact”. This was also emphasised in Bulletin 55 where we highlighted how excess deaths show the extent of underreporting of COVID-19 deaths in South Africa.
However, what about the rest of the world, where country-wide death registries may not exist, or where recording of deaths in such registries may not be complete? How is the impact of COVID-19 to be accurately assessed?
The Economist team has come up with an innovative solution to the problem. They describe it here, but at a high level it boils down to:
1. For countries and time periods where excess deaths are available, collate that data.
2. For all countries and time periods, collect whatever other information is available. Some of this may be static information, such as the location of the country or whether it’s an island. Some of it may be indicators that change over time, such as economic and mobility indicators, as well as details of reported cases.
3. Using data from Step 2 and machine learning techniques, fit models to try to reproduce the actual excess deaths observed in Step 1.
4. Then use that model to predict the missing excess deaths data, both in countries where no data is available and in time periods where data may not have been released yet (some countries take months to release their death statistics).
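The steps above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the records are invented toy numbers, there is a single hypothetical predictor (reported COVID-19 deaths per 100k), and a plain least-squares line stands in for the machine learning models, which this bulletin does not specify in detail.

```python
# Invented toy records, NOT real data:
# (country, reported COVID-19 deaths per 100k, observed excess deaths per 100k or None)
records = [
    ("A", 50.0, 80.0),
    ("B", 120.0, 150.0),
    ("C", 10.0, 30.0),
    ("D", 200.0, 260.0),
    ("E", 5.0, None),   # no excess deaths data available
    ("F", 90.0, None),  # no excess deaths data available
]


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x (stand-in model)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Step 1: collate the records where excess deaths are observed.
observed = [r for r in records if r[2] is not None]

# Step 2: the predictor (second field) is available for every record.
# Step 3: fit the model to reproduce the observed excess deaths.
slope, intercept = fit_line([r[1] for r in observed], [r[2] for r in observed])

# Step 4: keep observed values, and predict only where data is missing.
estimates = {
    country: (excess if excess is not None else intercept + slope * reported)
    for country, reported, excess in records
}

for country, value in estimates.items():
    print(f"{country}: {value:.1f} excess deaths per 100k")
```

Observed values are left untouched and the model fills only the gaps, which mirrors the description in Step 4. A real model would use many more predictors and a more flexible fitting method.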
They derive an estimate of 10m excess deaths worldwide, with a range of 7m to 13m. This highlights the staggering impact of the pandemic.
What are some of the potential pitfalls with these estimates? One is that the model could be a poor fit even for countries and periods where data is available. It might predict too many or too few deaths there, and so project poorly for countries where data is not available.
Ironically, overfitting could also be a problem. Providing too good a fit to countries where data is available may be problematic. Think of an overfitted model like a classmate who memorises the text and can parrot it back in an exam, but struggles when presented with a novel problem. Someone who can distil general information from learning is better able to address novel situations. The same is true for machine learning – overfitted models perform poorly when presented with new data.
The Economist team used a technique called cross-validation to limit this risk. By training the models on data with known excess deaths, and then checking the errors of their predictions on data the models weren’t trained on, they can get an idea of how well the models perform on unseen data.
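As a rough illustration of the idea, the sketch below runs leave-one-out cross-validation, one simple form of the technique. It is not the scheme The Economist used, which is not described here. The data points are invented, and a least-squares line again stands in for the real models.

```python
# Invented toy data, NOT real figures:
# (reported COVID-19 deaths per 100k, observed excess deaths per 100k)
data = [(50.0, 80.0), (120.0, 150.0), (10.0, 30.0), (200.0, 260.0), (75.0, 95.0)]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x (stand-in model)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def leave_one_out_errors(points):
    """Hold out each point in turn, fit on the rest, and record the absolute error."""
    errors = []
    for i, (x_held, y_held) in enumerate(points):
        training = points[:i] + points[i + 1:]  # everything except the held-out point
        slope, intercept = fit_line(training)
        prediction = intercept + slope * x_held
        errors.append(abs(prediction - y_held))
    return errors


errors = leave_one_out_errors(data)
mean_abs_error = sum(errors) / len(errors)
print(f"Mean absolute error on held-out points: {mean_abs_error:.1f} per 100k")
```

Because each error is measured on a point the model never saw during fitting, the average gives a more honest view of how the model might behave for countries without data than the in-sample fit alone.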
The results of this process are plotted below, showing how the prediction compares to observations. There seems to be some correspondence, but also errors where predicted deaths are low and observed deaths high, or vice versa. These outliers are unlikely to influence the overall results significantly.
Issues may arise where data is very sparse (see the white regions of the map below) and needs to be estimated from countries which are very different. Consider Africa, for example. How accurate is the model, which may be extrapolating, at least in part, from South America or Europe? Observed data from Egypt and South Africa are likely to be influential for modelling other African countries. How representative are they? This might be problematic if individual countries are not representative of a diverse continent of 1.3bn people. This uncertainty is hard to avoid, because in large parts of the world there is little or no death data. Care may be required when quoting figures for individual countries.
We are grateful to the authors for attempting such a difficult task and most likely doing a good job, given the limitations of the data.
This work highlights the need for global co-operation to address the pandemic, and it suggests that many people may be dying in countries without proper reporting infrastructure.