So, you’re trying to figure out what the hell is going on with all the differing reported death rates for Covid19 and have no idea what to make of any of it? Welp, let’s see if we can shine some light on the whole shebang for you. First off, take this with a giant truckload of salt.

All of this conflicting info can leave you more and more frustrated, send your anxiety through the roof, and make you fearful of an invisible monster that you don’t fully understand. Being able to understand something that we’re scared of helps shift our subconscious from the “flight” to the “fight” reaction and it helps us face our fears head-on.

This ambiguity can lead to frustration and unhealthy amounts of fear. 


Long lines, mandates to wear masks, shelter in place orders, and social distancing – you’ve seen a ton of this. But when you take a look at the number of cases being reported in your area, you might think to yourself “wtf is the big deal with all of this, doesn’t seem too bad”.

And this is exactly why it’s important to have solid data at hand in order to judge the risk level properly.

Spoiler alert: Yes, shit is underreported. If you don’t want to bother reading all this shit, you can play with my app by clicking here:

The End of Normalcy in 2020 and likely 2021 as well.

So, why don’t we have the REAL numbers being reported?

One reason is that the real numbers simply can’t be measured. 


But why?

Really, the primary reason is how often Covid-19 is asymptomatic. The fraction of asymptomatic carriers was recently reported by the CDC to be 25%. Asymptomatic carriers won’t notice they’re infected and therefore won’t show up in the case numbers. Moreover, a huge chunk of Covid cases are mild: according to a study published in Science, ~85% of cases were mild enough not to require a visit to the ER, so they went undocumented. At face value, this is pretty disturbing. This primary source of underreporting mostly affects the reported case counts and the estimates of Covid19 dynamics such as the R0. It affects the reported fatality numbers much less, since those stem from the most severe cases, the ones that end up in the ER.

Let’s take a look at the death underreporting for a sec. The WSJ reported on several large death tolls in Italian LTC homes, where a third of the inhabitants died in March and none of those deaths were attributed to Covid19. Other gaps come from deaths at home, as well as deaths of patients who were never tested for Covid because of test shortages.

So, how do we figure out what the real numbers are?

The Magic Sauce

Back in March, I stumbled upon a medRxiv preprint on LinkedIn, from Lachmann et al, that caught my interest: it focused on correcting the under-reported Covid-19 case numbers.

Preprints are odd things. The version of the article that I downloaded at the time has since been superseded. The current version, uploaded on March 31st, carries a very different center of gravity and focuses on interpolation and modeling hospitalizations. The version from March 18th (available here) focused on developing a simple way to correct the reported numbers of any country by comparing the demographics and death rate of that country with the demographics and death rate of a reference country. The word rate in “death rate” is operative here. The underlying idea is that the true death rates should be identical in any two countries that have similar healthcare and a similar demographic distribution.

Since death rate is the number of deaths divided by the number of cases, if we have imperfect knowledge of cases, then our calculation of the death rate will be off. The same applies if we have incomplete knowledge of deaths. As it turns out, South Korea has been super good at administering Covid tests, providing great care to patients, and maintaining solid records of cases and deaths, which makes it a really good reference country.
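To make that concrete, here’s a tiny Python sketch of how missing cases or missing deaths skew the death rate. The outbreak numbers are completely made up for illustration, and `apparent_death_rate` is just a name I picked.

```python
def apparent_death_rate(deaths, cases):
    """Death rate as defined above: number of deaths divided by number of cases."""
    return deaths / cases


# Hypothetical outbreak (made-up numbers, for illustration only).
true_cases = 10_000
true_deaths = 100

true_rate = apparent_death_rate(true_deaths, true_cases)

# Only a quarter of the cases get detected: the rate looks 4x worse than it is.
rate_with_missed_cases = apparent_death_rate(true_deaths, 2_500)

# Only half of the deaths get attributed: the rate looks 2x better than it is.
rate_with_missed_deaths = apparent_death_rate(50, true_cases)

print(f"true rate:          {true_rate:.1%}")
print(f"cases undercounted: {rate_with_missed_cases:.1%}")
print(f"deaths undercounted: {rate_with_missed_deaths:.1%}")
```

Same outbreak, three very different “death rates”, depending only on what got counted.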

The correction factor proposed by Lachmann et al is based on the average death rate of the reference country multiplied by the ratio of the vulnerabilities of the corrected country and the reference country. The vulnerability of each country is a sum over its citizens stratified by age: the size of each age group multiplied by the mortality rate observed in people of that age, added up over all age groups. The equations describing these quantities and their relations are given in section C of version 1 of the paper, for those that are curious.
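Here’s a rough Python sketch of that idea, going only by the description above and not by the paper’s exact equations. Everything in it is an assumption on my part: the three age groups, the mortality rates, the population shares, the function names, and the last step, which turns the expected death rate into a case estimate by flipping “death rate = deaths / cases” around.

```python
def vulnerability(age_shares, mortality_by_age):
    """Sum over age groups of (share of population) x (mortality rate in that group).

    Assumption: age groups are given as population shares, not head counts.
    """
    return sum(age_shares[group] * mortality_by_age[group] for group in age_shares)


def expected_death_rate(ref_death_rate, target_shares, ref_shares, mortality_by_age):
    """Reference death rate scaled by the ratio of vulnerabilities (target / reference)."""
    ratio = vulnerability(target_shares, mortality_by_age) / vulnerability(
        ref_shares, mortality_by_age
    )
    return ref_death_rate * ratio


def corrected_cases(reported_deaths, expected_rate):
    """Assumption: invert 'death rate = deaths / cases' to estimate the case count."""
    return reported_deaths / expected_rate


# All numbers below are hypothetical and only there to make the sketch runnable.
mortality_by_age = {"0-39": 0.002, "40-69": 0.01, "70+": 0.08}
reference_shares = {"0-39": 0.5, "40-69": 0.4, "70+": 0.1}
target_shares = {"0-39": 0.4, "40-69": 0.4, "70+": 0.2}

rate = expected_death_rate(0.01, target_shares, reference_shares, mortality_by_age)
cases = corrected_cases(160, rate)

print(f"expected death rate in the target country: {rate:.2%}")
print(f"estimated cases behind 160 reported deaths: {cases:,.0f}")
```

The older the target country is relative to the reference, the higher its expected death rate, and the fewer cases you need to explain the same number of deaths.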

Fannnnnntastic, so now we have a way of correcting the unreported case numbers. That gets us sort of close to how many cases would have been detected in a given country if its methodology (and overall Covid19 response) were as good as South Korea’s.

So, next order of business: to account for other factors, adjust the case numbers further, and unfuck the death numbers as well, we’ll use a simple multiplier. This multiplier won’t be a single value but a Gaussian distribution, which means our estimates of corrected reports and deaths won’t be one point per date either, but entire collections of possible values.
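A minimal sketch of what a Gaussian multiplier looks like in practice, using only the Python standard library. The mean of 3, the spread of 0.5, the starting estimate of 1,000 and the function names are all placeholders I made up for illustration, not the settings or the code of the app.

```python
import random
import statistics


def corrected_distribution(estimate, mean_multiplier, sd_multiplier,
                           n_samples=10_000, seed=0):
    """Multiply one estimate by many draws from a Gaussian multiplier."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    return [estimate * rng.gauss(mean_multiplier, sd_multiplier)
            for _ in range(n_samples)]


def summarize(samples):
    """Return the mean and an approximate central 95% interval of the samples."""
    cuts = statistics.quantiles(samples, n=40)  # cut points every 2.5%
    return statistics.mean(samples), cuts[0], cuts[-1]


# Hypothetical inputs: 1,000 cases for one date, multiplier ~ N(3, 0.5).
samples = corrected_distribution(1_000, mean_multiplier=3.0, sd_multiplier=0.5)
mean, low, high = summarize(samples)

print(f"corrected estimate: {mean:,.0f} (95% interval: {low:,.0f} to {high:,.0f})")
```

So instead of one corrected number per date you get a whole cloud of them, and the interval tells you how wide that cloud is.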

Correcting the Reports

Well, the next logical course of action is to throw together a shitty web-app that uses the method from Lachmann et al described above, plus the Gaussian multiplier correction, and immediately spits out a graph of the estimates.

Let’s take a look at a quick example. Here’s Poland, as a representative of the countries that don’t include the (predominant) U07.2 cases and deaths in their guidelines:

Poland basics.png

~9.3k reported cases have been corrected to ~37.4k – a non-trivial increase, but this number might seem relatively lower (by population proportion) than what we’ve seen in other countries. Again, we’re going to assume that only about a third of the deaths are reported in the first place and apply a statistical 3-fold multiplicative correction to both the deaths and cases in Poland (which for the cases might actually be conservative, given that we’re also missing the asymptomatic patients) and see the adjusted numbers:

Poland corrected via stats.png

Our cases have undergone a dramatic increase from ~37.4k projected cases to ~112.7k (95% CI: 77.4k – 146.7k), and from ~360 reported deaths to ~1,100 (95% CI: 740 – 1,450), so in each case the increase is approximately 3-fold, as expected from the settings that we used in our correction.
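If you want to sanity-check those fold increases yourself, here’s a quick Python sketch using the rounded Poland numbers quoted above. Because the inputs are rounded, the ratios only come out approximately; `fold` is just a helper name I picked.

```python
# Rounded figures quoted in the text above.
reported_cases = 9_300
projected_cases = 37_400
adjusted_cases = 112_700
reported_deaths = 360
adjusted_deaths = 1_100


def fold(before, after):
    """How many times bigger 'after' is than 'before'."""
    return after / before


print(f"reported -> projected cases: {fold(reported_cases, projected_cases):.2f}x")
print(f"projected -> adjusted cases: {fold(projected_cases, adjusted_cases):.2f}x")
print(f"reported -> adjusted deaths: {fold(reported_deaths, adjusted_deaths):.2f}x")
```

The last two ratios both land right around 3, which is what you’d expect from a multiplier centered on 3.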

Stay safe, and remember… social distancing and wearing those face masks do make sense (and the numbers, especially when corrected, *are* showing it).

Trump is an utter imbecile. I view myself as a republican. Quit drinking the koolaid.

Much love and all that jazz <3
