Summary

The notebook

Hi, this notebook was written to analyse the 2019-2020 pandemic outbreak of the SARS-CoV-2 virus in Europe.

It was written to:

  • provide the current status of the outbreak as reported by the authorities;
  • present this status with interactive charts;
  • write a small library to analyse and predict the plateau and duration of the disease;
  • add something new to my personal website.

The disease

The disease COVID-19 is caused by the virus SARS-CoV-2 (Severe Acute Respiratory Syndrome Coronavirus 2).

Your help

This notebook let me practise some older skills and, hopefully, reach an interested audience.
If you have a suggestion on how to make the notebook more useful, I will be thankful ( nuno.aja@gmail.com ).

Kind regards, The author.

NOTICE: ongoing work


Section I

In this section we focus on getting the current datasets, then processing and loading them.

For convenience and security, the code that downloads, processes and presents the data lives in a separate module, which is imported below.


In [1]:
import modules_loader
from modules.analytics.sars_cov2_2019_20 import analytics
In [2]:
analysis = analytics.scenario()
available_countries = analysis.download_source_datafiles(force = True)
In [3]:
target_cases = analysis.process_dataset(available_countries)
Preparing dataset for 188 coutries.
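
The processing step is hidden inside the analytics module, so the following is only a minimal sketch of what such a step could look like, not the module's actual code. It assumes the wide layout of the Johns Hopkins time-series files (one column per date) and reshapes a tiny inline sample into one row per country, province and date. The function name wide_to_long and the sample itself are hypothetical.

```python
import io

import pandas as pd

# Tiny inline sample in the wide, one-column-per-date layout (assumed).
# The values are the first three Portugal rows shown later in this notebook.
SAMPLE_CSV = """Province/State,Country/Region,Lat,Long,3/2/20,3/3/20,3/4/20
,Portugal,39.3999,-8.2245,2,2,5
"""


def wide_to_long(csv_text, feature):
    """Reshape a wide per-date table into one row per (Country, Province, Date)."""
    # Read the province column as text so that empty provinces can become ''.
    wide = pd.read_csv(io.StringIO(csv_text), dtype={"Province/State": str})
    wide = wide.rename(
        columns={"Country/Region": "Country", "Province/State": "Province"}
    )
    wide["Province"] = wide["Province"].fillna("")
    # Every column that is not an identifier is treated as a date column.
    long = wide.melt(
        id_vars=["Country", "Province", "Lat", "Long"],
        var_name="Date",
        value_name=feature,
    )
    long["Date"] = pd.to_datetime(long["Date"], format="%m/%d/%y")
    return long.set_index(["Country", "Province", "Date"]).sort_index()


infected = wide_to_long(SAMPLE_CSV, "Infected")
print(infected)
```

The empty province string mirrors the way countries without provinces are addressed later in this notebook (for example 'Portugal', '').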

Section II

In this section we explore the current data.


Selecting the 20 countries with the most active cases

We now select countries by active cases number and plot the resulting data on a bar chart.

In [4]:
top_20_countries_active_cases = analysis.show_top_cases(20, feature = 'Active')

The 20 countries with most active cases
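
How show_top_cases ranks the countries is not shown here. As a rough sketch only, a selection like this can be written with plain pandas. The helper name top_cases is hypothetical, and the five rows are copied from the statistics table in this notebook.

```python
import pandas as pd

# Five rows copied from the statistics table in this notebook, in no particular order.
stats = pd.DataFrame(
    {
        "Active": [23937, 1085462, 20104, 196410, 199537],
        "Infected": [28319, 1417774, 26098, 252245, 233151],
    },
    index=pd.Index(
        ["Portugal", "US", "Singapore", "Russia", "United Kingdom"], name="Country"
    ),
)


def top_cases(frame, n, feature="Active"):
    """Return the n rows with the largest value of `feature`, largest first."""
    return frame.nlargest(n, feature)


top_3 = top_cases(stats, 3)
print(top_3)
```

A bar chart of the result could then be drawn with top_3["Active"].plot.bar(), provided matplotlib is installed.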

The data for these selected countries follows.

In [5]:
display(analysis.statistics_by_province.loc[list(top_20_countries_active_cases.index)].sort_values(by='Active', ascending=False))
Country         Province                            Active  Infected  Recovered  Deaths
US                                                 1085462   1417774     246414   85898
United Kingdom                                      199537    233151          0   33614
Russia                                              196410    252245      53530    2305
Brazil                                              109687    203165      79479   13999
France                                               91031    176712      58300   27381
Italy                                                76440    223096     115288   31368
Spain                                                58845    229540     143374   27321
Peru                                                 53186     80604      25151    2267
India                                                51379     81997      27969    2649
Netherlands                                          37891     43481          0    5590
Canada          Quebec                               37380     40732          0    3352
Turkey                                               36712    144749     104030    4007
Belgium                                              31274     54288      14111    8903
Saudi Arabia                                         27535     46869      19051     283
Pakistan                                             25323     35788       9695     770
Qatar                                                24902     28272       3356      14
Ecuador                                              24731     30502       3433    2338
Portugal                                             23937     28319       3198    1184
Chile                                                21017     37040      15655     368
Canada          Ontario                              20950     22865          0    1915
Singapore                                            20104     26098       5973      21
Canada          Alberta                               6336      6457          0     121
                British Columbia                      2257      2392          0     135
                Nova Scotia                            975      1026          0      51
                Saskatchewan                           575       582          0       7
France          Mayotte                                567      1210        627      16
Canada          Manitoba                               282       289          0       7
                Newfoundland and Labrador              258       261          0       3
                New Brunswick                          120       120          0       0
France          Reunion                                 86       440        354       0
                Martinique                              84       189         91      14
United Kingdom  Channel Islands                         50       549        456      43
                Bermuda                                 47       122         66       9
France          French Guiana                           39       164        124       1
United Kingdom  Cayman Islands                          38        93         54       1
France          Guadeloupe                              33       155        109      13
Canada          Prince Edward Island                    27        27          0       0
United Kingdom  Isle of Man                             24       332        285      23
Netherlands     Sint Maarten                            15        76         46      15
Canada          Grand Princess                          13        13          0       0
                Yukon                                   11        11          0       0
France          St Martin                                6        39         30       3
Netherlands     Aruba                                    5       101         93       3
Canada          Northwest Territories                    5         5          0       0
United Kingdom  Gibraltar                                3       147        144       0
                British Virgin Islands                   2         7          4       1
                Montserrat                               2        11          8       1
France          French Polynesia                         1        60         59       0
United Kingdom  Turks and Caicos Islands                 1        12         10       1
Netherlands     Curacao                                  1        16         14       1
France          Saint Barthelemy                         0         6          6       0
                New Caledonia                            0        18         18       0
United Kingdom  Falkland Islands (Malvinas)              0        13         13       0
                Anguilla                                 0         3          3       0
Netherlands     Bonaire, Sint Eustatius and Saba         0         6          6       0
France          Saint Pierre and Miquelon                0         1          1       0
In [6]:
# df = analysis.target_cases.copy()

# pdf = analysis.world_population.copy()
# pdf.head(5)
# pdf.describe().transpose()
# pdf.loc['Canada']
# pdf.loc['United States of America']
# pdf.loc['Portugal']
In [7]:
analysis.display_locations(countries = 'China',
                           provinces = 'Henan',
                           fill = False,
                           logy = False)

raise NotImplementedError('take a pause')



Active Infected Recovered Deaths
Country Province
China Henan 0 1276 1254 22
---------------------------------------------------------------------------
NotImplementedError                       Traceback (most recent call last)
<ipython-input-7-6ad6528928bc> in <module>
      4                            logy = False)
      5 
----> 6 raise NotImplementedError('take a pause')

NotImplementedError: take a pause
In [9]:
df = target_cases.loc[ 'Portugal', '', :, :, : ].reset_index().set_index([
    'Country',
    'Province', 
    'Date'
])
df.describe().transpose()
df
Out[9]:
count mean std min 25% 50% 75% max
Lat 74 39.3999 0 39.3999 39.3999 39.3999 39.3999 39.3999
Long 74 -8.2245 1.78848e-15 -8.2245 -8.2245 -8.2245 -8.2245 -8.2245
Infected 74 12894.7 10590 2 1085 12791.5 23746 28319
Recovered 74 689.946 922.488 0 5 190 1316 3198
Deaths 74 462.959 437.603 0 7.5 362.5 897.25 1184
Active 74 11741.8 9379.61 2 1072.5 12239 21532.8 23986
index_Date 74 36 days 12:00:00 21 days 12:08:22.257681 0 days 00:00:00 18 days 06:00:00 36 days 12:00:00 54 days 18:00:00 73 days 00:00:00
new_Active 74 323.446 312.25 -249 61 276.5 536.5 1462
new_Infected 74 382.662 315.842 -161 102.75 343 601.75 1516
new_Recovered 74 43.2162 83.285 0 0 10 47.75 464
new_Deaths 74 16 12.0148 0 2.25 16.5 26 37
new_Active_pct 74 0.156373 0.255144 -0.0103811 0.00758179 0.0343943 0.210151 1.5
estimate_Active 74 0 0 0 0 0 0 0
new_Infected_pct 74 0.158803 0.25457 -0.00635083 0.012233 0.0371289 0.209577 1.5
estimate_Infected 74 0 0 0 0 0 0 0
new_Recovered_pct 74 inf NaN 0 0 0.0205382 0.103545 inf
estimate_Recovered 74 0 0 0 0 0 0 0
new_Deaths_pct 74 inf NaN 0 0.00868813 0.0335898 0.109249 inf
estimate_Deaths 74 0 0 0 0 0 0 0
Out[9]:
Lat Long Infected Recovered Deaths Active index_Date new_Active new_Infected new_Recovered new_Deaths new_Active_pct estimate_Active new_Infected_pct estimate_Infected new_Recovered_pct estimate_Recovered new_Deaths_pct estimate_Deaths
Country Province Date
Portugal 2020-03-02 39.3999 -8.2245 2 0 0 2 0 days 0.0 0.0 0.0 0.0 0.000000 0 0.000000 0 0.000000 0 0.000000 0
2020-03-03 39.3999 -8.2245 2 0 0 2 1 days 0.0 0.0 0.0 0.0 0.000000 0 0.000000 0 0.000000 0 0.000000 0
2020-03-04 39.3999 -8.2245 5 0 0 5 2 days 3.0 3.0 0.0 0.0 1.500000 0 1.500000 0 0.000000 0 0.000000 0
2020-03-05 39.3999 -8.2245 8 0 0 8 3 days 3.0 3.0 0.0 0.0 0.600000 0 0.600000 0 0.000000 0 0.000000 0
2020-03-06 39.3999 -8.2245 13 0 0 13 4 days 5.0 5.0 0.0 0.0 0.625000 0 0.625000 0 0.000000 0 0.000000 0
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
2020-05-10 39.3999 -8.2245 27581 2549 1135 23897 69 days 116.0 175.0 50.0 9.0 0.004878 0 0.006385 0 0.020008 0 0.007993 0
2020-05-11 39.3999 -8.2245 27679 2549 1144 23986 70 days 89.0 98.0 0.0 9.0 0.003724 0 0.003553 0 0.000000 0 0.007930 0
2020-05-12 39.3999 -8.2245 27913 3013 1163 23737 71 days -249.0 234.0 464.0 19.0 -0.010381 0 0.008454 0 0.182032 0 0.016608 0
2020-05-13 39.3999 -8.2245 28132 3182 1175 23775 72 days 38.0 219.0 169.0 12.0 0.001601 0 0.007846 0 0.056090 0 0.010318 0
2020-05-14 39.3999 -8.2245 28319 3198 1184 23937 73 days 162.0 187.0 16.0 9.0 0.006814 0 0.006647 0 0.005028 0 0.007660 0

74 rows × 19 columns
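
The derived columns in the output above follow a visible pattern: Active equals Infected minus Recovered minus Deaths, each new_ column is the day-over-day difference, and each _pct column is that difference divided by the previous day's value. The sketch below reproduces the pattern with plain pandas on the first five Portugal rows. It is an assumption about how the columns could be computed, not the library's own code.

```python
import pandas as pd

# First five Portugal rows, copied from the output above.
df = pd.DataFrame(
    {
        "Infected": [2, 2, 5, 8, 13],
        "Recovered": [0, 0, 0, 0, 0],
        "Deaths": [0, 0, 0, 0, 0],
    },
    index=pd.date_range("2020-03-02", periods=5, name="Date"),
)

# Active cases are the infections that have neither recovered nor died.
df["Active"] = df["Infected"] - df["Recovered"] - df["Deaths"]

for feature in ["Active", "Infected", "Recovered", "Deaths"]:
    # Day-over-day change; the first day has no predecessor, so it becomes 0.
    df[f"new_{feature}"] = df[feature].diff().fillna(0.0)
    # Relative change against the previous day; 0/0 yields NaN and becomes 0.
    df[f"new_{feature}_pct"] = df[feature].pct_change(fill_method=None).fillna(0.0)

print(df[["Infected", "new_Infected", "new_Infected_pct"]])
```

With this approach a jump from zero to a positive count gives an infinite relative change, which would explain the inf entries in new_Recovered_pct and new_Deaths_pct above.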

In [11]:
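# Randomly sample 80% of the rows as the training set (fixed seed for reproducibility)
# and keep the remaining 20% as the test set.
train_dataset = df.sample(frac=0.8, random_state=0)
test_dataset = df.drop(train_dataset.index)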

train_dataset
Out[11]:
Lat Long Infected Recovered Deaths Active index_Date new_Active new_Infected new_Recovered new_Deaths new_Active_pct estimate_Active new_Infected_pct estimate_Infected new_Recovered_pct estimate_Recovered new_Deaths_pct estimate_Deaths
Country Province Date
Portugal 2020-04-21 39.3999 -8.2245 21379 917 762 19700 50 days 182.0 516.0 307.0 27.0 0.009325 0 0.024733 0 0.503279 0 0.036735 0
2020-04-25 39.3999 -8.2245 23392 1277 880 21235 54 days 520.0 595.0 49.0 26.0 0.025103 0 0.026100 0 0.039902 0 0.030445 0
2020-03-24 39.3999 -8.2245 2362 22 33 2307 22 days 275.0 302.0 17.0 10.0 0.135335 0 0.146602 0 3.400000 0 0.434783 0
2020-04-14 39.3999 -8.2245 17448 347 567 16534 43 days 412.0 514.0 70.0 32.0 0.025555 0 0.030353 0 0.252708 0 0.059813 0
2020-03-30 39.3999 -8.2245 6408 43 140 6225 28 days 425.0 446.0 0.0 21.0 0.073276 0 0.074807 0 0.000000 0 0.176471 0
2020-03-28 39.3999 -8.2245 5170 43 100 5027 26 days 878.0 902.0 0.0 24.0 0.211617 0 0.211340 0 0.000000 0 0.315789 0
2020-05-06 39.3999 -8.2245 26182 2076 1089 23017 65 days 132.0 480.0 333.0 15.0 0.005768 0 0.018676 0 0.191050 0 0.013966 0
2020-04-22 39.3999 -8.2245 21982 1143 785 20054 51 days 354.0 603.0 226.0 23.0 0.017970 0 0.028205 0 0.246456 0 0.030184 0
2020-04-05 39.3999 -8.2245 11278 75 295 10908 34 days 725.0 754.0 0.0 29.0 0.071197 0 0.071646 0 0.000000 0 0.109023 0
2020-03-09 39.3999 -8.2245 30 0 0 30 7 days 0.0 0.0 0.0 0.0 0.000000 0 0.000000 0 0.000000 0 0.000000 0
2020-04-13 39.3999 -8.2245 16934 277 535 16122 42 days 318.0 349.0 0.0 31.0 0.020121 0 0.021043 0 0.000000 0 0.061508 0
2020-04-27 39.3999 -8.2245 24027 1357 928 21742 56 days 110.0 163.0 28.0 25.0 0.005085 0 0.006830 0 0.021068 0 0.027685 0
2020-04-11 39.3999 -8.2245 15987 266 470 15251 40 days 447.0 515.0 33.0 35.0 0.030195 0 0.033286 0 0.141631 0 0.080460 0
2020-05-02 39.3999 -8.2245 25190 1671 1023 22496 61 days -201.0 -161.0 24.0 16.0 -0.008856 0 -0.006351 0 0.014572 0 0.015889 0
2020-04-04 39.3999 -8.2245 10524 75 266 10183 33 days 611.0 638.0 7.0 20.0 0.063832 0 0.064536 0 0.102941 0 0.081301 0
2020-05-07 39.3999 -8.2245 26715 2258 1105 23352 66 days 335.0 533.0 182.0 16.0 0.014554 0 0.020357 0 0.087669 0 0.014692 0
2020-03-29 39.3999 -8.2245 5962 43 119 5800 27 days 773.0 792.0 0.0 19.0 0.153770 0 0.153191 0 0.000000 0 0.190000 0
2020-04-19 39.3999 -8.2245 20206 610 714 18882 48 days 494.0 521.0 0.0 27.0 0.026865 0 0.026467 0 0.000000 0 0.039301 0
2020-04-28 39.3999 -8.2245 24322 1389 948 21985 57 days 243.0 295.0 32.0 20.0 0.011177 0 0.012278 0 0.023581 0 0.021552 0
2020-04-24 39.3999 -8.2245 22797 1228 854 20715 53 days 383.0 444.0 27.0 34.0 0.018837 0 0.019863 0 0.022481 0 0.041463 0
2020-03-08 39.3999 -8.2245 30 0 0 30 6 days 10.0 10.0 0.0 0.0 0.500000 0 0.500000 0 0.000000 0 0.000000 0
2020-04-20 39.3999 -8.2245 20863 610 735 19518 49 days 636.0 657.0 0.0 21.0 0.033683 0 0.032515 0 0.000000 0 0.029412 0
2020-05-10 39.3999 -8.2245 27581 2549 1135 23897 69 days 116.0 175.0 50.0 9.0 0.004878 0 0.006385 0 0.020008 0 0.007993 0
2020-03-06 39.3999 -8.2245 13 0 0 13 4 days 5.0 5.0 0.0 0.0 0.625000 0 0.625000 0 0.000000 0 0.000000 0
2020-04-26 39.3999 -8.2245 23864 1329 903 21632 55 days 397.0 472.0 52.0 23.0 0.018696 0 0.020178 0 0.040720 0 0.026136 0
2020-04-30 39.3999 -8.2245 25045 1519 989 22537 59 days 475.0 540.0 49.0 16.0 0.021530 0 0.022036 0 0.033333 0 0.016444 0
2020-03-04 39.3999 -8.2245 5 0 0 5 2 days 3.0 3.0 0.0 0.0 1.500000 0 1.500000 0 0.000000 0 0.000000 0
2020-05-14 39.3999 -8.2245 28319 3198 1184 23937 73 days 162.0 187.0 16.0 9.0 0.006814 0 0.006647 0 0.005028 0 0.007660 0
2020-04-16 39.3999 -8.2245 18841 493 629 17719 45 days 610.0 750.0 110.0 30.0 0.035654 0 0.041457 0 0.287206 0 0.050083 0
2020-03-13 39.3999 -8.2245 112 1 0 111 11 days 52.0 53.0 1.0 0.0 0.881356 0 0.898305 0 inf 0 0.000000 0
2020-04-01 39.3999 -8.2245 8251 43 187 8021 30 days 781.0 808.0 0.0 27.0 0.107873 0 0.108558 0 0.000000 0 0.168750 0
2020-03-05 39.3999 -8.2245 8 0 0 8 3 days 3.0 3.0 0.0 0.0 0.600000 0 0.600000 0 0.000000 0 0.000000 0
2020-05-01 39.3999 -8.2245 25351 1647 1007 22697 60 days 160.0 306.0 128.0 18.0 0.007099 0 0.012218 0 0.084266 0 0.018200 0
2020-03-12 39.3999 -8.2245 59 0 0 59 10 days 0.0 0.0 0.0 0.0 0.000000 0 0.000000 0 0.000000 0 0.000000 0
2020-04-02 39.3999 -8.2245 9034 68 209 8757 31 days 736.0 783.0 25.0 22.0 0.091759 0 0.094898 0 0.581395 0 0.117647 0
2020-05-03 39.3999 -8.2245 25282 1689 1043 22550 62 days 54.0 92.0 18.0 20.0 0.002400 0 0.003652 0 0.010772 0 0.019550 0
2020-04-23 39.3999 -8.2245 22353 1201 820 20332 52 days 278.0 371.0 58.0 35.0 0.013863 0 0.016877 0 0.050744 0 0.044586 0
2020-05-04 39.3999 -8.2245 25524 1712 1063 22749 63 days 199.0 242.0 23.0 20.0 0.008825 0 0.009572 0 0.013618 0 0.019175 0
2020-04-03 39.3999 -8.2245 9886 68 246 9572 32 days 815.0 852.0 0.0 37.0 0.093068 0 0.094310 0 0.000000 0 0.177033 0
2020-03-16 39.3999 -8.2245 331 3 0 328 14 days 85.0 86.0 1.0 0.0 0.349794 0 0.351020 0 0.500000 0 0.000000 0
2020-04-12 39.3999 -8.2245 16585 277 504 15804 41 days 553.0 598.0 11.0 34.0 0.036260 0 0.037405 0 0.041353 0 0.072340 0
2020-03-21 39.3999 -8.2245 1280 5 12 1263 19 days 254.0 260.0 0.0 6.0 0.251734 0 0.254902 0 0.000000 0 1.000000 0
2020-03-31 39.3999 -8.2245 7443 43 160 7240 29 days 1015.0 1035.0 0.0 20.0 0.163052 0 0.161517 0 0.000000 0 0.142857 0
2020-05-12 39.3999 -8.2245 27913 3013 1163 23737 71 days -249.0 234.0 464.0 19.0 -0.010381 0 0.008454 0 0.182032 0 0.016608 0
2020-04-06 39.3999 -8.2245 11730 140 311 11279 35 days 371.0 452.0 65.0 16.0 0.034012 0 0.040078 0 0.866667 0 0.054237 0
2020-03-20 39.3999 -8.2245 1020 5 6 1009 18 days 230.0 235.0 2.0 3.0 0.295250 0 0.299363 0 0.666667 0 1.000000 0
2020-03-02 39.3999 -8.2245 2 0 0 2 0 days 0.0 0.0 0.0 0.0 0.000000 0 0.000000 0 0.000000 0 0.000000 0
2020-05-13 39.3999 -8.2245 28132 3182 1175 23775 72 days 38.0 219.0 169.0 12.0 0.001601 0 0.007846 0 0.056090 0 0.010318 0
2020-03-17 39.3999 -8.2245 448 3 1 444 15 days 116.0 117.0 0.0 1.0 0.353659 0 0.353474 0 0.000000 0 inf 0
2020-03-07 39.3999 -8.2245 20 0 0 20 5 days 7.0 7.0 0.0 0.0 0.538462 0 0.538462 0 0.000000 0 0.000000 0
2020-03-18 39.3999 -8.2245 448 3 2 443 16 days -1.0 0.0 0.0 1.0 -0.002252 0 0.000000 0 0.000000 0 1.000000 0
2020-03-22 39.3999 -8.2245 1600 5 14 1581 20 days 318.0 320.0 0.0 2.0 0.251781 0 0.250000 0 0.000000 0 0.166667 0
2020-05-09 39.3999 -8.2245 27406 2499 1126 23781 68 days 49.0 138.0 77.0 12.0 0.002065 0 0.005061 0 0.031792 0 0.010772 0
2020-03-10 39.3999 -8.2245 41 0 0 41 8 days 11.0 11.0 0.0 0.0 0.366667 0 0.366667 0 0.000000 0 0.000000 0
2020-03-15 39.3999 -8.2245 245 2 0 243 13 days 76.0 76.0 0.0 0.0 0.455090 0 0.449704 0 0.000000 0 0.000000 0
2020-03-27 39.3999 -8.2245 4268 43 76 4149 25 days 708.0 724.0 0.0 16.0 0.205754 0 0.204289 0 0.000000 0 0.266667 0
2020-04-08 39.3999 -8.2245 13141 196 380 12565 37 days 652.0 699.0 12.0 35.0 0.054730 0 0.056181 0 0.065217 0 0.101449 0
2020-03-19 39.3999 -8.2245 785 3 3 779 17 days 336.0 337.0 0.0 1.0 0.758465 0 0.752232 0 0.000000 0 0.500000 0
2020-03-26 39.3999 -8.2245 3544 43 60 3441 24 days 511.0 549.0 21.0 17.0 0.174403 0 0.183306 0 0.954545 0 0.395349 0
In [ ]:
analysis.display_locations(countries = 'China',
                           provinces = 'Henan',
                           fill = False,
                           logy = False)
In [ ]:
raise NotImplementedError('Just to pause the execution')

Plotting the cases by day

We now plot the data for some countries using a composition of features.

In [ ]:
analysis.display_locations(countries = [ 'Italy', 'Portugal', 'Spain' ],
                           fill = False,
                           logy = False)

Interactive Sunburst Pie Charts

Now we present interactive sunburst charts, where you can select a region or subregion to see its cases in more detail.

In [ ]:
analysis.display_sunburst_chart(label = 'Active')
In [ ]:
analysis.display_sunburst_chart(label = 'Recovered')
In [ ]:
analysis.display_sunburst_chart(label = 'Deaths')

Querying dataset

We now query our dataset for selected locations with over 50,000 infections where either active cases are fewer than recoveries or deaths exceed recoveries.

In [ ]:
interesting_locations = {
    ('United Kingdom', ''),
    ('Portugal', ''),
    ('Brazil', ''),
    ('Spain', ''),
    ('US', ''),
}
analysis.display_locations(
    locations = interesting_locations,
    query='Infected > 50000 & (Active < Recovered | Deaths > Recovered)',
    provinces = False
)
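
To make the filter concrete, here is a small sketch that applies the same query string with pandas DataFrame.query to the five locations above, using the figures from the statistics table in this notebook. Whether display_locations evaluates its query argument this way is an assumption.

```python
import pandas as pd

# Figures for the five selected locations, copied from the statistics table.
stats = pd.DataFrame(
    {
        "Active": [1085462, 199537, 109687, 58845, 23937],
        "Infected": [1417774, 233151, 203165, 229540, 28319],
        "Recovered": [246414, 0, 79479, 143374, 3198],
        "Deaths": [85898, 33614, 13999, 27321, 1184],
    },
    index=pd.Index(
        ["US", "United Kingdom", "Brazil", "Spain", "Portugal"], name="Country"
    ),
)

QUERY = "Infected > 50000 & (Active < Recovered | Deaths > Recovered)"

# Keep only the rows for which the boolean expression holds.
selected = stats.query(QUERY)
print(selected.index.tolist())
```

On these figures only the United Kingdom (deaths above the reported recoveries) and Spain (fewer active cases than recoveries) pass the filter. Portugal is excluded by the infection threshold.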

Selected Countries and Locations

Now we present additional charts for selected countries, here Australia together with its provinces.

In [ ]:
analysis.display_locations(countries = 'Australia', provinces = True)

Animated Geographic Map

Now we present an animation of the cases around the world since January 2020.

In [ ]:
# Generate an animated geographic map
analysis.display_geomap()

Final Remarks

The author

Nuno André Jeremias de Aniceto is a Technology Consultant with experience in Software Engineering, Software Architecture and DevOps.
He holds a Master's degree in Computer Science Engineering with a focus on Computer Vision, Big Data, Multimedia and 3D Simulations.
He has specializations in Deep Learning and in Data Engineering on Google Cloud Platform.

The source of the data

The datasets are compiled by Johns Hopkins University, and the data sources themselves may present some issues (for example, the "Recovered" count is reported as 0 for the Canadian provinces, which inflates their "Active" count).
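
Issues of this kind can be spotted with a simple check. The sketch below flags locations that report many infections but no recoveries at all, using a few rows from the statistics table in this notebook. The helper name missing_recoveries and the threshold of 1000 infections are arbitrary assumptions, and a flagged row is only a hint, not proof, that data is missing.

```python
import pandas as pd

# A few rows copied from the statistics table in this notebook.
stats = pd.DataFrame(
    {
        "Infected": [40732, 22865, 28319, 43481],
        "Recovered": [0, 0, 3198, 0],
    },
    index=pd.MultiIndex.from_tuples(
        [
            ("Canada", "Quebec"),
            ("Canada", "Ontario"),
            ("Portugal", ""),
            ("Netherlands", ""),
        ],
        names=["Country", "Province"],
    ),
)


def missing_recoveries(frame, min_infected=1000):
    """Return rows with at least `min_infected` infections but zero recoveries."""
    mask = (frame["Recovered"] == 0) & (frame["Infected"] >= min_infected)
    return frame[mask]


suspect = missing_recoveries(stats)
print(suspect)
```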

As of 2020-03-28 the datasources are:

References