COVID-19 analysis you can explore by yourself. Figure this out!
Everyone is talking about COVID-19, but no one knows the future. In this post, I will provide free data, tools, and ideas for analyzing the current situation so that we can prepare for the future ourselves.
To see what is going on, we need a reliable dataset and appropriate tools. I am going to use the WHO (World Health Organization) dataset and Colab provided by Google. Colab is a free-of-charge Python environment. All you need is an internet-connected computer and a web browser.
import pandas as pd

df = pd.read_csv("https://covid19.who.int/WHO-COVID-19-global-data.csv")
df
WHO maintains an up-to-date COVID-19 dataset in CSV format. Don’t worry about the format; it is just a plain-text table that pandas can read directly.
Right now I see a dataset of 26476 rows × 8 columns. By the time you read this post, the number of rows will probably be larger, since new data points are added every day.
df.nunique()
Since we named our dataset df, calling df.nunique() shows the number of unique values in each column.
Date_reported 178
Country_code 215
Country 216
WHO_region 7
New_cases 2440
Cumulative_cases 8715
New_deaths 616
Cumulative_deaths 2918
dtype: int64
One thing to note here is that, for some reason, every column name except the first one (Date_reported) has a leading space. Keep this in mind when you select columns.
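If the leading spaces get in the way, one option is to strip them once after loading. The sketch below uses a small made-up frame named demo (not the WHO data) so that it runs without a download. Note that the rest of this post keeps the original names with the leading space, so apply this only if you also adjust the later code.

```python
import pandas as pd

# Hypothetical miniature frame that mimics the header quirk described above.
demo = pd.DataFrame({
    "Date_reported": ["2020-06-01", "2020-06-02"],
    " Country_code": ["US", "US"],
    " New_cases": [100, 120],
})

# Strip surrounding whitespace from every column name.
demo.columns = demo.columns.str.strip()

print(list(demo.columns))
```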
One more odd thing is that the number of unique Country_code values does not match the number of unique Country values. You can investigate by trying some code like
df[' Country_code'].unique()
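One likely explanation (an assumption on my part, not something checked against the WHO file here) is that pandas treats the string NA as a missing value by default. A country whose two-letter code is NA, such as Namibia, would then be read as NaN and skipped by nunique(). The sketch below reproduces the effect with a tiny made-up inline CSV instead of the real dataset.

```python
import io
import pandas as pd

# Tiny made-up CSV; "NA" is the two-letter code for Namibia.
csv_text = (
    "Country_code,Country\n"
    "US,United States of America\n"
    "NA,Namibia\n"
    "KR,Republic of Korea\n"
)

# Default parsing: the string "NA" becomes NaN, which nunique() does not count.
default_df = pd.read_csv(io.StringIO(csv_text))
print(default_df["Country_code"].nunique())

# With keep_default_na=False the literal string "NA" is kept as a value.
kept_df = pd.read_csv(io.StringIO(csv_text), keep_default_na=False)
print(kept_df["Country_code"].nunique())
```

If this is indeed the cause, passing keep_default_na=False to pd.read_csv when loading the WHO file should bring the two counts back in line.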
Now here is the main part.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.animation import FuncAnimation

fig, ax = plt.subplots()
y = []
x = []
ln, = plt.plot([], [], 'r.')

# Set up the log-log axes once, before the first frame is drawn.
def init():
    ax.set_xlim(1, 10000000)
    ax.set_ylim(1, 100000)
    ax.set_xscale('log')
    ax.set_yscale('log')
    ax.set_xlabel('Cumulative cases (log scale)')
    ax.set_ylabel('New cases (log scale)')
    return ln,

# For each frame, show all data points up to and including that day.
def update(frame):
    xdata=df[df[' Country_code']=='US'][' Cumulative_cases'][:frame+1]
    ydata=df[df[' Country_code']=='US'][' New_cases'][:frame+1]
    ln.set_data(xdata, ydata)
    return ln,

ani = FuncAnimation(fig, update, frames=range(len(df[df[' Country_code']=='US'])), init_func=init, blit=True)
ani.save("test.mp4")
It looks long, but it is actually just a slight modification of the animation example from the Python matplotlib package. All you have to do is change the 'US' value that is compared against the Country_code column to your own country code. For example, this is what I get when I change it to South Korea (KR).
As you can see, the shape is totally different.
The graphs I have shown are built on a few ideas that are not obvious at first glance. The following YouTube video explains them.
For readers who are short on time and cannot watch the video above, we provide a brief outline here.
First and foremost, we use a log-scale on both the x-axis and the y-axis to visualize the data. As the log-scale lets us grasp the rate of change easily, this is a widely used method in drawing graphs (such as plotting the stock price change over a long period of time and so forth).
Second, the x-axis of the graph is not ‘time’, which goes against our usual habit. Instead, the x-axis shows the total number of confirmed cases and the y-axis shows the newly confirmed cases over a one-week window. Time is therefore visible only through the movement of each country's data point.
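The animation code earlier plots the daily New_cases column directly. To get closer to the weekly count described here, one could sum new cases over a 7-day window first. This is a minimal sketch on made-up numbers; the trailing 7-day window is my assumption about how to build the weekly count, and the column names here omit the leading space for readability.

```python
import pandas as pd

# Made-up daily new-case counts for a single country (10 days).
daily = pd.DataFrame({
    "Date_reported": pd.date_range("2020-06-01", periods=10, freq="D"),
    "New_cases": [10, 20, 30, 40, 50, 60, 70, 80, 90, 100],
})

# New cases summed over a trailing 7-day window (assumed meaning of "weekly").
# The first six rows are NaN because a full week is not yet available.
daily["Weekly_new_cases"] = daily["New_cases"].rolling(window=7).sum()

# Running total of cases, which would go on the x-axis.
daily["Cumulative_cases"] = daily["New_cases"].cumsum()

print(daily.tail(3))
```

Plotting Weekly_new_cases against Cumulative_cases on log-log axes would then smooth out day-to-day reporting noise.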
Third, even though plotting data on a log scale is neat and easy, the method has some caveats. For example, a log-scaled graph shows the distance from 10K to 100K as the same as the distance from 10 to 100. Unless we keep the pros and cons of this method in mind, it can keep us from reading the data correctly.
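The point about equal distances can be checked with a few lines of arithmetic. The sketch below assumes a base-10 log axis, where a value's position is proportional to its log10.

```python
import math

# On a log10 axis, the gap between two values is the difference of their logs.
gap_small = math.log10(100) - math.log10(10)
gap_large = math.log10(100_000) - math.log10(10_000)
print(gap_small, gap_large)

# The absolute differences, however, are far apart.
diff_small = 100 - 10
diff_large = 100_000 - 10_000
print(diff_small, diff_large)
```

Both gaps come out as one decade on the axis, even though the second jump adds a thousand times more cases than the first.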
The short video above provides more details and intuitive explanations, so we highly recommend taking a look!
I hope you enjoyed reading this and will try it for your own country. Stay healthy.