2019-nCoV Data Prediction
2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people. The latest situation summary updates are available on CDC’s web page 2019 Novel Coronavirus, Wuhan, China.
from https://www.cdc.gov/coronavirus/2019-ncov/about/index.html
Lead
As everyone knows, a serious coronavirus outbreak is hitting our country, especially Wuhan, and most activities have been canceled. Staying at home has become the daily routine. Besides keeping up with my studies, I am trying to learn Python, a popular programming language that I should have mastered long ago.
After learning matrix linear regression, a powerful and beginner-friendly algorithm for prediction, I got an idea: forecast the coming trend from a series of daily figures, including the number of people infected with the virus. So I got started, and now it's time to share my results.
Function
The program predicts the next day's numbers of confirmed, suspected, cured (and so on) cases in China from the data of previous days.
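One way to read "next day from previous days" is that each day's figures serve as the input and the following day's figures serve as the target. This matches how modelTrainer.py slices the data (data[0:-1] against rows 1 onward). The sketch below only illustrates that pairing; the function name make_next_day_pairs and the three-column toy numbers are made up for the example.

```python
def make_next_day_pairs(daily_rows):
    """Pair each day's figures with the following day's figures."""
    inputs = daily_rows[:-1]   # every day except the last
    targets = daily_rows[1:]   # every day except the first
    return inputs, targets


# Made-up numbers with only three columns, for illustration.
toy_days = [
    [10, 5, 1],
    [20, 9, 2],
    [35, 14, 4],
]
toy_inputs, toy_targets = make_next_day_pairs(toy_days)
# toy_inputs[0] is day 1 and toy_targets[0] is day 2, and so on.
```

With N days of data this yields N - 1 training pairs, which is why the last day can only be used as an input for the forecast, not as a training sample.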
Process
Collecting Data
There is a tremendous amount of data available on GitHub. As my data source I chose a set of Excel files that are updated in real time; they include the numbers of people confirmed, suspected, severe, dead, cured and under observation.
Analyzing Data
I use PyCharm with sklearn in a local environment (Windows 10, version 1903).
- The samples are the numbers of people in each category on each day.
- The predictions are the numbers of people in each category on the next day.
- The method is multivariable linear regression.
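For reference, the model behind multivariable linear regression can be written out. This is the standard ordinary least-squares form, which is what sklearn's LinearRegression fits by default. The notation is introduced here and does not come from the code: x_j^{(t)} is figure j on day t, \hat{y}_k is the forecast for figure k on the next day, w_{k,j} are the learned weights, and T is the number of days in the data.

```latex
\begin{align}
\hat{y}_k &= w_{k,0} + \sum_{j=1}^{6} w_{k,j}\, x_j, \qquad k = 1, \dots, 6 \\
w_k &= \arg\min_{w} \sum_{t=1}^{T-1} \Bigl( x_k^{(t+1)} - w_0 - \sum_{j=1}^{6} w_j\, x_j^{(t)} \Bigr)^2
\end{align}
```

Each of the six figures gets its own set of seven weights (one intercept plus one weight per input figure), fitted on the T - 1 consecutive day pairs.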
Results
Based on the data from January 21 to February 3, 2020, the following is my forecast for February 4
What does the data tell us
At least for the next few days, the numbers will keep increasing at a serious rate. Please stay safe and try to avoid going out, for your own sake and for society's.
Stay strong, Wuhan! Stay strong, China!
Project Code
dataReader.py
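def load_data(sample):
    """Append one row of numbers per line of the data file; return the row count."""
    # Hard-coded path from the author's machine; point it at your own data file.
    with open("C://Users/19132/Desktop/data.txt") as file:
        i = 0
        for line in file:
            # Convert the whitespace-separated fields to floats so that
            # sklearn receives numbers rather than strings.
            sample.append([float(value) for value in line.split()])
            i += 1
    return i


if __name__ == "__main__":
    # Quick manual check when the module is run directly.
    sample = []
    load_data(sample)
    # print(sample)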
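As a small variation, the loader could take the file path as a parameter and return the rows, which makes it easier to try on other files. The sketch below is an assumption about how that might look, not part of the original project: load_rows is a hypothetical name, and the two rows written to the temporary file are made-up numbers.

```python
import os
import tempfile


def load_rows(path):
    """Read a whitespace-separated text file into a list of float rows."""
    rows = []
    with open(path) as handle:
        for line in handle:
            fields = line.split()
            if fields:  # skip blank lines
                rows.append([float(field) for field in fields])
    return rows


# Write two made-up rows to a temporary file and read them back.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("10 5 1\n20 9 2\n")
    tmp_path = tmp.name

demo_rows = load_rows(tmp_path)
os.remove(tmp_path)
```

Skipping blank lines avoids an empty row at the end of the list when the file ends with an extra newline.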
modelTrainer.py
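from sklearn import linear_model
import dataReader
import modelTester

prediction = []


def process(data, sampleNum, n):
    # Target: column n of every day after the first (the next-day value).
    target = []
    i = 1
    while i < sampleNum:
        target.append(data[i][n])
        i += 1
    # print(target)
    reg = linear_model.LinearRegression()
    # Features: every day except the last, so each row is paired with the day after it.
    reg.fit(data[0:-1], target)
    # prediction.extend(reg.predict([[17205,21558,2296,361,475,152700]]))
    # Hard-coded input row: six figures in the same column order as the data file.
    prediction.extend(reg.predict([[20438,23214,2788,425,632,171329]]))


data = []
num = dataReader.load_data(data)
# Fit one model per column (six categories) and collect the six forecasts.
i = 0
while i < 6:
    process(data, num, i)
    i += 1
modelTester.testing(prediction)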
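What LinearRegression computes inside process can be spelled out by hand. The following is a minimal pure-Python sketch of ordinary least squares via the normal equations. It is not how sklearn solves the problem internally, and the names solve_linear_system, fit_least_squares and predict, as well as the two-feature toy data, are made up for illustration.

```python
def solve_linear_system(a, b):
    """Solve a * w = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    # Work on an augmented copy so the inputs are left untouched.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        # Pick the row with the largest pivot to limit rounding error.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    w = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * w[j] for j in range(i + 1, n))
        w[i] = (m[i][n] - s) / m[i][i]
    return w


def fit_least_squares(features, targets):
    """Return [intercept, w1, ..., wk] minimising the squared error."""
    # Prepend a constant 1 so the first weight acts as the intercept.
    rows = [[1.0] + list(x) for x in features]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, targets)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(weights, x):
    """Apply the fitted weights to one feature row."""
    return weights[0] + sum(w * v for w, v in zip(weights[1:], x))


# Made-up toy data that follows y = 1 + 2*x1 + 3*x2 exactly.
toy_features = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
toy_targets = [1, 3, 4, 6, 8]
toy_weights = fit_least_squares(toy_features, toy_targets)
toy_forecast = predict(toy_weights, [3, 2])
```

On the toy data the fit recovers the intercept and both weights, so the forecast for the unseen row [3, 2] follows the same rule. With the real data (six strongly correlated columns and about two weeks of rows) the normal equations can be poorly conditioned, so the library routine remains the better choice in practice.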
modelTester.py
def testing(prediction):
    print("processing...")
    # Header row: one left-aligned, 17-character column per category.
    print(format("confirmed", "<17"), end="\t")
    print(format("suspected", "<17"), end="\t")
    print(format("severe", "<17"), end="\t")
    print(format("death", "<17"), end="\t")
    print(format("cures", "<17"), end="\t")
    print(format("observation", "<17"))
    # Value row: forecasts truncated to whole numbers, in the same column order.
    for x in prediction:
        print(format(int(x), "<17"), end="\t")
    print("\ndone!")
graphGenerator.py
import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D  # registers the 3D projection on older Matplotlib versions


def draw_scatter(x, y):
    plt.scatter(x, y)
    # plt.xlabel('Heights')
    # plt.ylabel('Weight')
    # plt.title('Heights & Weight of Students')
    plt.show()


def draw_3d_scatter(x, y, z):
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(x, y, z, c='r', marker='o')
    ax.set_xlabel('xlabel')
    ax.set_ylabel('ylabel')
    ax.set_zlabel('zlabel')
    plt.show()


def draw_3d_line(x, y, z):
    fig = plt.figure()
    # fig.gca(projection='3d') no longer works in recent Matplotlib releases;
    # add_subplot creates the same 3D axes.
    ax = fig.add_subplot(111, projection='3d')
    ax.plot(x, y, z, label='parametric curve')
    ax.legend()
    plt.show()
Data Sources
https://github.com/JackieZheng/2019-nCoV