
2019-nCoV Data Prediction

February 4, 2020   

2019 Novel Coronavirus (2019-nCoV) is a virus (more specifically, a coronavirus) identified as the cause of an outbreak of respiratory illness first detected in Wuhan, China. Early on, many of the patients in the outbreak in Wuhan, China reportedly had some link to a large seafood and animal market, suggesting animal-to-person spread. However, a growing number of patients reportedly have not had exposure to animal markets, indicating person-to-person spread is occurring. At this time, it’s unclear how easily or sustainably this virus is spreading between people. The latest situation summary updates are available on CDC’s web page 2019 Novel Coronavirus, Wuhan, China.



As everyone knows, a serious coronavirus outbreak is hitting our country, especially Wuhan, and most activities have been canceled. Staying at home has become the daily routine. Besides my regular studies, I am trying to learn Python, a popular programming language which I should have mastered long ago.

After learning matrix linear regression, a powerful and beginner-friendly algorithm used for prediction, I got an idea: forecasting the future trend from a series of data, including the number of people infected with the virus. So I just started doing it, and now it’s time to share my results.


My program predicts the next-day numbers of people confirmed, suspected, cured… in China from the data of the previous days.


Collecting Data

There is a tremendous amount of data available on GitHub, and I chose a set of Excel files that are updated in real time as my data source. They include the numbers of people confirmed, suspected, severe, dead, cured and observed.
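As a rough illustration of how such a table can be turned into rows of numbers, here is a minimal sketch. It assumes the spreadsheet has been exported to CSV with one row per day and a header naming the six counts. The header names, the `parse_daily_counts` helper and the sample rows are all hypothetical, and the numbers are made up rather than taken from the real files.

```python
import csv
import io

# Assumed column names; the real spreadsheet may label them differently.
FIELDS = ["confirmed", "suspected", "severe", "dead", "cured", "observed"]

def parse_daily_counts(text):
    """Return one list of six integers per day, in the order of FIELDS."""
    reader = csv.DictReader(io.StringIO(text))
    return [[int(row[name]) for name in FIELDS] for row in reader]

# Placeholder rows with made-up numbers, only to show the expected shape.
sample_text = (
    "date,confirmed,suspected,severe,dead,cured,observed\n"
    "day1,10,20,3,1,0,100\n"
    "day2,15,30,4,1,1,150\n"
)
rows = parse_daily_counts(sample_text)
print(rows)
```

Each inner list is one day's sample, which is the shape a regression library expects for its input matrix.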

Analyzing Data

I use PyCharm with sklearn in a local environment (Windows 10, 1903).

  • The samples are the numbers of people in each category on each day.
  • The predictions are the numbers of people in each category on the next day.
  • The method is multivariable linear regression
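The setup in the list above can be sketched in a few lines. This is only an illustration under stated assumptions, not the project code: it uses NumPy least squares instead of sklearn, the helper names `fit_next_day` and `predict_next_day` are hypothetical, and the toy numbers are made up so that they follow an exact linear rule.

```python
import numpy as np

def fit_next_day(history):
    """Fit a linear map from each day's row to the following day's row."""
    history = np.asarray(history, dtype=float)
    X = history[:-1]                            # every day except the last is a sample
    Y = history[1:]                             # the following day is its target
    X1 = np.hstack([X, np.ones((len(X), 1))])   # extra column of ones for the intercept
    coef, *_ = np.linalg.lstsq(X1, Y, rcond=None)
    return coef

def predict_next_day(coef, today):
    """Apply the fitted map to the most recent day's numbers."""
    today1 = np.append(np.asarray(today, dtype=float), 1.0)
    return today1 @ coef

# Made-up toy series with two columns (not real case counts):
# a grows by b each day, and b grows by 1 each day.
toy = [[1, 1], [2, 2], [4, 3], [7, 4], [11, 5], [16, 6]]
coef = fit_next_day(toy)
forecast = predict_next_day(coef, toy[-1])
print(forecast)  # close to [22, 7], because the toy rule is exactly linear
```

Real case counts do not follow a linear rule this neatly, so a fit on only about two weeks of data should be read as a short-term extrapolation rather than a reliable forecast.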


Based on the data from January 21 to February 3, 2020, the following is my forecast for February 4

What does the data tell us

At least for the next few days, the numbers will keep increasing at a serious rate. Please pay attention to safety and try to avoid going out, for yourself and for society.


Project Code

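def load_data(sample):
    # Read data.txt, append one row of numbers per day to `sample`,
    # and return the number of rows read.
    # Assumption: each line holds the six daily counts separated by
    # whitespace. The original parsing step is missing from this listing,
    # so adjust the split to match the real file.
    i = 0
    with open("C://Users/19132/Desktop/data.txt") as file:
        for line in file:
            if not line.strip():
                continue  # skip blank lines
            sample.append([float(v) for v in line.split()])
            i += 1
    return i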

sample = []

from sklearn import linear_model
import dataReader
import modelTester

prediction = []

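def process(data, sampleNum, n):
    # Build the targets: for each day, the value of column n on the NEXT day.
    # The loop body was missing from the listing; this is a reconstruction.
    target = []
    i = 1
    while i < sampleNum:
        target.append(data[i][n])
        i += 1

    # print(target)

    # Fit on every day except the last one, which has no next-day target yet.
    reg = linear_model.LinearRegression()
    reg.fit(data[0:-1], target)

    # Predict column n for the next day from the most recent day's numbers.
    # prediction.extend(reg.predict([[17205,21558,2296,361,475,152700]]))
    prediction.extend(reg.predict([data[-1]]))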

data = []
num = dataReader.load_data(data)

i = 0
while i < 6:
    process(data, num, i)
    i += 1


def testing(prediction):
    print(format("confirmed", "<17"), end="\t")
    print(format("suspected", "<17"), end="\t")
    print(format("severe", "<17"), end="\t")
    print(format("death", "<17"), end="\t")
    print(format("cures", "<17"), end="\t")
    print(format("observation", "<17"))

    for x in prediction:
        print(format(int(x), "<17"), end="\t")

import matplotlib.pyplot as plt
from mpl_toolkits.mplot3d import Axes3D
import random
import numpy as np

def draw_scatter(x, y):
    plt.scatter(x, y)
    # plt.xlabel('Heights')
    # plt.ylabel('Weight')
    # plt.title('Heights & Weight of Students')

def draw_3d_scatter(x, y, z):
    fig = plt.figure()
    ax = fig.add_subplot(111, projection="3d")
    ax.scatter(x, y, z, c='r', marker='o')

def draw_3d_line(x, y, z):
    fig = plt.figure()
    # fig.gca(projection='3d') no longer works in recent Matplotlib releases
    ax = fig.add_subplot(111, projection='3d')
    ax.plot(x, y, z, label='parametric curve')

Data Sources