Guide to Creating New Extrapolation Models

This article describes how to use our open-source GitLab project to develop your own extrapolation models from the COVID-19 dataset.

We carefully chose our tools to make it fast & easy to get started developing your own models. It should take less than 10 minutes to:

  1. clone our repo,
  2. generate your first graph from our example models, and
  3. begin writing your own models in Python.

This guide assumes you are using Debian 10 (buster). It may also apply to other Debian-based systems (e.g. Ubuntu), but this has not been tested.

If you ran into issues but managed to get our toolset working on a non-Debian system, please open a ticket on our GitLab and we'll add your notes to the wiki.

Bootstrap

Coviz requires python3 and python3-pip. It also requires plotly and numpy, which will be installed with pip3.

Execute the following commands from a terminal to install these requirements:

sudo apt-get -y install git python3-pip

# use --recurse-submodules so our <1M submodule repo with the CSV dataset is
# also fetched: https://gitlab.com/coviz-org/data-jhcsse
git clone --recurse-submodules git@gitlab.com:coviz-org/coviz-models.git

cd coviz-models
pip3 install -r requirements.txt
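
Once the install finishes, one quick way to confirm that both pip packages are visible to python3 is a small check like the one below. This is only a sketch: the missing_modules helper is a hypothetical name used for illustration and is not part of coviz-models.

```python
import importlib.util

# Modules this guide says are installed from pip3
REQUIRED = ("numpy", "plotly")


def missing_modules(names=REQUIRED):
    """Return the names from `names` that Python cannot locate for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("All required modules are importable.")
```

If anything is reported missing, re-running the pip3 step above is the first thing to try.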

Generate graphs

You should now have everything you need to generate your first graph.

Execute the following commands from the coviz-models directory to generate the graphs using our example extrapolation models.

./generateGraphs.py
ls

Running ./generateGraphs.py should have created a new directory called output. Inside the output directory you'll see three more directories, one for each of the extrapolation models:

user@coviz:~/sandbox/coviz-models$ ls -1 output/
my-new-model
projections-based-on-last-three-days
projections-based-on-last-seven-days
user@coviz:~/sandbox/coviz-models$ 

Inside each of those models' directories, you'll find an HTML file for each of the dataset's regions (e.g. Afghanistan, US, Diamond Princess, etc).

user@coviz:~/sandbox/coviz-models$ ls output/projections-based-on-last-three-days/
us.html
user@coviz:~/sandbox/coviz-models$ 

By default (for faster execution), only the US graphs are created. If you'd like to generate graphs for additional regions, specify one or more regions with --region. If you'd like to generate the graphs for all the regions, use --earth.

Note: Using --earth will take a long time!

You can also specify --help for a list and description of all the options to generateGraphs.py.

./generateGraphs.py --help

Viewing Graphs

To view the graphs generated by ./generateGraphs.py, open the HTML file output/<model>/<region>.html (e.g. output/projections-based-on-last-three-days/us.html) in your browser.

firefox output/projections-based-on-last-three-days/us.html

Creating your own model

If you'd like to create your own extrapolation model, you can edit the models/my_new_model.py file to your liking.

You just need its make_extrapolate function to return a function that takes a single argument (the x value on the graph) and returns a number (that x's corresponding y value on the graph).
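
To make that contract concrete, here is a minimal sketch. It assumes data is a plain list of numbers (one per day) and that x is a day index. Both the flat model and the check_model helper are hypothetical illustrations, not code from the repo.

```python
def make_extrapolate(data):
    """Hypothetical flat model: always predicts the last observed value."""
    last = data[-1]

    def extrapolate(x):
        # x is a position on the graph's x axis (assumed here: a day index)
        return last

    return extrapolate


def check_model(factory, data, horizon=5):
    """Build the model, then evaluate it on `horizon` x values past the data."""
    extrapolate = factory(data)
    future_xs = range(len(data), len(data) + horizon)
    return [float(extrapolate(x)) for x in future_xs]


sample = [1, 2, 4, 8, 16]  # made-up numbers, not real data
print(check_model(make_extrapolate, sample))
```

Any function that passes a check like this, taking an x and returning a number, has the shape described above.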

By default, the models/my_new_model.py script is just a copy of the e2a_seven extrapolation model, which does a simple curve fit against the most recent seven days of data using a second-degree polynomial, via numpy's polyfit() and poly1d() functions.

Let's change the models/my_new_model.py to do a curve fit against the most recent thirty days instead of seven.

Execute the following command to edit the file models/my_new_model.py in gedit.

gedit models/my_new_model.py

The first thing you'll notice is how short the script is! Python's numpy module is fantastic, and it does most of the heavy lifting. Here's the entire file.

import numpy

def make_extrapolate(data):
    # x is the day index (0, 1, 2, ...); y is the observed value for each day
    x = list(range(len(data)))
    y = data

    # fit a second-degree polynomial to the last seven days of data
    curve = numpy.polyfit(x[-7:], y[-7:], 2)

    # create function to be applied for extrapolation
    extrapolate = numpy.poly1d(curve)

    return extrapolate


def meta():
    return {
        'title': 'My New Model'
    }
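
To see what the returned function does without generating a graph, the sketch below repeats the body of make_extrapolate() and calls it on a short made-up series. The series (perfect squares) is an assumption chosen so the quadratic fit is easy to check by eye. It is not real data, and the real shape of data passed in by generateGraphs.py is assumed to be a list of numbers.

```python
import numpy


def make_extrapolate(data):
    x = list(range(len(data)))
    y = data
    # second-degree polynomial fit to the last seven points
    curve = numpy.polyfit(x[-7:], y[-7:], 2)
    return numpy.poly1d(curve)


# Made-up series: day i has value i squared (days 0 through 9)
data = [i * i for i in range(10)]

extrapolate = make_extrapolate(data)
for day in (10, 11, 12):
    # Because the data is exactly quadratic, these should land very close
    # to day squared (100, 121, 144), up to floating-point error.
    print(day, round(float(extrapolate(day)), 2))
```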

The important line that does the curve-fit is this one:

curve = numpy.polyfit(x[-7:], y[-7:], 2)

See numpy's documentation for the polyfit() function for full details. Its first three arguments are:

  1. The first argument to the polyfit() function is x, which is a list of x coordinates
  2. The second argument to the polyfit() function is y, which is a list of y coordinates
  3. The third argument to the polyfit() function is deg, which is an int that defines the degree of the fitting polynomial. In all our example models, we use a second-degree fit.
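
As a small illustration of those three arguments, the sketch below fits made-up points that lie exactly on y = 2x^2 + 1 (not real COVID-19 data). polyfit() returns the coefficients with the highest power first, and poly1d() turns them into a callable.

```python
import numpy

# Made-up points on the curve y = 2*x**2 + 1
x = [0, 1, 2, 3, 4]
y = [1, 3, 9, 19, 33]

# deg=2 asks for a second-degree polynomial: a*x**2 + b*x + c
coeffs = numpy.polyfit(x, y, 2)
print(coeffs)  # approximately [2, 0, 1], highest power first

# Wrap the coefficients in a callable and evaluate beyond the data
f = numpy.poly1d(coeffs)
print(float(f(5)))  # approximately 2*25 + 1 = 51
```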

Go ahead and change this line to the following, which widens the data passed to the polyfit() function from the most recent 7 days of the dataset to the most recent 30 days.

curve = numpy.polyfit(x[-30:], y[-30:], 2)

Save and close the file my_new_model.py, and re-generate the graphs.

./generateGraphs.py

Now open the output/my-new-model/us.html file in your browser.

firefox output/my-new-model/us.html

Your browser will now show the second-degree polynomial curve fitted against the most recent 30 days.

You can confirm this by comparing it with the output of the other two models:

firefox output/projections-based-on-last-three-days/us.html
firefox output/projections-based-on-last-seven-days/us.html
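
The effect of the window size can also be seen numerically. The sketch below is an illustration under stated assumptions: make_extrapolate_window is a hypothetical helper (not in the repo) that takes the window as a parameter, and the series is synthetic cubic growth rather than the real dataset.

```python
import numpy


def make_extrapolate_window(data, days):
    """Hypothetical helper: quadratic fit to the last `days` points of data."""
    x = list(range(len(data)))
    curve = numpy.polyfit(x[-days:], data[-days:], 2)
    return numpy.poly1d(curve)


# Made-up series that grows faster than any quadratic (cubic growth)
data = [i ** 3 for i in range(40)]

target_day = 45
for days in (7, 30):
    projected = float(make_extrapolate_window(data, days)(target_day))
    print(f"last {days:2d} days -> day {target_day}: {projected:.0f}")
```

On this synthetic series the two windows give noticeably different projections. Which window suits the real data better is something to judge from the generated graphs.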

make_extrapolate()

Now you can make whatever modifications you'd like to the my_new_model.py file's make_extrapolate() function and follow the above procedure to generate its graph and check the result in your web browser.

In fact, you're not constrained to using numpy. The only constraint is that your make_extrapolate() function should return a function that takes x values and returns y values. Simple, right?
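
For example, a model written without numpy might look like the sketch below. It is a hypothetical straight-line least-squares fit over the most recent points, written in plain Python. The days parameter and its default are assumptions made for illustration, and the sketch assumes generateGraphs.py calls make_extrapolate(data) with a single argument, as in the file shown earlier.

```python
def make_extrapolate(data, days=7):
    """Hypothetical model: least-squares straight line through the last `days` points."""
    ys = list(data)[-days:]
    start = len(data) - len(ys)
    xs = list(range(start, len(data)))

    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Standard least-squares slope and intercept
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx if sxx else 0.0
    intercept = mean_y - slope * mean_x

    def extrapolate(x):
        # takes an x value, returns the corresponding y value
        return slope * x + intercept

    return extrapolate


# Made-up data on the line y = 3 + 2x
line = [3 + 2 * i for i in range(10)]
f = make_extrapolate(line)
print(f(10), f(12))
```

The closure keeps slope and intercept, so the fit is computed once and each call to extrapolate() is just arithmetic.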

Submitting Extrapolation Models

Did you build an awesome extrapolation model from this guide and want to share it with the world? Great!

You can submit a ticket on our gitlab group for this. Make sure to include:

  1. The python code to produce the model (ie: the contents of my_new_model.py)
  2. A human-readable name for the model (< 45 characters)
  3. A short id for the model (3-5 characters)
  4. A short description of the model (~1-10 sentences)
  5. A list of the pros of the model
  6. A list of the cons of the model
  7. A statement that, by submitting your model to us, you are releasing the model copyleft under the CC-BY-SA license and its implementation code under the AGPLv3. You will need to state that [a] you are the original author and [b] you permit anyone to use the model and any content produced by it under the terms of those licenses
  8. A list of the names of the authors & contributors for developing the model (optional)
  9. For each author/contributor, a hyperlink to a URL of their choice, such as a website or social media account for attribution (optional)

Please create a new ticket on our gitlab with the above information, and we'll work on integrating the model into our website. Thank you!

Updating the Dataset

As the days pass, your data will become stale. The dataset should be updated once a day.

Execute the following commands from the coviz-models directory to update the repo and its dataset submodule.

git submodule foreach git pull origin master
git pull

See Also

  1. Our coviz-models git repo on gitlab
  2. A list of our current models
  3. Our "e2a" model

Further Reading

  1. Curve Fitting (Wikipedia)
  2. 1918 Spanish Flu Pandemic (Wikipedia)
  3. Herd Immunity (Wikipedia)
  4. numpy's User Guide
  5. numpy's "polynomials" documentation
  6. numpy's "fitting polynomials" documentation