Guide to Creating New Extrapolation Models
This article will describe how you can use our open-source gitlab project to develop your own extrapolation models from the COVID-19 dataset.
We carefully chose our tools to make it fast & easy to get started developing your own models. It should take less than 10 minutes to:
- clone our repo,
- generate your first graph from our example models, and
- begin writing your own models in Python.
These instructions assume a Debian-based system. If you ran into issues but managed to get our toolset working on a non-Debian system, please open a ticket on our GitLab and we'll add your notes to the wiki.
Bootstrap
Coviz requires python3 and python3-pip. It also requires plotly and numpy, which will be installed with pip3.
Execute the following commands from a terminal to install these requirements:
sudo apt-get -y install git python3-pip

# use --recurse-submodules so our <1M submodule repo with the CSV dataset is
# also fetched: https://gitlab.com/coviz-org/data-jhcsse
git clone --recurse-submodules git@gitlab.com:coviz-org/coviz-models.git
cd coviz-models

# -r tells pip3 to read the list of packages from the file
pip3 install -r requirements.txt
Generate graphs
You should now have everything you need to generate your first graph.
Execute the following commands from the coviz-models directory to generate the graphs using our example extrapolation models.
./generateGraphs.py
ls
The ./generateGraphs.py command above should have created a new directory called output. Inside the output directory you'll see three more directories, one for each of the different extrapolation models:
user@coviz:~/sandbox/coviz-models$ ls -1 output/
my-new-model
projections-based-on-last-three-days
projections-based-on-last-seven-days
user@coviz:~/sandbox/coviz-models$
Inside each of those models' directories, you'll find an HTML file for each of the dataset's regions that was generated (e.g. Afghanistan, US, Diamond Princess, etc).
user@coviz:~/sandbox/coviz-models$ ls output/projections-based-on-last-three-days/
us.html
user@coviz:~/sandbox/coviz-models$
By default (for faster execution), only the US graphs are created. If you'd like to generate graphs for additional regions, specify one or more regions with --region. If you'd like to generate the graphs for all the regions, use --earth.
Note that --earth will take a long time!
You can also specify --help for a list and description of all the options to generateGraphs.py.
./generateGraphs.py --help
Viewing Graphs
To view the graphs generated by ./generateGraphs.py above, open the HTML file output/<model>/<region>.html (e.g. output/projections-based-on-last-three-days/us.html) in your browser.
firefox output/projections-based-on-last-three-days/us.html
Creating your own model
If you'd like to create your own extrapolation model, you can edit the models/my_new_model.py file to your liking.
You just need its make_extrapolate function to return a function that takes a single argument (the x value on the graph) and returns another number (that x's corresponding y value on the graph).
By default, the models/my_new_model.py script is just a copy of the e2a_seven extrapolation model, which does a simple curve fit against the most recent seven days of data. It fits a second-degree polynomial with numpy's polyfit() function and wraps the result in a callable with numpy's poly1d() function.
Let's change models/my_new_model.py to do a curve fit against the most recent thirty days instead of seven.
Execute the following command to edit the file models/my_new_model.py in gedit.
gedit models/my_new_model.py
The first thing you'll notice is how short the script is! Python's numpy module is fantastic, and it does most of the heavy lifting. Here's the entire file.
import numpy

def make_extrapolate(data):
    x = [i for i in range(len(data))]
    y = data

    # fit a second-degree polynomial to the last seven days of data
    curve = numpy.polyfit(x[-7:], y[-7:], 2)

    # create function to be applied for extrapolation
    extrapolate = numpy.poly1d(curve)

    return extrapolate

def meta():
    return {
        'title': 'My New Model'
    }
The important line that does the curve-fit is this one:
curve = numpy.polyfit(x[-7:], y[-7:], 2)
See the numpy documentation for the polyfit() function for full details. In short:
- The first argument to the polyfit() function is x, which is a list of x coordinates
- The second argument to the polyfit() function is y, which is a list of y coordinates
- The third argument to the polyfit() function is deg, which is an int that defines the degree of the fitting polynomial. In all our example models, we use a second-degree fit.
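As a quick check of what polyfit() and poly1d() do together, the following sketch fits a second-degree polynomial to a handful of made-up points that lie exactly on y = x^2 + 1. The sample values are purely illustrative and are not taken from the dataset.

```python
import numpy

# made-up sample points that lie exactly on y = x^2 + 1
x = [0, 1, 2, 3, 4]
y = [1, 2, 5, 10, 17]

# polyfit returns the fitted coefficients, highest power first
coefficients = numpy.polyfit(x, y, 2)

# poly1d wraps those coefficients in a callable polynomial
extrapolate = numpy.poly1d(coefficients)

# evaluate the fitted curve one step beyond the last data point
next_value = extrapolate(5)
print(coefficients)
print(next_value)
```

Because the points sit on a parabola, the fitted coefficients should come out very close to those of x^2 + 1, and evaluating the curve at x = 5 continues the same parabola.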
Go ahead and change this line to the following, which will increase the set of data passed to the polyfit() function from the most recent 7 days of the dataset to the most recent 30 days.
curve = numpy.polyfit(x[-30:], y[-30:], 2)
Save and close the file my_new_model.py, and re-generate the graphs.
./generateGraphs.py
Now open the output/my-new-model/us.html file in your browser.
firefox output/my-new-model/us.html
Your browser will now show you the second-degree polynomial curve fit changed to fit against the most-recent 30 days.
You can confirm this by comparing it against the output of the other two models:
firefox output/projections-based-on-last-three-days/us.html
firefox output/projections-based-on-last-seven-days/us.html
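If you'd rather compare window sizes as numbers than as graphs, a small helper along these lines may be handy. It is only a sketch: make_windowed_extrapolate and the sample data are hypothetical and not part of coviz-models, although the fit itself is the same polyfit()/poly1d() pair used in my_new_model.py.

```python
import numpy

def make_windowed_extrapolate(data, window):
    # hypothetical helper: fit a second-degree polynomial to the
    # most recent `window` points of data
    x = list(range(len(data)))
    curve = numpy.polyfit(x[-window:], data[-window:], 2)
    return numpy.poly1d(curve)

# made-up cumulative counts for 40 days: a slow phase, then a faster one
data = [10 * day for day in range(30)] + [300 + 40 * day for day in range(10)]

for window in (7, 30):
    extrapolate = make_windowed_extrapolate(data, window)
    # x = len(data) is the first day after the end of the data
    print(window, float(extrapolate(len(data))))
```

The short window only sees the recent, faster phase, while the long window also has to accommodate the earlier, slower days, so the two projections generally differ.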
make_extrapolate()
Now you can make whatever modifications you'd like to the my_new_model.py file's make_extrapolate() function and follow the above procedure to generate its graph and check the result in your web browser.
In fact, you're not constrained to using numpy. The only constraint is that your make_extrapolate() function should return a function that takes x values and returns y values. Simple, right?
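For instance, a model that avoids numpy entirely might look like the sketch below, which projects forward along the average daily change over the last seven days. It assumes, as in the example models, that data is a list of numbers with one entry per day and that x is the zero-based day index. The linear approach and its title are just an illustration, not one of the project's models.

```python
def make_extrapolate(data):
    # average day-over-day change across the most recent seven days
    # (assumes data holds at least two values)
    recent = data[-8:]
    daily_change = (recent[-1] - recent[0]) / (len(recent) - 1)

    # the last known point on the graph
    last_x = len(data) - 1
    last_y = data[-1]

    def extrapolate(x):
        # straight line through the last data point
        return last_y + daily_change * (x - last_x)

    return extrapolate

def meta():
    return {
        'title': 'Linear Last Seven Days'
    }
```

The inner function captures last_x, last_y and daily_change from the enclosing call, which is one simple way to return "a function that takes x and returns y" without any external library.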
Submitting Extrapolation Models
Did you build an awesome extrapolation model from this guide and want to share it with the world? Great!
You can submit a ticket on our GitLab group for this. Make sure to include:
- The Python code to produce the model (i.e. the contents of my_new_model.py)
- A human-readable name for the model (< 45 characters)
- A short id for the model (3-5 characters)
- A short description of the model (~1-10 sentences)
- A list of the pros of the model
- A list of the cons of the model
- A statement that, by submitting your model to us, you are releasing the model copyleft under the CC-BY-SA license and its implementation code under the AGPLv3. You will need to state that (a) you are the original author and (b) you permit anyone to use the model and any content produced by it under the terms of those licenses
- A list of the names of the authors & contributors for developing the model (optional)
- For each author/contributor, a hyperlink to a URL of their choice, such as a website or social media account for attribution (optional)
Please create a new ticket on our GitLab with the above information, and we'll work on integrating the model into our website. Thank you!
Updating the Dataset
As the days pass, your data will become stale. The dataset should be updated once a day.
Execute the following commands from the coviz-models directory to update the repo and its dataset submodule.
git submodule foreach git pull origin master
git pull