There is an immediate need to direct resources toward the COVID-19 pandemic. While a wealth of data exists and many scientists are working to analyze, diagnose, and predict the spread of the outbreak, there is not yet a centralized repository for harmonized datasets, immediate analysis needs, and previously completed research outputs. This repository aims to fill that gap and to support a decentralized, open source data science collaboration to combat the COVID-19 pandemic.

Get Involved

Use your skills to analyze data, develop models, and organize results so that policymakers are better informed when making decisions. Another important task is maintaining existing datasets to keep them clean and accurate. As of March 18th, 5:30 PM, the most-used dataset has 466 issues that need to be addressed. Researchers are also in need of help. Find ways to support here.


Analyses and Data Products:

A dashboard with active case modeling, prediction, and visualization by country

A crowdsourced map of hospitals needing PPE

Datasets:



JHU CSSE: Daily case reports with time-series data 

Tweets dataset: Contains 15 million tweets with date, location, user ID, and a sentiment score 

Open research dataset: Contains over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses for natural language processing projects 

Kaggle recovery and exposure dataset: Contains an adapted version of CSSE data with start/end dates for exposure, symptom onset, age, gender, and case summaries 

School Closures dataset: Contains a map and table of all U.S. school closures 

Country-Specific datasets: Korea, Italy, Brazil, U.S.

covidtracking.com U.S. state-by-state tracking: Contains an API as well as historical time-series data

Understanding America Study: Surveys of attitudes and behaviors around the Novel Coronavirus pandemic in the United States

County-by-county data

Chest X-Ray Dataset: Contains an image data collection from chest X-Ray and CT images
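
As a possible first step when working with cumulative time-series data such as daily case reports, the sketch below converts cumulative counts into daily new cases. The function name and the sample numbers are hypothetical and are not taken from any of the datasets listed above; real files will need their own loading and cleaning.

```python
def daily_new_cases(cumulative):
    """Convert cumulative case counts into daily new cases.

    The first value is treated as that day's new cases. Negative
    differences (which can appear when a source revises its totals
    downward) are clamped to zero; that is one possible choice, not
    a recommendation.
    """
    new_cases = []
    previous = 0
    for total in cumulative:
        new_cases.append(max(total - previous, 0))
        previous = total
    return new_cases


# Hypothetical cumulative counts for one region over five days.
example_cumulative = [1, 3, 3, 8, 7]
print(daily_new_cases(example_cumulative))  # prints [1, 2, 0, 5, 0]
```

Clamping hides downward revisions, so a maintenance-focused contributor may prefer to flag those days as data issues instead.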


Suggested Projects

Epidemiology modeling

Build models with a state-by-state or regional focus instead of only a national one. Where possible, try to predict patient flows to specific hospitals, which may be the single most important output for planners.
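
One common starting point for regional modeling is a compartmental SIR model; it is shown here only as a minimal sketch, not as a method this repository prescribes. The function name, the population size, and the values of `beta` and `gamma` are all assumptions chosen for illustration, and a real model would need parameters fitted to regional data.

```python
def simulate_sir(population, initial_infected, beta, gamma, days):
    """Simulate a basic SIR model with one-day Euler steps.

    beta  -- transmission rate per day (assumed value)
    gamma -- recovery rate per day (assumed value)
    Returns a list of (susceptible, infected, recovered) tuples,
    one per day, starting with day 0.
    """
    s = float(population - initial_infected)
    i = float(initial_infected)
    r = 0.0
    history = [(s, i, r)]
    for _ in range(days):
        new_infections = beta * s * i / population
        new_recoveries = gamma * i
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((s, i, r))
    return history


# Hypothetical region: 1,000,000 residents, 10 initial infections.
history = simulate_sir(1_000_000, 10, beta=0.3, gamma=0.1, days=120)
peak_day = max(range(len(history)), key=lambda day: history[day][1])
print(f"Peak infections on day {peak_day}: {history[peak_day][1]:.0f}")
```

Running the same function once per state or region, each with its own population and parameters, gives a rough regional comparison. Predicting flows to specific hospitals would need additional data that this sketch does not model.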

Fiscal response

Track whether and how stimulus is arriving at households, e.g. changes in SNAP policy, sick leave, and paid parental leave. Track the allocation of dollars using supply-side government data. Track the receipt of dollars using household data and potentially social media.

Every policy idea is theoretical until it actually shows up in the surveys. This tracking is essential for accountability and for identifying bottlenecks, exclusion, and similar problems.

Changes to learning environments and identifying the most at-risk students

Where are the most vulnerable students (those receiving free or reduced-price lunch, those in foster care, etc.) among the schools that have been shut?

Food/Medicine stock outs

Use web crawling and social media scraping to build a real-time dashboard of food and medicine stock outs that could quickly shine a light on shortages for suppliers and policymakers.

Predicting and monitoring false information

Predict which groups of people do not believe in the efficacy of social distancing and similar measures, in order to better target public health messaging. Monitor false information on social media.

A fast and anonymous sharing platform for ER doctors

ER doctors need a fast, anonymous way to share information and findings. Traditional publishing pathways are too slow, and Twitter is too disorganized and not anonymized.