Resources are urgently needed to respond to the COVID-19 pandemic. While a wealth of data exists and many scientists are working to analyze, diagnose, and predict the spread of the outbreak, there is not yet a centralized repository for harmonized datasets, immediate analysis needs, and previously completed research outputs. We are creating exactly that repository here, for a decentralized, open-source data science collaboration to combat the COVID-19 pandemic.
Use your skills to analyze data, develop models, and organize results so that policymakers are better informed when making decisions. Another important task is maintaining existing datasets to keep them clean and accurate. As of March 18th, 5:30 PM, the most-used dataset had 466 issues that need to be addressed. Researchers also need help. Find ways to support them here.
JHU CSSE: Daily case reports with time-series data
Tweets dataset: Contains 15 million tweets with date, location, user ID, and a sentiment score
Open research dataset: Contains over 29,000 scholarly articles, including over 13,000 with full text, about COVID-19, SARS-CoV-2, and related coronaviruses for natural language processing projects
Kaggle recovery and exposure dataset: Contains an adapted version of CSSE data with start/end dates for exposure, symptom onset, age, gender, and case summaries
School Closures dataset: Contains a map and table of all U.S. school closures
covidtracking.com U.S. state-by-state tracking: Contains an API as well as historical time-series data
Understanding America Study: Surveys of attitudes and behaviors around the Novel Coronavirus pandemic in the United States
Chest X-Ray Dataset: Contains a collection of chest X-ray and CT images
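Several of the datasets above are time series of case counts. As a minimal sketch of a first cleaning step, the Python below turns cumulative counts into daily changes and flags days where a cumulative count goes down, which may point to a correction or a data-entry problem worth reporting. The wide layout (one column per date), the column names, the function names, and every number in `SAMPLE_CSV` are assumptions made for illustration, not the actual contents of any listed dataset.

```python
import csv
import io

# Illustrative sample only: made-up numbers in an ASSUMED wide layout
# (one row per region, one column per date, cumulative counts).
SAMPLE_CSV = """Province/State,Country/Region,3/15/20,3/16/20,3/17/20,3/18/20
Region A,Country X,10,15,15,27
Region B,Country X,0,2,5,4
"""

# Assumed identifier columns; every other column is treated as a date.
ID_COLUMNS = ("Province/State", "Country/Region")


def daily_changes(csv_text):
    """Return {(province, country): [(date, change), ...]} from cumulative counts."""
    result = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        key = tuple(row[column] for column in ID_COLUMNS)
        dates = [column for column in row if column not in ID_COLUMNS]
        series = []
        previous = None
        for date in dates:
            total = int(row[date])
            if previous is not None:
                # Difference between consecutive cumulative totals
                series.append((date, total - previous))
            previous = total
        result[key] = series
    return result


def find_decreases(changes):
    """List (region key, date) pairs where a cumulative count went down."""
    return [
        (key, date)
        for key, series in changes.items()
        for date, change in series
        if change < 0
    ]


if __name__ == "__main__":
    changes = daily_changes(SAMPLE_CSV)
    for key, date in find_decreases(changes):
        print("Possible data issue:", key, "on", date)
```

Flagging decreases rather than silently clamping them to zero keeps the problem visible, so it can be reported to the dataset maintainers.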
Build models with a state-by-state or regional focus instead of only a national one. Possibly also try to predict patient flows to specific hospitals, which may be the single most important output for planners.
Track whether and how stimulus is reaching households, e.g. changes in SNAP policy, sick leave, and paid parental leave. Track the allocation of dollars using supply-side government data. Track the receipt of dollars using household data and potentially social media.
Every policy idea is theoretical until it actually shows up in the surveys. This tracking is essential for accountability and for identifying bottlenecks, exclusion, and similar problems.
Where are the most vulnerable students (those receiving free or reduced-price lunch, those in foster care, etc.) among the schools that have been shut?
Crawl the web and scrape social media to build a real-time dashboard of food and medicine stockouts that could quickly alert suppliers and policymakers.
Predict which groups of people doubt the efficacy of social distancing and similar measures, in order to better target public health messaging. Monitor false information on social media.
ER doctors need a fast, anonymous way to share information and findings. Traditional publishing pathways are too slow, and Twitter is too disorganized and not anonymized.
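For the state-by-state idea in the list above, one possible starting point is a per-state doubling time estimated from cumulative case counts. The sketch below fits a straight line to the logarithm of the counts and converts the slope into a doubling time. It is a descriptive measure under an assumed exponential growth phase, not a forecast of patient flows; the state names, the counts, and the function names are hypothetical.

```python
import math

# Hypothetical cumulative case counts per state, one value per day
# (made-up numbers for illustration only).
STATE_COUNTS = {
    "State A": [100, 200, 400, 800],
    "State B": [50, 60, 72, 86],
}


def growth_rate(counts):
    """Least-squares slope of log(count) against the day index."""
    # Zero counts have no logarithm, so they are skipped
    points = [(day, math.log(c)) for day, c in enumerate(counts) if c > 0]
    if len(points) < 2:
        raise ValueError("need at least two positive counts")
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    return sxy / sxx


def doubling_time(counts):
    """Days for the count to double at the fitted rate (inf if not growing)."""
    rate = growth_rate(counts)
    if rate <= 0:
        return math.inf
    return math.log(2) / rate


if __name__ == "__main__":
    # Shortest doubling time first: the fastest-growing states lead the list
    for state in sorted(STATE_COUNTS, key=lambda s: doubling_time(STATE_COUNTS[s])):
        print(state, round(doubling_time(STATE_COUNTS[state]), 2), "days")
```

Sorting states by doubling time gives a rough ranking of where growth is fastest. Any real use would need to account for testing volume and reporting delays, which this sketch ignores.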