NeuroData Paper 2016


Abstract

Recent technical progress allows neuroexperimentalists to collect ever more detailed and informative anatomical and physiological data from brains of all sizes. These datasets span experimentally accessible spatiotemporal scales, ranging from nanometers to meters, and from millisecond to monthly sampling rates. In classical neuroscientific experimental paradigms, it was feasible for neuroscientists to draw their results on paper. In contrast, many modern neuroscientific experimental paradigms break the classic data analysis workflow. In particular, these large datasets create significant challenges for our community at every step of the data analysis pipeline: (1) storing, (2) exploring, (3) parsing, and (4) analyzing. NeuroData has been developed to lower the barrier to entry into big data neuroscience. We have designed and built a comprehensive ecosystem to enable petascale neuroscience. This includes our flagship project, the NeuroData Connectome Project (previously called the Open Connectome Project). Our infrastructure enables anyone in the world with internet access to visualize, download, analyze, upload, and interact with a large number of public datasets. Moreover, all results obtained using the NeuroData infrastructure are fundamentally reproducible and extensible. We demonstrate the utility of these tools via two serial electron microscopy (EM) case studies. First, we have reproduced many of the quantitative results from a recent landmark EM paper that included a saturated annotation of a region of cortex. Second, we have tested a novel hypothesis about the distribution of synapse locations in cortex on a larger, complementary dataset. Because both analyses were run on the NeuroData infrastructure, their results are fully reproducible and can be extended to other datasets that bear on the same questions.
NeuroData democratizes the scientific process, enabling anyone, regardless of background, computational resources, or expertise, to study neuroscience at scale. We are in the process of scaling up the number of datasets, the range of experimental modalities, and the Web services we provide. Work underway will provide pre-packaged cluster environments, easily deployable on local or commercial cloud computing infrastructures, so that others can replicate and modify our services internally. All of our code and data are available online at ndpaper.neurodata.io, in accordance with open science standards.