Currently, scientific research groups produce data in a variety of formats and conventions, making it difficult to share information across labs and reproduce results.
We propose a foundational, flexible data standard for large-scale neuroscience data called RAMON (Reusable Annotation Markup for Open Neuroscience). This serves as a key enabling technology for communication and data democratization.
For ingesting data, we provide a simple CSV format, organized as key-value pairs by RAMONType. For each RAMON object, a number of standard fields are available to aid in standardization and efficient queries. However, from an ingest perspective,
We ingested the data beginning with full XY PNG stacks exported from VAST. NeuroData offers several methods to ingest data, including an auto-ingest process. This process is described in detail elsewhere.
The VAST metadata was combined with information in the Cell supplemental information spreadsheet to create metadata for each object. When feasible, object statistics were recomputed from the raw data. We chose to do this process in MATLAB because the VAST scripts and data are most amenable to parsing using existing tools.
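To make the idea of a key-value CSV organized by RAMONType more concrete, here is a minimal Python sketch. The column layout (`ramon_type`, `id`, `key`, `value`), the type names, and the sample rows are all assumptions invented for illustration; they are not the actual NeuroData ingest schema, which should be taken from the ingest documentation.

```python
import csv
import io
from collections import defaultdict

# Hypothetical layout (an assumption, not the real ingest schema):
# one key-value pair per row, tagged with the object's type and id.
SAMPLE_CSV = """\
ramon_type,id,key,value
neuron,1,name,example_neuron
segment,10,neuron,1
segment,10,class,dendrite
synapse,100,segments,10
"""


def parse_ramon_csv(text):
    """Group key-value rows into {ramon_type: {object_id: {key: value}}}."""
    objects = defaultdict(lambda: defaultdict(dict))
    for row in csv.DictReader(io.StringIO(text)):
        objects[row["ramon_type"]][int(row["id"])][row["key"]] = row["value"]
    # Convert the nested defaultdicts to plain dicts so that looking up a
    # missing key raises KeyError instead of silently creating an entry.
    return {t: {i: dict(kv) for i, kv in objs.items()}
            for t, objs in objects.items()}


parsed = parse_ramon_csv(SAMPLE_CSV)
print(parsed["segment"][10])
```

Grouping first by type and then by object id mirrors the "organized by RAMONType" description above; values are kept as strings because a key-value CSV carries no type information of its own.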
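As an illustration of what recomputing object statistics from the raw data can look like, the sketch below derives a voxel count, centroid, and bounding box for each labeled object in a small annotation volume. It is written in Python with NumPy rather than the MATLAB tooling described above, and the function name, the choice of statistics, and the convention that label 0 means background are assumptions made for this example, not the authors' actual procedure.

```python
import numpy as np


def object_statistics(labels):
    """Return per-object statistics for an integer label volume.

    `labels` is indexed (z, y, x); label 0 is treated as background
    (an assumption made for this sketch).
    """
    stats = {}
    for obj_id in np.unique(labels):
        if obj_id == 0:
            continue
        # Coordinates of every voxel carrying this label, shape (n, 3).
        coords = np.argwhere(labels == obj_id)
        stats[int(obj_id)] = {
            "voxel_count": int(coords.shape[0]),
            "centroid": tuple(float(c) for c in coords.mean(axis=0)),
            "bbox_min": tuple(int(c) for c in coords.min(axis=0)),
            "bbox_max": tuple(int(c) for c in coords.max(axis=0)),
        }
    return stats


# Tiny synthetic volume, invented purely for illustration.
volume = np.zeros((2, 4, 4), dtype=np.uint32)
volume[0, 0:2, 0:2] = 7   # a 2x2 patch in the first slice
volume[1, 3, 1:4] = 9     # a 3-voxel line in the second slice

stats = object_statistics(volume)
print(stats[7]["voxel_count"], stats[9]["bbox_max"])
```

Scanning the volume once per label is simple but slow for large datasets; for real EM volumes a single-pass approach, or processing the data in blocks, would likely be needed.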
From the spreadsheet, several important pieces of metadata were gleaned:
Further development will compute these from the data.
All of this information is stored in NeuroData databases. We chose to store the data in four channels within a single unifying project (neurons, synapses, mitochondria, vesicles). The original raw data are also available for provenance and to allow others to parse the data in other ways.
Once all of the data is parsed, we are able to do reproducible, scalable scientific discovery! To retrieve the parsed data from the databases, we offer solutions in MATLAB, Python, and via a RESTful endpoint. To illustrate the diversity of queries and to provide reproducible results from the Kasthuri paper, we have created IPython notebooks for each claim. We use ndio to facilitate retrieving the data.
A full list of claims will be provided with a more detailed explanation. For now, the claims may be viewed here, and are under active development.
NeuroData now offers the capability to create RAMON objects during the process of manual or automatic discovery and annotation. Please contact us for details on how to organize your data so that your claims are reproducible and available to the community as part of your publication process.
The data views that are in the paper may be viewed at these links: http://viz.neurodata.io/kasthuri2015_ramon_v4/ http://viz.neurodata.io/dataview/mouseS1