In order to share data between our container and the rest of our machine in Docker, we need to mount a volume.
Docker does this with the -v flag, which takes its argument in the form: -v path/to/local/data:/path/in/container
We’ll do this when we launch our container, as well as give it a helpful name so we can locate it later on.
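As a minimal sketch of the launch described above, the script below assembles a docker run command that mounts a host directory and names the container. The host directory ($PWD/data), the container name (m2g_tutorial), the /input mount point, and the helper function m2g_launch_cmd are illustrative assumptions, not values required by m2g. The command is printed rather than executed so it can be reviewed first.

```shell
#!/bin/sh
set -eu

# Assumed values for illustration; override them through the environment.
HOST_DATA="${HOST_DATA:-$PWD/data}"               # directory on the host machine
CONTAINER_NAME="${CONTAINER_NAME:-m2g_tutorial}"  # name used to locate the container later

# Hypothetical helper: prints a docker run command.
#   $1: host data directory   $2: container name
m2g_launch_cmd() {
    # -v <host path>:<container path> mounts the host directory in the container;
    # --name gives the container a name to refer to later.
    printf 'docker run -ti --name %s -v %s:/input --entrypoint /bin/bash neurodata/m2g:latest\n' \
        "$2" "$1"
}

m2g_launch_cmd "$HOST_DATA" "$CONTAINER_NAME"
```

Copy the printed command into a terminal to start the container. Anything written to /input inside the container then appears in the host directory, and the name can be passed to commands such as docker start or docker rm.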
The neurodata/m2g Docker container enables users to run end-to-end connectome estimation on structural MRI or functional MRI right from container launch. The pipeline requires that data be organized in accordance with the BIDS spec. If the data you wish to process is available on S3, you only need to provide your S3 credentials at build time and the pipeline will retrieve your data automatically for processing.
If you have never used Docker before, it is useful to run through the Docker documentation.
Getting the Docker container:
$ docker pull neurodata/m2g
Structural Connectome Pipeline (m2g-d)
The structural connectome pipeline can be run with:
$ m2g --pipeline dwi <input_directory> <output_directory>
We recommend specifying an atlas and lowering the default seed density on test runs (for real runs we recommend the default seeding; lowering it only decreases runtime):
$ m2g --pipeline dwi --seeds 1 --parcellation Desikan <input_directory> <output_directory>
You can select a particular participant and session as well (recommended for batch scripts):
$ m2g --pipeline dwi --seeds 1 --parcellation Desikan --participant_label <label> --session_label <label> <input_directory> <output_directory>
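Since explicit labels are recommended for batch scripts, one way to use them is a loop that issues one m2g call per participant and session. In the sketch below, the participant labels, session labels, the /input and /output paths, and the function name batch_commands are placeholders. Each command is printed with echo as a dry run.

```shell
#!/bin/sh
set -eu

# Placeholder labels and paths; replace them with those of your BIDS dataset.
PARTICIPANTS="0025864 0025865"
SESSIONS="1 2"
INPUT_DIR=/input
OUTPUT_DIR=/output

# Hypothetical helper: emits one m2g call per participant/session pair.
batch_commands() {
    for sub in $PARTICIPANTS; do
        for ses in $SESSIONS; do
            # Dry run: remove "echo" to execute each call instead of printing it.
            echo m2g --pipeline dwi --seeds 1 --parcellation Desikan \
                --participant_label "$sub" --session_label "$ses" \
                "$INPUT_DIR" "$OUTPUT_DIR"
        done
    done
}

batch_commands
```

Running one participant and session per call keeps a failure in one scan from interrupting the rest of the batch, at the cost of starting the pipeline once per pair.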
The outputs of the pipeline are organized as described here.
Functional Connectome Pipeline (m2g-f)
The functional connectome pipeline can be run with:
$ m2g --pipeline func <input_directory> <output_directory>
$ m2g --pipeline func --parcellation Desikan <input_directory> <output_directory>
You can select a particular participant and session as well (recommended for batch scripts):
$ m2g --pipeline func --parcellation Desikan --participant_label <label> --session_label <label> <input_directory> <output_directory>
The outputs of the pipeline are organized as described here.
Running both m2g-d and m2g-f
Both pipelines can be run by setting the pipeline parameter to both:
$ m2g --pipeline both <input_directory> <output_directory>
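The m2g commands above can also be issued from the host in a single docker run call, with the data directories mounted as volumes. This sketch assumes the image's default entrypoint is m2g, as the docker run -ti neurodata/m2g -h call in the Command-Line Arguments section suggests. The host directories, the /input and /output mount points, and the helper m2g_docker_cmd are hypothetical. The command is printed, not executed.

```shell
#!/bin/sh
set -eu

# Hypothetical host directories; $PWD makes both paths absolute.
HOST_INPUT="${HOST_INPUT:-$PWD/bids_dataset}"
HOST_OUTPUT="${HOST_OUTPUT:-$PWD/m2g_output}"

# Hypothetical helper: prints a one-shot docker run command.
#   $1: host input dir   $2: host output dir   $3: pipeline (dwi, func, or both)
m2g_docker_cmd() {
    printf 'docker run -ti -v %s:/input -v %s:/output neurodata/m2g --pipeline %s /input /output\n' \
        "$1" "$2" "$3"
}

m2g_docker_cmd "$HOST_INPUT" "$HOST_OUTPUT" both
```

Mounting the output directory as well means the results remain on the host after the container exits.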
To install and run a tutorial with the latest Docker image of m2g, pull the Docker image from Docker Hub using the following command, then enter it using docker run:
$ docker pull neurodata/m2g:latest
$ docker run -ti --entrypoint /bin/bash neurodata/m2g:latest
Once inside the Docker container, download a tutorial dataset of fMRI and diffusion MRI data from the open-neurodata AWS S3 bucket to the /input directory in your container (make sure you are connected to the internet):
$ aws s3 sync --no-sign-request s3://open-neurodata/m2g/TUTORIAL /input
Now you can run the m2g pipeline for both the functional and diffusion MRI data using the command below. The number of seeds is intentionally set lower than recommended, and the voxel size larger than recommended, for a faster run time (approximately 25 minutes). For more information on what these input arguments represent, see the Tutorial section below:
$ m2g --participant_label 0025864 --session_label 1 --parcellation AAL_ --pipeline both --seeds 1 --voxelsize 4mm /input /output
Once the pipeline is done running, the resulting outputs can be found in /output/sub-0025864/ses-1/. See the Outputs section below for a description of each file.
Working with S3 Datasets
m2g can work on datasets stored on Amazon’s Simple Storage Service (S3), provided they are in BIDS format. Doing so requires you to set your AWS credentials and read the related S3 bucket documentation. You can find a guide here.
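As a rough sketch of an S3 run, the script below checks for credentials and then prints an m2g call that reads from and pushes to S3. It assumes m2g picks up the standard AWS environment variables AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY, which is how the AWS CLI behaves but is not confirmed here for m2g. The bucket names, paths, and the helper have_aws_credentials are hypothetical. The s3:// input form and the --push_location option are taken from the help output in the Command-Line Arguments section.

```shell
#!/bin/sh
set -eu

# Hypothetical bucket and paths, for illustration only.
S3_INPUT="s3://my-bucket/my-bids-dataset"
S3_OUTPUT="s3://my-bucket/m2g-derivatives"

# Hypothetical helper: succeeds when both standard AWS credential
# variables are set and non-empty.
have_aws_credentials() {
    [ -n "${AWS_ACCESS_KEY_ID:-}" ] && [ -n "${AWS_SECRET_ACCESS_KEY:-}" ]
}

if have_aws_credentials; then
    # Dry run: remove "echo" to execute the call instead of printing it.
    echo m2g --pipeline dwi --push_location "$S3_OUTPUT" "$S3_INPUT" /output
else
    echo "AWS credentials are not set; export AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY first." >&2
fi
```

Checking the credentials before launching avoids starting a long run that would fail at the download step.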
Example Datasets
Derivatives have been produced on a variety of datasets, all of which are made available on our website. Each of these datasets is available for access and download from its respective source. Alternatively, example datasets on the BIDS website that contain diffusion data, such as ds114, can be used and have been tested.
Command-Line Arguments
Below is the help output generated by running m2g with the -h flag. All parameters are explained in this output.
$ docker run -ti neurodata/m2g -h
usage: m2g [-h]
[--session_label SESSION_LABEL [SESSION_LABEL ...]]
[--pipeline PIPELINE] [--acquisition ACQUISITION] [--tr TR]
[--push_location PUSH_LOCATION]
[--parcellation PARCELLATION [PARCELLATION ...]] [--skipeddy]
[--skipreg] [--voxelsize VOXELSIZE] [--mod MOD]
[--track_type TRACK_TYPE] [--diffusion_model DIFFUSION_MODEL]
[--space SPACE] [--seeds SEEDS] [--skull SKULL] [--mem_gb MEM_GB]
[--n_cpus N_CPUS]
input_dir output_dir
This is an end-to-end connectome estimation pipeline from fMRI and diffusion
weighted MRI data.
positional arguments:
input_dir The directory with the input dataset formatted
according to the BIDS standard. To use data from s3,
put the bucket and directory location as the input
directory: `s3://<bucket>/<dataset>` downloaded file
will be stored in ~/.m2g/input. If directory already
exists it will be deleted.
output_dir The local directory where the output files should be
optional arguments:
-h, --help show this help message and exit
--participant_label PARTICIPANT_LABEL [PARTICIPANT_LABEL ...]
The label(s) of the participant(s) that should be
analyzed. The label corresponds to
sub-<participant_label> from the BIDS spec (so it does
not include "sub-"). If this parameter is not provided
all subjects should be analyzed. Multiple participants
can be specified with a space separated list.
--session_label SESSION_LABEL [SESSION_LABEL ...]
The label(s) of the session that should be analyzed.
The label corresponds to ses-<participant_label> from
the BIDS spec (so it does not include "ses-"). If this
parameter is not provided all sessions should be
analyzed. Multiple sessions can be specified with a
space separated list.
--pipeline PIPELINE Pipline to use when analyzing the input data, either
func, dwi, or both. Default is dwi.
--acquisition ACQUISITION
Acquisition method for functional data: altplus -
Alternating in the +z direction alt+z - Alternating in
the +z direction alt+z2 - Alternating, but beginning
at slice #1 altminus - Alternating in the -z direction
alt-z - Alternating in the -z direction alt-z2 -
Alternating, starting at slice #nz-2 instead of #nz-1
seqplus - Sequential in the plus direction seqminus -
Sequential in the minus direction, default is alt+z.
For more information:https://fcp-
--tr TR functional scan TR (seconds), default is 2.0
--push_location PUSH_LOCATION
Name of folder on s3 to push output data to, if the
folder does not exist, it will be created.Format the
location as `s3://<bucket>/<path>`
--parcellation PARCELLATION [PARCELLATION ...]
The parcellation(s) being analyzed. Multiple
parcellations can be provided with a space separated
list. If not parcellations are defined, will use all
parcellations from neuroparc.
--skipeddy Whether to skip eddy correction if it has already been
run and the files can be found in output_dir.
--skipreg Shether to skip registration of the parcellations if
it has already been run and the files can be fround in
--voxelsize VOXELSIZE
Voxel size : 2mm, 1mm. Voxel size of both parcellation
and reference structural image to use for template
--mod MOD Deterministic (det) or probabilistic (prob) tracking
method for the dwi tractography. Default is det.
--track_type TRACK_TYPE
Tracking approach: local, particle. Default is local.
--diffusion_model DIFFUSION_MODEL
Diffusion model: csd or csa. Default is csa.
--space SPACE Space for tractography: native, native_dsn. Default is
--seeds SEEDS Seeding density for tractography in the m2g-d
pipeline. Default is 20.
--skull SKULL Special actions to take when skullstripping t1w image
based on default skullstrip ('none') failure: Excess
tissue below brain: below Chunks of cerebelum missing:
cerebelum Frontal clipping near eyes: eye Excess
clipping in general: general,
--mem_gb MEM_GB Memory, in GB, to allocate to functional pipeline
--n_cpus N_CPUS Number of cpus to allocate to either the functional
pipeline or the diffusion connectome generation