Data Preprocessing
The outputs from the m2g pipeline are available in our open-access AWS S3 bucket: s3://open-neurodata/m2g. You can browse the outputs with the file tree at http://open-neurodata.s3-website-us-east-1.amazonaws.com/.
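Equivalently, the bucket can be listed programmatically. Below is a minimal sketch using an anonymous (unsigned) boto3 client, the same configuration used in the cells below:

import boto3
from botocore import UNSIGNED
from botocore.client import Config

# The bucket is public, so an unsigned (anonymous) client is sufficient.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List the top-level "folders" under the m2g/ prefix (e.g. Diffusion/ and Functional/).
resp = s3.list_objects_v2(Bucket="open-neurodata", Prefix="m2g/", Delimiter="/")
for p in resp.get("CommonPrefixes", []):
    print(p["Prefix"])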
[1]:
import boto3
from botocore import UNSIGNED
from botocore.client import Config
from pathlib import Path
import numpy as np
from graspologic.utils import import_edgelist, pass_to_ranks
[ ]:
modalities = ["Diffusion", "Functional"]
diffusion_datasets = [
    "SWU4",
    "HNU1",
    "NKIENH",
    "XHCUMS",
    "BNU1",
    "BNU3",
    "NKI1",
    "NKI24",
    "IPCAS8",
    "MRN_1",
]
functional_datasets = [
    "NYU_2",
    "SWU4",
    "HNU1",
    "XHCUMS",
    "UPSM_1",
    "BNU3",
    "IPCAS7",
    "SWU1",
    "IPCAS1",
    "BNU1",
]
datasets = {"Diffusion": diffusion_datasets, "Functional": functional_datasets}
Fetch from S3 and Download to Local
The files will be stored in the m2g/docs/paper/data/ directory. A quick check of what landed on disk follows the download cell below.
[5]:
parcellation = "DKT_space-MNI152NLin6_res-2x2x2"
bucket = "open-neurodata"
for modality in modalities:
    if modality == "Diffusion":
        parcellation = "DKT_space-MNI152NLin6_res-2x2x2"
    else:
        parcellation = "DKT_space-MNI152NLin6_res-2x2x2.nii.gz"

    prefix = f"m2g/{modality}/"
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")

    # Resolve each dataset abbreviation to its full S3 prefix
    dataset_fullnames = []
    for dset in datasets[modality]:
        for r in resp.get("CommonPrefixes"):
            if dset in r.get("Prefix"):
                dataset_fullnames.append(r.get("Prefix"))

    for dset, dset_abbrev in zip(dataset_fullnames, datasets[modality]):
        prefix = f"{dset}Connectomes/{parcellation}/"
        resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
        contents = resp["Contents"]

        # Keep only the connectome CSV edgelists (for functional data, only files with "abs" in the name)
        files = []
        for obj in contents:
            key = obj["Key"]
            if modality == "Functional":
                if key.endswith(".csv") and "abs" in key:
                    files.append(key)
            else:
                if key.endswith(".csv"):
                    files.append(key)

        print(f"Downloading {dset}... Total files: {len(files)}")

        # Save to data folder
        p = Path(f"./data/{modality}/{dset_abbrev}")
        p.mkdir(parents=True, exist_ok=True)

        # Download files
        for f in files:
            out = p / Path(f).name
            if not out.exists():
                s3.download_file(bucket, f, out)
Downloading m2g/Diffusion/SWU4-8-27-20-m2g-native-csa-det/... Total files: 422
Downloading m2g/Diffusion/HNU1-8-27-20-m2g-native-csa-det/... Total files: 300
Downloading m2g/Diffusion/NKIENH-11-01-20-m2g-native-csa-det/... Total files: 129
Downloading m2g/Diffusion/XHCUMS-8-27-20-m2g-native-csa-det/... Total files: 117
Downloading m2g/Diffusion/BNU1-8-27-20-m2g-native-csa-det/... Total files: 114
Downloading m2g/Diffusion/BNU3-11-01-20-m2g-native-csa-det/... Total files: 47
Downloading m2g/Diffusion/NKI1-8-24-20-m2g-native-csa-det/... Total files: 40
Downloading m2g/Diffusion/NKI24-11-01-20-m2g-native-csa-det/... Total files: 38
Downloading m2g/Diffusion/IPCAS8-8-27-20-m2g-native-csa-det/... Total files: 26
Downloading m2g/Diffusion/MRN_1-8-27-20-m2g-native-csa-det/... Total files: 19
Downloading m2g/Functional/NYU_2-11-27-20-m2g-func/... Total files: 494
Downloading m2g/Functional/SWU4-11-12-20-m2g-func/... Total files: 425
Downloading m2g/Functional/HNU1-11-12-20-m2g-func/... Total files: 300
Downloading m2g/Functional/XHCUMS-11-27-20-m2g-func/... Total files: 247
Downloading m2g/Functional/UPSM_1-11-27-20-m2g-func/... Total files: 230
Downloading m2g/Functional/BNU3-11-12-20-m2g-func/... Total files: 144
Downloading m2g/Functional/IPCAS7-11-27-20-m2g-func/... Total files: 144
Downloading m2g/Functional/SWU1-11-27-20-m2g-func/... Total files: 119
Downloading m2g/Functional/IPCAS1-11-27-20-m2g-func/... Total files: 118
Downloading m2g/Functional/BNU1-11-12-20-m2g-func/... Total files: 106
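As a quick sanity check, a sketch (assuming the ./data/<modality>/<dataset>/ layout created above) that counts the edgelists downloaded for each dataset:

from pathlib import Path

# Count the connectome CSVs that actually landed on disk for each dataset.
for modality, dsets in datasets.items():
    for dset in dsets:
        n_files = len(list(Path(f"./data/{modality}/{dset}").glob("*.csv")))
        print(f"{modality} {dset}: {n_files} files")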
Compute mean connectomes
This data will be used for plotting in Figure 2.
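The cell below rank-transforms each connectome with graspologic's pass_to_ranks before averaging, so that heavy-tailed edge weights are put on a comparable scale across subjects. A minimal illustration on a toy adjacency matrix (the values here are made up):

import numpy as np
from graspologic.utils import pass_to_ranks

# Toy weighted adjacency matrix; weights differ by orders of magnitude.
A = np.array([
    [0.0, 5.0, 0.0],
    [5.0, 0.0, 100.0],
    [0.0, 100.0, 0.0],
])

# Nonzero weights are replaced by their normalized ranks in (0, 1);
# zeros (absent edges) stay zero.
print(pass_to_ranks(A))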
[10]:
out_dir = Path("./data/mean_connectomes/")
out_dir.mkdir(parents=True, exist_ok=True)

for modality, dsets in datasets.items():
    # Only the functional files with "abs" in the name were downloaded above, so match those
    if modality == "Functional":
        keyword = "*abs*"
    else:
        keyword = "*"

    for dset in dsets:
        p = Path(f"./data/{modality}/{dset}")
        files = list(p.glob(keyword))
        print(
            f"Computing mean graph for {modality} {dset}... Total files: {len(files)}"
        )
        graphs = import_edgelist(files, "csv")

        # Rank-transform each connectome before averaging
        graphs = [pass_to_ranks(g) for g in graphs]

        # Compute mean graph
        mean_graph = np.array(graphs).mean(axis=0)

        # Save mean graph (the sample size is encoded in the file name)
        np.save(out_dir / f"{len(files):>03}_{modality}_{dset}", mean_graph)
Computing mean graph for Diffusion SWU4... Total files: 422
Computing mean graph for Diffusion HNU1... Total files: 300
Computing mean graph for Diffusion NKIENH... Total files: 129
Computing mean graph for Diffusion XHCUMS... Total files: 117
Computing mean graph for Diffusion BNU1... Total files: 114
Computing mean graph for Diffusion BNU3... Total files: 47
Computing mean graph for Diffusion NKI1... Total files: 40
Computing mean graph for Diffusion NKI24... Total files: 38
Computing mean graph for Diffusion IPCAS8... Total files: 26
Computing mean graph for Diffusion MRN_1... Total files: 19
Computing mean graph for Functional NYU_2... Total files: 494
Computing mean graph for Functional SWU4... Total files: 425
Computing mean graph for Functional HNU1... Total files: 300
Computing mean graph for Functional XHCUMS... Total files: 247
Computing mean graph for Functional UPSM_1... Total files: 230
Computing mean graph for Functional BNU3... Total files: 144
Computing mean graph for Functional IPCAS7... Total files: 144
Computing mean graph for Functional SWU1... Total files: 119
Computing mean graph for Functional IPCAS1... Total files: 118
Computing mean graph for Functional BNU1... Total files: 106
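As a quick check, a sketch that reloads the saved mean connectomes for the Figure 2 plots (the file names follow the np.save call above, with .npy appended automatically):

import numpy as np
from pathlib import Path

# Reload each saved mean connectome and report its shape.
for f in sorted(Path("./data/mean_connectomes").glob("*.npy")):
    print(f.name, np.load(f).shape)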