Data Preprocessing
The outputs from the m2g pipeline are available in our open-access AWS S3 bucket: s3://open-neurodata/m2g. You can browse the outputs with the file tree at http://open-neurodata.s3-website-us-east-1.amazonaws.com/.
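Equivalently, the bucket can be listed programmatically. Below is a minimal sketch using an anonymous (unsigned) boto3 client, the same configuration used in the cells below:

import boto3
from botocore import UNSIGNED
from botocore.client import Config

# The bucket is public, so an unsigned (anonymous) client is sufficient.
s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))

# List the top-level "folders" under the m2g/ prefix (e.g. Diffusion/ and Functional/).
resp = s3.list_objects_v2(Bucket="open-neurodata", Prefix="m2g/", Delimiter="/")
for p in resp.get("CommonPrefixes", []):
    print(p["Prefix"])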
[1]:
import boto3
from botocore import UNSIGNED
from botocore.client import Config
from pathlib import Path
import numpy as np
from graspologic.utils import import_edgelist, pass_to_ranks
[ ]:
modalities = ["Diffusion", "Functional"]
diffusion_datasets = [
    "SWU4",
    "HNU1",
    "NKIENH",
    "XHCUMS",
    "BNU1",
    "BNU3",
    "NKI1",
    "NKI24",
    "IPCAS8",
    "MRN_1",
]
functional_datasets = [
    "NYU_2",
    "SWU4",
    "HNU1",
    "XHCUMS",
    "UPSM_1",
    "BNU3",
    "IPCAS7",
    "SWU1",
    "IPCAS1",
    "BNU1",
]
datasets = {"Diffusion": diffusion_datasets, "Functional": functional_datasets}
Fetch from S3 and Download to Local
The files will be stored in the m2g/docs/paper/data/ directory. A quick check of what landed on disk follows the download cell below.
[5]:
parcellation = "DKT_space-MNI152NLin6_res-2x2x2"
bucket = "open-neurodata"
for modality in modalities:
    if modality == "Diffusion":
        parcellation = "DKT_space-MNI152NLin6_res-2x2x2"
    else:
        parcellation = "DKT_space-MNI152NLin6_res-2x2x2.nii.gz"

    prefix = f"m2g/{modality}/"
    s3 = boto3.client("s3", config=Config(signature_version=UNSIGNED))
    resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")

    # Resolve each dataset abbreviation to its full S3 prefix
    dataset_fullnames = []
    for dset in datasets[modality]:
        for r in resp.get("CommonPrefixes"):
            if dset in r.get("Prefix"):
                dataset_fullnames.append(r.get("Prefix"))

    for dset, dset_abbrev in zip(dataset_fullnames, datasets[modality]):
        prefix = f"{dset}Connectomes/{parcellation}/"
        resp = s3.list_objects_v2(Bucket=bucket, Prefix=prefix, Delimiter="/")
        contents = resp["Contents"]

        # Keep only the connectome CSV edgelists (for functional data, only files with "abs" in the name)
        files = []
        for obj in contents:
            key = obj["Key"]
            if modality == "Functional":
                if key.endswith(".csv") and "abs" in key:
                    files.append(key)
            else:
                if key.endswith(".csv"):
                    files.append(key)

        print(f"Downloading {dset}... Total files: {len(files)}")

        # Save to data folder
        p = Path(f"./data/{modality}/{dset_abbrev}")
        p.mkdir(parents=True, exist_ok=True)

        # Download files
        for f in files:
            out = p / Path(f).name
            if not out.exists():
                s3.download_file(bucket, f, out)
Downloading m2g/Diffusion/SWU4-8-27-20-m2g-native-csa-det/... Total files: 422
Downloading m2g/Diffusion/HNU1-8-27-20-m2g-native-csa-det/... Total files: 300
Downloading m2g/Diffusion/NKIENH-11-01-20-m2g-native-csa-det/... Total files: 129
Downloading m2g/Diffusion/XHCUMS-8-27-20-m2g-native-csa-det/... Total files: 117
Downloading m2g/Diffusion/BNU1-8-27-20-m2g-native-csa-det/... Total files: 114
Downloading m2g/Diffusion/BNU3-11-01-20-m2g-native-csa-det/... Total files: 47
Downloading m2g/Diffusion/NKI1-8-24-20-m2g-native-csa-det/... Total files: 40
Downloading m2g/Diffusion/NKI24-11-01-20-m2g-native-csa-det/... Total files: 38
Downloading m2g/Diffusion/IPCAS8-8-27-20-m2g-native-csa-det/... Total files: 26
Downloading m2g/Diffusion/MRN_1-8-27-20-m2g-native-csa-det/... Total files: 19
Downloading m2g/Functional/NYU_2-11-27-20-m2g-func/... Total files: 494
Downloading m2g/Functional/SWU4-11-12-20-m2g-func/... Total files: 425
Downloading m2g/Functional/HNU1-11-12-20-m2g-func/... Total files: 300
Downloading m2g/Functional/XHCUMS-11-27-20-m2g-func/... Total files: 247
Downloading m2g/Functional/UPSM_1-11-27-20-m2g-func/... Total files: 230
Downloading m2g/Functional/BNU3-11-12-20-m2g-func/... Total files: 144
Downloading m2g/Functional/IPCAS7-11-27-20-m2g-func/... Total files: 144
Downloading m2g/Functional/SWU1-11-27-20-m2g-func/... Total files: 119
Downloading m2g/Functional/IPCAS1-11-27-20-m2g-func/... Total files: 118
Downloading m2g/Functional/BNU1-11-12-20-m2g-func/... Total files: 106
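As a quick sanity check, a sketch (assuming the ./data/<modality>/<dataset>/ layout created above) that counts the edgelists downloaded for each dataset:

from pathlib import Path

# Count the connectome CSVs that actually landed on disk for each dataset.
for modality, dsets in datasets.items():
    for dset in dsets:
        n_files = len(list(Path(f"./data/{modality}/{dset}").glob("*.csv")))
        print(f"{modality} {dset}: {n_files} files")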
Compute mean connectomes
This data will be used for plotting in Figure 2.
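The cell below rank-transforms each connectome with graspologic's pass_to_ranks before averaging, so that heavy-tailed edge weights are put on a comparable scale across subjects. A minimal illustration on a toy adjacency matrix (the values here are made up):

import numpy as np
from graspologic.utils import pass_to_ranks

# Toy weighted adjacency matrix; weights differ by orders of magnitude.
A = np.array([
    [0.0, 5.0, 0.0],
    [5.0, 0.0, 100.0],
    [0.0, 100.0, 0.0],
])

# Nonzero weights are replaced by their normalized ranks in (0, 1);
# zeros (absent edges) stay zero.
print(pass_to_ranks(A))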
[10]:
out_dir = Path("./data/mean_connectomes/")
out_dir.mkdir(parents=True, exist_ok=True)

for modality, dsets in datasets.items():
    # Only the functional files with "abs" in the name were downloaded above, so match those
    if modality == "Functional":
        keyword = "*abs*"
    else:
        keyword = "*"

    for dset in dsets:
        p = Path(f"./data/{modality}/{dset}")
        files = list(p.glob(keyword))
        print(
            f"Computing mean graph for {modality} {dset}... Total files: {len(files)}"
        )
        graphs = import_edgelist(files, "csv")

        # Rank-transform each connectome before averaging
        graphs = [pass_to_ranks(g) for g in graphs]

        # Compute mean graph
        mean_graph = np.array(graphs).mean(axis=0)

        # Save mean graph (the sample size is encoded in the file name)
        np.save(out_dir / f"{len(files):>03}_{modality}_{dset}", mean_graph)
Computing mean graph for Diffusion SWU4... Total files: 422
Computing mean graph for Diffusion HNU1... Total files: 300
Computing mean graph for Diffusion NKIENH... Total files: 129
Computing mean graph for Diffusion XHCUMS... Total files: 117
Computing mean graph for Diffusion BNU1... Total files: 114
Computing mean graph for Diffusion BNU3... Total files: 47
Computing mean graph for Diffusion NKI1... Total files: 40
Computing mean graph for Diffusion NKI24... Total files: 38
Computing mean graph for Diffusion IPCAS8... Total files: 26
Computing mean graph for Diffusion MRN_1... Total files: 19
Computing mean graph for Functional NYU_2... Total files: 494
Computing mean graph for Functional SWU4... Total files: 425
Computing mean graph for Functional HNU1... Total files: 300
Computing mean graph for Functional XHCUMS... Total files: 247
Computing mean graph for Functional UPSM_1... Total files: 230
Computing mean graph for Functional BNU3... Total files: 144
Computing mean graph for Functional IPCAS7... Total files: 144
Computing mean graph for Functional SWU1... Total files: 119
Computing mean graph for Functional IPCAS1... Total files: 118
Computing mean graph for Functional BNU1... Total files: 106
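As a quick check, a sketch that reloads the saved mean connectomes for the Figure 2 plots (the file names follow the np.save call above, with .npy appended automatically):

import numpy as np
from pathlib import Path

# Reload each saved mean connectome and report its shape.
for f in sorted(Path("./data/mean_connectomes").glob("*.npy")):
    print(f.name, np.load(f).shape)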