Model

For details on the model, please see SIEM model. For an expanded look at the Homotopic vs. Heterotopic Connectivity Problem, see Bilateral Connectivity Notebook.

Real Data Experiments

In this notebook, we determine whether there exists a batch effect in the difference in connectivity \(\bar{r}_{homo}\) homotopically (same region, opposite hemisphere) vs \(\bar{r}_{hetero}\) heterotopically (different region) connectivity within a particular modality. We consider \(\delta_{x} = \bar{r}_{homo} - \bar{r}_{hetero}\) to be the difference in connectivity for a graph from a graph or collection of graphs from a particular modality \(x\). Here, \(\bar{r}\) is the average rank within a particular region.

Tests

for each Graph

Our test for this notebook is as follows:

\[\begin{align*} H_0: \bar{r}_{homo} \leq \bar{r}_{hetero} \\ H_A: \bar{r}_{homo} > \bar{r}_{hetero} \end{align*}\]

We will use a \(1\)-sample test to determine whether within a single graph we can determine a significant difference in connectivity.

We will perform this experiment for both dMRI and fMRI-derived connectomes.

for Determining a Batch

To determine a batch’s test statistic, we will simply attempt to assess the magnitude of the differences in model fit between each pair of batches. To acquire a p-value, we will use a permutation-based approach. That is:

test_statistic(models, grouping):
  compute the average p_{homo} and p_{hetero} for each unique grouping label.
  For each i, j pair of unique grouping labels:
    stat[i, j] = |p_{homo, 1} - p_{homo, 2} - (p_{hetero, 1} - p_{hetero, 2})|
  return stats  # test statistic is the pairing with the greatest magnitude
  
permutation_test(models, batch_grouping):
  compute the test statistic for the graphs given the default batch grouping.
  For i in 1:nrepetitions:
    permute the batch grouping to obtain a permuted grouping.
    compute the test statistic for the models given the permuted grouping.
  Compute the fraction of permuted test statistics > the given test statistic.

Or the fraction of times that the maximum magnitude of a model fit difference exceeds the observed maximumm magnitude of a model fit difference.

get_results <- function(tstat.out) {
  P <- tstat.out$P
  return(P[upper.tri(P, diag=FALSE)])
}

case.experiment.within <- function(models, datasets.labs, split.labs, modal, nrep=1000) {
  # partition the models that do not have the split data
  include_sets <- which(!is.nan(split.labs))
  incl.models <- models[include_sets]
  incl.datasets.labs <- datasets.labs[include_sets]
  incl.split.labs <- split.labs[include_sets]
  
  dsets <- unique(incl.datasets.labs)
  results <- data.frame(dataset=c(), pval=c(), modal=c(), size=c())
  
  for (i in 1:length(dsets)) {
    ss <- which(incl.datasets.labs == dsets[i])
    model_ss <- incl.models[ss]  # subset one dataset of models
    split_ss <- incl.split.labs[ss]
    if (length(unique(split_ss)) > 1) {
      perm.result <- gs.siem.batch.perm(model_ss, split_ss, i=1, j=2, nperm=nrep)
      results <- rbind(results, data.frame(dataset=dsets[i], pval=get_results(perm.result), modal=modal, size=length(ss)))
    }
  }
  return(results)
}


case.experiment.between <- function(models, datasets.labs, modal, nrep=1000) {
  # partition the models that do not have the split data
  include_sets <- which(!is.nan(datasets.labs))
  models <- models[include_sets]
  datasets.labs <- datasets.labs[include_sets]
  
  dsets <- unique(datasets.labs)
  results <- data.frame(dataset=c(), pval=c(), modal=c(), size=c())
  
  perm.result <- gs.siem.batch.perm(models, datasets.labs, i=1, j=2, nperm=nrep)
  results <- rbind(results, data.frame(dataset='set', pval=get_results(perm.result), modal=modal, size=length(ss)))
  return(results)
}

case.experiment.pairwise <- function(models, datasets.labs, modal, nrep=1000) {
  # partition the models that do not have the split data
  include_sets <- which(!is.nan(datasets.labs))
  models <- models[include_sets]
  datasets.labs <- datasets.labs[include_sets]
  
  dsets <- unique(datasets.labs)
  results <- data.frame(dset1=c(), dset2=c(), pval=c(), size=c(), modality=c())
  D <- array(0, dim=c(length(dsets), length(dsets)))
  for (i in 1:(length(dsets) - 1)) {
    ss1 <- datasets.labs %in% dsets[i]
    for (j in ((i+1):length(dsets))) {
      ss <- datasets.labs %in% c(dsets[i], dsets[j])
      perm.result <- gs.siem.batch.perm(models[ss], datasets.labs[ss], i=1, j=2, nperm=nrep)
      results <- rbind(results, data.frame(dset1=dsets[i], dset2=dsets[j], pval=get_results(perm.result),
                                           modal=modal, size=sum(ss1)))
      results <- rbind(results, data.frame(dset1=dsets[j], dset2=dsets[i], pval=get_results(perm.result),
                                           modal=modal, size=sum(ss1)))
      D[i, j] <- get_results(perm.result)
    }
  }
  D <- D + t(D) - diag(diag(D))
  diag(D) <- 1  # no batch between same dataset
  return(list(data=results, D=D, dsets=dsets))
}

g_legend <- function(a.gplot){
  tmp <- ggplot_gtable(ggplot_build(a.gplot))
  leg <- which(sapply(tmp$grobs, function(x) x$name) == "guide-box")
  legend <- tmp$grobs[[leg]]
  return(legend)
}

is.defined = function(x)!is.null(x)

extract_params <- function(models, Z, i=1, j=2, modal=NULL) {
  # aggregate p and qs
  params <- lapply(unique(Z), function(z) {
    ss <- which(Z == z)
    mods <- dwi.models[ss]
    p <- sapply(mods, function(model) model$pr[i])
    q <- sapply(mods, function(model) model$pr[j])
    pm <- mean(p); qm <- mean(q); dm <- pm - qm
    return(data.frame(p=pm, q=qm, d=dm, dataset=z, size=length(ss), modal=modal))
  })
  df <- do.call(rbind, params)
  return(df)
}

Raw Data

The data below can be downloaded and moved to appropriate folders as follows (note that the below section requires sudo access):

sudo mkdir /data/
sudo chmod -R 777 /data

cd /data
wget http://openconnecto.me/mrdata/share/derivatives/dwi_edgelists.tar.gz
wget http://openconnecto.me/mrdata/share/derivatives/fmri_edgelists.tar.gz
wget http://openconnecto.me/mrdata/share/connectome_stats/connectome_stats.zip

mkdir -p /data/connectome_stats /data/all_mr /data/all_mr/dwi/edgelists /data/all_mr/fmri/ranked/edgelists
mv dwi_edgelists.tar.gz /data/dwi/edgelists
cd /data/dwi/edgelists
tar -xvzf dwi_edgelists.tar.gz
mv /data/fmri_edgelists.tar.gz /data/fmri/ranked/edgelists
cd /data/fmri/ranked/edgelists
tar -xvzf fmri_edgelists.tar.gz
mv /data/connectome_stats.zip /data/connectome_stats.zip
cd /data/connectome_stats
unzip connectome_stats.zip

Diffusion

nroi <- 70
dwi.dsets = c('BNU1', 'BNU3', 'HNU1', 'KKI2009', 'NKI1', 'NKIENH', 'MRN1313', 'Templeton114', 'Templeton255', 'SWU4')
dwi.atlas = 'desikan'
dwi.basepath = '/data/all_mr/dwi/edgelists'

graphobj = fmriu.io.collection.open_graphs(basepath = dwi.basepath, atlases = dwi.atlas, datasets = dwi.dsets,
                                           gname = 'graphs', fmt='edgelist', rtype = 'array')

## [1] "opening graphs for BNU1 dataset and desikan parcellation atlas..."
## [1] "opening graphs for BNU3 dataset and desikan parcellation atlas..."
## [1] "opening graphs for HNU1 dataset and desikan parcellation atlas..."
## [1] "opening graphs for KKI2009 dataset and desikan parcellation atlas..."
## [1] "opening graphs for NKI1 dataset and desikan parcellation atlas..."
## [1] "opening graphs for NKIENH dataset and desikan parcellation atlas..."
## [1] "opening graphs for MRN1313 dataset and desikan parcellation atlas..."
## [1] "opening graphs for Templeton114 dataset and desikan parcellation atlas..."
## [1] "opening graphs for Templeton255 dataset and desikan parcellation atlas..."
## [1] "opening graphs for SWU4 dataset and desikan parcellation atlas..."

dwi.graphs = graphobj$graphs
dwi.datasets = graphobj$dataset
dwi.subjects = graphobj$subjects
dwi.sessions = graphobj$sessions

sexpath = '/data/all_mr/phenotypic/'
class = parse_class(sexpath, dwi.dsets, dwi.subjects)

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in file(file, "rt"): cannot open file '/data/all_mr/phenotypic//
## NKIENH_phenotypic_data.csv': No such file or directory

## Warning in file(file, "rt"): cannot open file '/data/all_mr/phenotypic//
## MRN1313_phenotypic_data.csv': No such file or directory

## Warning in is.na(tab$SEX): is.na() applied to non-(list or vector) of type
## 'NULL'

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

dwi.sexs = class$sex

ne = 1225
nroi <- 70
group1 <- c()  # edges in same hemisphere
group2 <- c()  # edges across hemispheres
for (i in 1:nroi) {
  for (j in 1:nroi) {
    idx <- (i - 1)*nroi + j
    if ((j - i) == 35) {
      group1 <- c(group1, idx)
    } else if (j > i) {
      group2 <- c(group2, idx)
    }
  }
}
Es <- list(group1, group2)
for (i in 1:dim(dwi.graphs)[1]) {
  graph <- dwi.graphs[i,,]
  diag(graph) <- NaN
  dwi.graphs[i,,] <- graph
}
dwi.rank.graphs <- gs.xfm.aaply(dwi.graphs, gs.xfm.rank_graph)
dwi.models <- suppressWarnings(gs.xfm.alply(dwi.rank.graphs, gs.siem.fit, Es))
incl <- which(sapply(dwi.models, is.defined))
dwi.datasets <- dwi.datasets[incl]
dwi.sessions <- dwi.sessions[incl]
dwi.sexs <- dwi.sexs[incl]
dwi.subjects <- dwi.subjects[incl]
dwi.models <- dwi.models[incl]

Functional

nroi <- 70
fmri.dsets = c('BNU1', 'BNU2', 'BNU3', 'HNU1', 'IBATRT', 'IPCAS1', 'IPCAS2', 'IPCAS5', 'IPCAS6', 'IPCAS8', 'MRN1', 'NYU1', 'SWU1', 'SWU2', 'SWU3', 'SWU4', 'UWM', 'XHCUMS')
fmri.atlas = 'desikan-2mm'
fmri.basepath = '/data/all_mr/fmri/ranked/edgelists/'

graphobj = fmriu.io.collection.open_graphs(basepath = fmri.basepath, atlases = fmri.atlas, datasets=fmri.dsets, fmt='edgelist', rtype = 'array')

## [1] "opening graphs for BNU1 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for BNU2 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for BNU3 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for HNU1 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for IBATRT dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for IPCAS1 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for IPCAS2 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for IPCAS5 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for IPCAS6 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for IPCAS8 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for MRN1 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for NYU1 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for SWU1 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for SWU2 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for SWU3 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for SWU4 dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for UWM dataset and desikan-2mm parcellation atlas..."
## [1] "opening graphs for XHCUMS dataset and desikan-2mm parcellation atlas..."

fmri.graphs = graphobj$graphs
fmri.datasets = graphobj$dataset
fmri.subjects = graphobj$subjects
fmri.sessions <- graphobj$sessions

sexpath = '/data/all_mr/phenotypic/'
class = parse_class(sexpath, fmri.dsets, fmri.subjects)

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

## Warning in doTryCatch(return(expr), name, parentenv, handler): NAs
## introduced by coercion

fmri.sexs = class$sex

ne = 1225
nroi <- 70
group1 <- c()  # edges in same hemisphere
group2 <- c()  # edges across hemispheres
for (i in 1:nroi) {
  for (j in 1:nroi) {
    idx <- (i - 1)*nroi + j
    if ((j - i) == 35) {
      group1 <- c(group1, idx)
    } else if (j > i) {
      group2 <- c(group2, idx)
    }
  }
}
Es <- list(group1, group2)
for (i in 1:dim(fmri.graphs)[1]) {
  graph <- fmri.graphs[i,,]
  diag(graph) <- NaN
  fmri.graphs[i,,] <- graph
}
fmri.rank.graphs <- gs.xfm.aaply(fmri.graphs, gs.xfm.rank_graph)
fmri.models <- suppressWarnings(gs.xfm.alply(fmri.rank.graphs, gs.siem.fit, Es))

incl <- which(sapply(fmri.models, is.defined))
fmri.datasets <- fmri.datasets[incl]
fmri.sessions <- fmri.sessions[incl]
fmri.sexs <- fmri.sexs[incl]
fmri.subjects <- fmri.subjects[incl]
fmri.models <- fmri.models[incl]

Map a color vector:

total.dsets <- union(c('BNU1', 'BNU3', 'HNU1', 'SWU4', 'Templeton114', 'Templeton255',
                       'NKI1', 'NKIENH', 'BNU', 'Templeton'),
                     union(dwi.datasets, fmri.datasets))
total.datasets <- c(dwi.datasets, fmri.datasets)
svals <- c()
for (i in 1:length(total.dsets)) {
  if (total.dsets[i] %in% fmri.datasets) {
    size <- sum(fmri.datasets == total.dsets[i])
  } else {
    size <- sum(dwi.datasets == total.dsets[i])
  }
  svals[i] <- size
}
cols <- c("#5362a6","#97b438","#954dc0","#55b74d","#da71dc","#c2ab3a","#5c6ada","#d99a37",
          "#b483da","#4d7634","#c93697","#59b88c","#da4277","#45aecf","#d3462c","#8396de",
          "#e57a33","#964d8a","#9f9f57","#e36dab","#91682b","#d78bb8","#af542e","#a04659",
          "#db996b","#d03d4c")
names(svals) <- total.dsets
names(cols) <- total.dsets
saveRDS(list(svals=svals, cols=cols), 'colors.rds')

Pre-Experimental: Visualizing P and Q across Studies

dwi.params <- extract_params(dwi.models, dwi.datasets, i=1, j=2, modal='dMRI')
fmri.params <- extract_params(fmri.models, fmri.datasets, i=1, j=2, modal='fMRI')
params <- rbind(dwi.params, fmri.params)
colnames(params)[3] <- "p - q"
saveRDS(params, 'results_bil_params.rds')
# params <- melt(params, id=c("dataset", "modal", "size"))
# plot_pre <- ggplot(params, aes(x=modal, y=value, shape=variable, size=size, color=dataset)) +
#   geom_jitter() +
#   scale_shape_discrete(breaks = c("p", "q", "p - q")) +
#   xlab("Modality") +
#   ylab("Value") +
#   theme_bw() +
#   guides(size=FALSE) +
#   ggtitle("Exploratory Analysis of Within-Dataset Rank") +
#   labs(shape="Variable", color="Dataset")
plot_pre <- ggplot(params, aes(x=p, y=q, shape=modal, size=size, color=dataset)) +
  geom_point(alpha=1) +
  scale_color_manual(name="Dataset", values=cols) +
  scale_size_continuous(name="Dataset", breaks=svals) +
  xlab(TeX("$\\hat{p}$")) +
  ylab(TeX("$\\hat{q}$")) +
  theme_bw() +
  ggtitle("Exploratory Analysis") +
  labs(shape="Modality", color="Dataset")

Case 1: Session Partitioning

Is there a batch effect between sessions of a particular study?

Procedure

Compute test statistic given the default partitioning, and obtain \(\tau_{observed}\), as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 sessions.
permute the set labels of the combined set \(nperm\) times (maintaining sex of each graph) to obtain the distribution of \(\tau_{null}\), reporting \(\hat{p}\), the estimator of \(\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]\)

results1 <- rbind(case.experiment.within(dwi.models, dwi.datasets, dwi.sessions, 'dMRI', nrep=1000),
      case.experiment.within(fmri.models, fmri.datasets, fmri.sessions, 'fMRI', nrep=1000))
saveRDS(results1, 'results_bil_case1.rds')

results1 <- readRDS('results_bil_case1.rds')
plot1 <- ggplot(results1, aes(x=modal, y=pval, shape=modal, color=dataset, size=size)) +
  geom_jitter(alpha=0.8, width=0.25, height=0) +
  scale_color_manual(name="Dataset", values=cols) +
  scale_size_continuous(name="Dataset", breaks=svals) +
  ggtitle("Case 1: Sessions") +
  ylab(TeX("$p$-Value")) +
  xlab("Modality") +
  scale_y_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1)) +
  labs(shape="Modality", color="Dataset") +
  theme_bw()

Case 2: Subject Partitioning

Is there a batch effect between subjects in a particular study?

Procedure

Compute test statistic given the default partitioning, and obtain \(\tau_{observed}\), as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects.
permute the set labels of the combined set \(nperm\) times (maintaining sex of each graph) to obtain the distribution of \(\tau_{null}\), reporting \(\hat{p}\), the estimator of \(\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]\)

results2 <- rbind(case.experiment.within(dwi.models, dwi.datasets, dwi.subjects, 'dMRI', nrep=100),
      case.experiment.within(fmri.models, fmri.datasets, fmri.subjects, 'fMRI', nrep=100))
saveRDS(results2, 'results_bil_case2.rds')

results2 <- readRDS('results_bil_case2.rds')
dwi.plot2 <- ggplot(results2[results2$modal == 'dMRI',], aes(pval, color=dataset)) +
  geom_density(alpha=0.2, size=1.5) +
  scale_color_manual(values=cols) +
  xlab("") +
  coord_flip() +
  ylab("") +
  theme_bw() +
  scale_x_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1))
  
fmri.plot2 <- ggplot(results2[results2$modal == 'fMRI',], aes(pval, color=dataset)) +
  geom_density(alpha=0.2, size=1.5) +
  scale_color_manual(values=cols) +
  xlab("") +
  coord_flip() +
  ylab("") +
  theme_bw() +
  scale_x_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1))

plot2 <- grid.arrange(dwi.plot2 + theme(legend.position=NaN) + ylab("dMRI"),
                      fmri.plot2 + theme(legend.position=NaN) + ylab("fMRI"),
                      nrow=1, top = textGrob("Case 2: Subjects",gp=gpar(fontsize=14)))

## Warning: Removed 230700 rows containing non-finite values (stat_density).

## Warning: Removed 1509 rows containing non-finite values (stat_density).

Case 3: Same Sex

Is there a batch effect between sexs in a particular study?

Procedure

Compute test statistic given the default partitioning, and obtain \(\tau_{observed}\), as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects.
permute the set labels of the combined set \(nperm\) times (maintaining sex of each graph) to obtain the distribution of \(\tau_{null}\), reporting \(\hat{p}\), the estimator of \(\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]\)

results3 <- rbind(case.experiment.within(dwi.models, dwi.datasets, dwi.sexs, 'dMRI', nrep=1000),
      case.experiment.within(fmri.models, fmri.datasets, fmri.sexs, 'fMRI', nrep=1000))
saveRDS(results3, 'results_bil_case3.rds')

results3 <- readRDS('results_bil_case3.rds')
plot3 <- ggplot(results3, aes(x=modal, y=pval, shape=modal, group=dataset, color=dataset, size=size)) +
  geom_jitter(alpha=0.8, width=0.25, height=0) +
  scale_color_manual(name="Dataset", values=cols) +
  scale_size_continuous(name="Dataset", breaks=svals) +
  ggtitle("Case 3: Same Sex") +
  xlab("") +
  ylab("") +
  scale_y_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1)) +
  theme_bw()

Case 4: Same Site

Is there a batch effect between studies with site held fixed?

Procedure

Compute test statistic given the default partitioning, and obtain \(\tau_{observed}\), as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects.
permute the set labels of the combined set \(nperm\) times (maintaining sex of each graph) to obtain the distribution of \(\tau_{null}\), reporting \(\hat{p}\), the estimator of \(\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]\)

dwi.studies <- dwi.datasets
fmri.studies <- fmri.datasets
dwi.studies[dwi.studies == 'BNU1' | dwi.studies == 'BNU3'] <- 'BNU'
fmri.studies[fmri.studies == 'BNU1' | fmri.studies == 'BNU3'] <- 'BNU'
dwi.studies[dwi.studies == 'Templeton114' | dwi.studies == 'Templeton255'] <- 'Templeton'

results4 <- rbind(case.experiment.within(dwi.models, dwi.studies, dwi.datasets, 'dMRI', nrep=1000),
      case.experiment.within(fmri.models, fmri.studies, fmri.datasets, 'fMRI', nrep=1000))
saveRDS(results4, 'results_bil_case4.rds')

results4 <- readRDS('results_bil_case4.rds')
plot4 <- ggplot(results4, aes(x=modal, y=pval, shape=modal, color=dataset, group=dataset)) +
  geom_jitter(alpha=1, size=4, width=0.5, height=0) +
  ggtitle("Case 4: Same Site") +
  xlab("") +
  ylab("") +
  scale_y_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1)) +
  labs(color="Site") +
  guides(shape=FALSE) +
  theme_bw()

Case 5: Same Demographics

Is there a batch effect between studies with demographic held fixed?

Procedure

Compute test statistic given the default partitioning, and obtain \(\tau_{observed}\), as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects.
permute the set labels of the combined set \(nperm\) times (maintaining sex of each graph) to obtain the distribution of \(\tau_{null}\), reporting \(\hat{p}\), the estimator of \(\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]\)

dwi.dset.sset <- which(dwi.datasets %in% c('BNU1', 'SWU4', 'HNU1', 'BNU3'))
dwi.models.subs  <- dwi.models[dwi.dset.sset]; dwi.datasets.subs <- dwi.datasets[dwi.dset.sset]

fmri.dset.sset <- which(fmri.datasets %in% c('BNU1', 'SWU4', 'HNU1', 'BNU3'))
fmri.models.subs  <- fmri.models[fmri.dset.sset]; fmri.datasets.subs <- fmri.datasets[fmri.dset.sset]

results5 <- rbind(case.experiment.pairwise(dwi.models.subs, dwi.datasets.subs, 'dMRI', nrep=1000)$data,
      case.experiment.pairwise(fmri.models.subs, fmri.datasets.subs, 'fMRI', nrep=1000)$data)
saveRDS(results5, 'results_bil_case5.rds')

results5 <- readRDS('results_bil_case5.rds')
plot5 <- ggplot(results5, aes(x=modal, y=pval, shape=modal, group=dset1, color=dset1, size=size)) +
  geom_jitter(alpha=0.8, width=0.25, height=0) +
  scale_color_manual(name="Dataset", values=cols) +
  scale_size_continuous(name="Dataset", breaks=svals) +
  ggtitle("Case 5: Same Demographics") +
  xlab("") +
  ylab("") +
  scale_y_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1)) +
  theme_bw()

Case 6: Disparate Demographics

Is there a batch effect between studies without demographic held fixed?

Procedure

Compute test statistic given the default partitioning, and obtain \(\tau_{observed}\), as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects.
permute the set labels of the combined set \(nperm\) times (maintaining sex of each graph) to obtain the distribution of \(\tau_{null}\), reporting \(\hat{p}\), the estimator of \(\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]\)

results6 <- rbind(case.experiment.pairwise(dwi.models, dwi.datasets, 'dMRI', nrep=1000)$data,
      case.experiment.pairwise(fmri.models, fmri.datasets, 'fMRI', nrep=1000)$data)
saveRDS(results6, 'results_bil_case6.rds')

results6 <- readRDS('results_bil_case6.rds')
plot6 <-  ggplot(results6, aes(x=modal, y=pval, shape=modal, group=dset1, color=dset1, size=size)) +
  geom_jitter(alpha=0.8, width=0.25, height=0) +
  scale_color_manual(name="Dataset", values=cols) +
  scale_size_continuous(name="Dataset", breaks=svals) +
  ggtitle("Case 6: Disparate Demographics") +
  xlab("") +
  ylab("") +
  scale_y_continuous(trans=log10_trans(), limits=c(.001, 1), breaks=c(.001, .001, .01, .1, 1)) +
  theme_bw()

Pairwise Study Comparison

Is there a batch effect between all pairs of studies?

Procedure

Compute test statistic given the default partitioning, and obtain \(\tau_{observed}\), as the maximum magnitude of difference in ipsilateral vs. contralateral connectivity between any pair of 2 subjects.
permute the set labels of the combined set \(nperm\) times (maintaining sex of each graph) to obtain the distribution of \(\tau_{null}\), reporting \(\hat{p}\), the estimator of \(\mathbb{E}\left[\tau_{null} < \tau_{observed}\right]\)

dwi_pair_results <- case.experiment.pairwise(dwi.models, dwi.datasets, 'dMRI', nrep=1000)
fmri_pair_results <- case.experiment.pairwise(fmri.models, fmri.datasets, 'fMRI', nrep=1000)
results_pair <- list(dwi_pair_results, fmri_pair_results)
saveRDS(results_pair, 'results_bil_pair.rds')

Results

Cases

results_pair <- readRDS('results_bil_pair.rds')
dwi_pair_results <- results_pair[[1]]
fmri_pair_results <- results_pair[[2]]
top_leg <- g_legend(plot_pre)
bot_leg <- g_legend(plot4)
bottom <- grid.arrange(arrangeGrob(plot1 + theme(legend.position=NaN),
                         plot2,
                         plot3 + theme(legend.position=NaN),
                         plot4 + theme(legend.position=NaN),
                         plot5 + theme(legend.position=NaN),
                         plot6 + theme(legend.position=NaN), nrow=2),
             arrangeGrob(top_leg, bot_leg, nrow=2), nrow=1, widths=c(0.82, 0.16))

## Warning: Removed 2 rows containing missing values (geom_point).

Pairwise Comparisons

dMRI

dwi_pair_results$data$dset1 <- ordered(dwi_pair_results$data$dset1, levels=names(svals))
dwi_pair_results$data$dset2 <- ordered(dwi_pair_results$data$dset2, levels=names(svals))
dwi.pair <- ggplot(dwi_pair_results$data, aes(x=dset1, y=dset2, fill=pval)) +
        geom_tile() +
        ggtitle("dMRI Paired Batch") +
        xlab("Dataset") +
        ylab("Dataset") +
        scale_fill_gradientn(name=TeX("$p$-value"), trans="log", breaks=c(0.001, 0.01, 0.1, 1),
                             colours=c("#f2f0f7", "#cbc9e2", "#9e9ac8", "#6a51a3"), limits=c(0.001, 1)) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1))

fMRI

fmri_pair_results$data$dset1 <- ordered(fmri_pair_results$data$dset1, levels=names(svals))
fmri_pair_results$data$dset2 <- ordered(fmri_pair_results$data$dset2, levels=names(svals))
fmri.pair <- ggplot(fmri_pair_results$data, aes(x=dset1, y=dset2, fill=pval)) +
        geom_tile() +
        ggtitle("fMRI Paired Batch") +
        xlab("Dataset") +
        ylab("Dataset") +
        scale_fill_gradientn(name=TeX("$p$-value"), trans="log", breaks=c(0.001, 0.01, 0.1, 1),
                             colours=c("#f2f0f7", "#cbc9e2", "#9e9ac8", "#6a51a3"), limits=c(0.001, 1)) +
        theme(axis.text.x = element_text(angle = 90, hjust = 1))

Combined plot

pair_leg <- g_legend(dwi.pair)
top <- grid.arrange(plot_pre + theme(legend.position=NaN),
                     grid.arrange(dwi.pair + theme(legend.position=NaN),
                                  fmri.pair + theme(legend.position=NaN),
                                  pair_leg, widths=c(0.4, 0.4, 0.2)),
                     widths=c(0.2, 0.7), nrow=1)

grid.arrange(top, bottom, heights=c(0.4, 0.6))

SIEM for Determination of Batch Effects in fMRI and dMRI Populations wrt Bilateral Connectivity

Eric Bridgeford

December 31, 2017

Model

Real Data Experiments

Tests

for each Graph

for Determining a Batch

Raw Data

Diffusion

Functional

Pre-Experimental: Visualizing P and Q across Studies

Case 1: Session Partitioning

Procedure

Case 2: Subject Partitioning

Procedure

Case 3: Same Sex

Procedure

Case 4: Same Site

Procedure

Case 5: Same Demographics

Procedure

Case 6: Disparate Demographics

Procedure

Pairwise Study Comparison

Procedure

Results

Cases

Pairwise Comparisons

dMRI

fMRI

Combined plot