Comparing a-Priori SBM Fitting with Different Groupings

It is often desirable to create models of data so we can generate new samples, learn something about the data’s structure, or for use in some larger analysis. SBM models define different connection probabilities for different communities in a graph.

Here we enforce a community structure a-priori based a some connectome attributes or combinations thereof, and fit an SBM. In this case, the connection probability between two blocks is simply the total number of observed connections between them, divided by the total number of possible connections between them.

We fit these SBMs using different attributes or combinations of attributes from the connectome, and then compute the BIC (Bayesian Information Criterion) for each model. This gives us an idea of which attributes best explain the topological structure of the connectome.

For consistency, we use the attributes hemisphere (the side of the brain a neuron is on), cell_type (the high-level classification of a neuron as interneuron, sensory, or motor) and their combination as groups for fitting. The analysis is only preformed on the Ciona Intestinalis Connectome, two C. Elegans Worm Wiring Connectomes, and the 8 C. Elegans Developmental Connectomes, because only these have both those attributes defined.

Note that the code to do the actual SBM fitting is located in analysis/apriori_sbm_code.py

import os
import sys
import matplotlib.pyplot as plt
sys.path.append("../")
import pandas as pd
from graph import GraphIO
from analysis.apriori_sbm_code import fit_apriori_sbm
import seaborn as sns
import numpy as np

Ciona Intestinalis

We’ll start with the single Ciona Connectome. First we load the data:

path = '../json_connectomes/ciona.json'
ciona_connectome, _, _, _ = GraphIO.load(path)

Now we’ll fit an SBM for each grouping

# In Ciona, Hemisphere is called side and cell_type is called annotation
groups = [['Side'], ['Annotation'], ['Side', 'Annotation']]
rows = {'Ciona Intestinalis': {}}
for group in groups:
    rows['Ciona Intestinalis'][str(group)] = fit_apriori_sbm(ciona_connectome, group, plot_sbms=False)
ciona_results = pd.DataFrame.from_dict({(i,j): rows[i][j]
                           for i in rows.keys()
                           for j in rows[i].keys()},
                       orient='index')
ciona_results.head()
bic likelihood n_params estimator
Ciona Intestinalis ['Side'] -23033.922165 -10863.237119 121 SBMEstimator()
['Annotation'] -22820.202349 -11323.658336 16 SBMEstimator()
['Side', 'Annotation'] -26206.331665 -10245.149497 529 SBMEstimator()

Great, now lets plot -BIC for different groupings. The lowest -BIC is set to 0, since the scale doesn’t really matter here.

colors = sns.color_palette("deep")[0]
fig, ax = plt.subplots(2, 1)
out_index = ciona_results.index.levels[0]
for i, key in enumerate(out_index):
    data = ciona_results.loc[key]
    bic_data = data['bic'].to_numpy()
    norm_bic = bic_data - np.min(bic_data)
    like_data = data['likelihood'].to_numpy()
    norm_like = like_data - np.min(like_data)
    _ = ax[0].scatter(data['bic'].index,
                  norm_bic,
                  label=key,
                  color=np.array(colors))
    _ = ax[1].scatter(data['likelihood'].index,
                  norm_like,
                  label=key,
                  color=np.array(colors))
    _ = ax[0].set(xlabel='Group', ylabel='-BIC')
    _ = ax[1].set(xlabel='Group', ylabel='Likelihood')
_ = fig.suptitle('BIC and Likelihood: Ciona Intestinalis')
_ = fig.set_size_inches(6, 8)
_images/apriori_sbms_8_0.png

It appears annotation has the best BIC but the worst likelihood. This could be because annotation has much fewer parameters than the other groupings. (For Ciona ‘Side’ is broken into 11 groupings instead of just ‘Left’ and ‘Right’). Therefore, based on the BIC, annotation explains the structure better, and the other groupings are likely fitting noise.

C. Elegans Developmental Conectomes (Witviliet)

We will preform this analysis for all 8 developmental stages at once.

# load some elegans Witvliet data
base = '../json_connectomes/witvilet/'
graph_files = sorted(os.listdir(base))
wit_connectomes = []
for f in graph_files:
    # These contain multiple synapse types, and thus can have multiedges and need to be flattened.
    # Alternatively, we could preform the analysis separately for each synapse type
    wit_connectomes.append(GraphIO.flatten_multigraph(GraphIO.load(os.path.join(base, f))[0]))

Now we will fit sbms for each grouping for each connectome.

groups = [["hemisphere"], ["cell_type0"], ["hemisphere", "cell_type0"]]
rows = {}
ages = []
for i in range(len(wit_connectomes)):
    age = wit_connectomes[i].graph['age']
    ages.append(age)
    g_row = {}
    for group in groups:
        g_row[str(group)] = fit_apriori_sbm(wit_connectomes[i], group, plot_sbms=False)
    rows['Witvliet ' + str(age) + ' hours'] = g_row
wit_results = pd.DataFrame.from_dict({(i,j): rows[i][j]
                           for i in rows.keys()
                           for j in rows[i].keys()},
                       orient='index')
ages = np.array(ages)
wit_results.head(24)
bic likelihood n_params estimator
Witvliet 0 hours ['hemisphere'] -8016.577339 -3961.208692 9 SBMEstimator()
['cell_type0'] -7651.218552 -3694.831560 25 SBMEstimator()
['hemisphere', 'cell_type0'] -8917.613900 -3574.749594 169 SBMEstimator()
Witvliet 5 hours ['hemisphere'] -9913.659242 -4909.418897 9 SBMEstimator()
['cell_type0'] -9401.135504 -4510.924858 36 SBMEstimator()
['hemisphere', 'cell_type0'] -10818.117288 -4376.558445 196 SBMEstimator()
Witvliet 8 hours ['hemisphere'] -10035.203502 -4970.007348 9 SBMEstimator()
['cell_type0'] -9387.116384 -4561.351516 25 SBMEstimator()
['hemisphere', 'cell_type0'] -10628.877074 -4420.721409 169 SBMEstimator()
Witvliet 16 hours ['hemisphere'] -11716.860426 -5810.567133 9 SBMEstimator()
['cell_type0'] -10907.811713 -5320.952857 25 SBMEstimator()
['hemisphere', 'cell_type0'] -12142.430570 -5172.453006 169 SBMEstimator()
Witvliet 23 hours ['hemisphere'] -14771.539210 -7337.602882 9 SBMEstimator()
['cell_type0'] -13806.911934 -6710.789074 36 SBMEstimator()
['hemisphere', 'cell_type0'] -15164.255483 -6533.163547 196 SBMEstimator()
Witvliet 27 hours ['hemisphere'] -14586.909618 -7245.077303 9 SBMEstimator()
['cell_type0'] -13437.319264 -6584.277672 25 SBMEstimator()
['hemisphere', 'cell_type0'] -14596.318736 -6389.737317 169 SBMEstimator()
Witvliet 45 hours ['hemisphere'] -19049.862125 -9476.429417 9 SBMEstimator()
['cell_type0'] -17546.406610 -8579.196723 36 SBMEstimator()
['hemisphere', 'cell_type0'] -19108.022424 -8341.470073 225 SBMEstimator()

Let’s plot -BIC for different groupings. Again the lowest -BIC for each developmental stage is set to 0, allowing for easy comparison.

fig, ax = plt.subplots(2)
out_index = wit_results.index.levels[0]
cm = plt.cm.get_cmap('Greens')
colors = cm(ages/max(ages))
for i, key in enumerate(out_index):
    data = wit_results.loc[key]
    bic_data = data['bic'].to_numpy()
    norm_bic = bic_data - np.min(bic_data)
    like_data = data['likelihood'].to_numpy()
    norm_like = like_data - np.min(like_data)
    _ = ax[0].scatter(data['bic'].index,
                     norm_bic,
                     label=str(ages[i]) + ' hours',
                     color=np.array(colors)[i])
    _ = ax[1].scatter(data['likelihood'].index,
                     norm_like,
                     label=str(ages[i]) + ' hours',
                     color=np.array(colors)[i])
    _ = ax[0].set(xlabel='Groups, stage ' + str(i), ylabel='-BIC')
    _ = ax[1].set(xlabel='Groups, stage ' + str(i), ylabel='Likelihood')
_ = fig.suptitle('BIC and Likelihood: C. Elegans at Different Developmental Stages')
_ = fig.set_size_inches(6, 8)
_ = ax[0].legend(loc='best')
_ = ax[1].legend(loc='best')
_images/apriori_sbms_15_0.png

Once again, we find that cell type explains the topology the best. This is still not entirely surprising, since the connectivity of interneurons should be very different than that of sensory or motor neurons

C. Elegans Male and Hermaphrodite Conectomes (Worm Wiring)

# load chemical synapse connectome files
path_h = '../json_connectomes/worm_wiring/connectome/Hermaphrodite/0.json'
path_m = '../json_connectomes/worm_wiring/connectome/Male/0.json'
ww_connectomes = []
ww_connectomes.append(GraphIO.load(path_h)[0])
ww_connectomes.append(GraphIO.load(path_m)[0])
groups = [["hemisphere"], ["cell_type0"], ["hemisphere", "cell_type0"]]
types = ['Hermaphrodite', 'Male']
rows = {}
for i in range(len(ww_connectomes)):
    g_row = {}
    for group in groups:
        g_row[str(group)] = fit_apriori_sbm(ww_connectomes[i], group, plot_sbms=False)
    rows['C. Elegans ' + types[i]] = g_row
ww_results = pd.DataFrame.from_dict({(i,j): rows[i][j]
                           for i in rows.keys()
                           for j in rows[i].keys()},
                       orient='index')
ww_results.head(6)
bic likelihood n_params estimator
C. Elegans Hermaphrodite ['hemisphere'] -45484.654884 -22687.264567 9 SBMEstimator()
['cell_type0'] -37783.419558 -18500.151558 64 SBMEstimator()
['hemisphere', 'cell_type0'] -41098.822390 -17588.252151 484 SBMEstimator()
C. Elegans Male ['hemisphere'] -53425.404138 -26655.512739 9 SBMEstimator()
['cell_type0'] -43612.375238 -21399.507936 64 SBMEstimator()
['hemisphere', 'cell_type0'] -47806.496439 -20541.786468 529 SBMEstimator()

One last time, we’ll plot -BIC and Likelihood, the lowest -BIC is again set to 0.

fig, ax = plt.subplots(2)
out_index = ww_results.index.levels[0]
colors = sns.color_palette("dark")[2]
for i, key in enumerate(out_index):
    data = ww_results.loc[key]
    bic_data = data['bic'].to_numpy()
    norm_bic = bic_data - np.min(bic_data)
    like_data = data['likelihood'].to_numpy()
    norm_like = like_data - np.min(like_data)
    if 'Male' in key:
        shape = '^'
    else:
        shape = 'o'
    _ = ax[0].scatter(data['bic'].index,
                     norm_bic,
                     label=key,
                     color=np.array(colors),
                     marker=shape,)
    _ = ax[1].scatter(data['likelihood'].index,
                     norm_like,
                     label=key,
                     color=np.array(colors),
                     marker=shape,)
    _ = ax[0].set(xlabel='Groups, ' + types[i], ylabel='-BIC')
    _ = ax[1].set(xlabel='Groups, ' + types[i], ylabel='Likelihood')
_ = ax[0].legend(loc='best')
_ = ax[1].legend(loc='best')
_ = fig.suptitle('BIC and Likelihood: Worm Wiring C. Elegans Hermaphrodite and Male')
_ = fig.set_size_inches(6, 8)
_images/apriori_sbms_21_0.png

For a third time, cell type groupings get the best BIC. This makes sense since this is the same species as the developmental data