This report deals with the exploratory and statistical analysis of the Kristina15 data set. Put a link to the data here and maybe a citation?
From email correspondence with Kristina Micheva, we have the following definitions of the given markers.
On Feb 8, 2016, at 2:00 PM, Kristina Micheva kmicheva@stanford.edu wrote:
and
On March 10, 2016, 00:29:04 (UTC), Kristina Micheva kmicheva@stanford.edu wrote:
There are 2 different Synap channels (2 different antibodies were used), so that part is fine. And 2 different VGluT1 channels (same antibody but done at different times) The NOS channel is the same, so count it as one even though it appears twice. It is listed two times because it can be found at both excitatory and inhibitory synapses. This is where your count of 25 comes, even though there are 24 channels. I would also add the 2 Synap channels to the Inhibitory presynaptic category - there is supposed to be synapsin there, but at lower levels compared to excitatory presynaptic category.
227 in the kohenen.m file which can be found in the dropbox folder.
Synap and Synap have been augmented to Synap_1 and Synap_2 for clarity.
VGlut1 and VGlut1 have been augmented to VGlut1_t1 and VGlut1_t2 to distinguish between the different times of collection (which are unknown).
On Thu, Apr 14, 2016 at 3:05 AM, Kristina Micheva kmicheva@stanford.edu wrote:
I suggest: Synap, VGluT1, VGluT2, psd, gad, vgat, gephyr,
Or a bit bigger: Synap, VGluT1, VGluT2, psd, gad, vgat, gephyr, VGlut3, CB1
On Apr 12, 2016, at 9:54 AM, Jesse L. Patsolic studiojlp@gmail.com wrote:
Kristina,
Out of the markers available, which do you think are the best to use as a subset?
How were the puncta processed?
Array tomography images, then what?
The [sic] corresponds to 24 channels x 6 features per synapse, ordered like c0f0,c0f1,c0f2,c0f3,c0f4,c0f5,c1f0,c1f1… etc
f0 = integrated brightness
f1 = local brightness
f2 = distance to Center of Mass
f3 = moment of inertia around synapsin maxima
f4,f5 are features that I forget what they are.. would need to ask brad.
i would throw them out, I did so in my kohonen code (which you have, its in matlab).
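The column layout described in the email can be sketched in R as follows. This is a minimal illustration, assuming zero-based channel and feature indices as written in the email and six features per channel; the helper name col_index and the vector labels are hypothetical and are not part of the original analysis.

```r
# Hypothetical helper: map a zero-based (channel, feature) pair to the
# one-based column position, assuming nfeat features per channel stored
# channel by channel (c0f0, c0f1, ..., c0f5, c1f0, ...).
col_index <- function(channel, feature, nfeat = 6) {
  channel * nfeat + feature + 1
}

# Build the 24 x 6 = 144 labels in the order given in the email.
labels <- as.vector(sapply(0:23, function(ch) paste0("c", ch, "f", 0:5)))

col_index(0, 0)          # position of c0f0
labels[col_index(1, 2)]  # label found at the position computed for c1f2
```

Under this layout, the integrated brightness feature (f0) of every channel sits at positions 1, 7, 13, and so on.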
Here we read in the data and select a random half of it for exploration.
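## Loading the packages used below
library(data.table)  # fread, data.table
library(gplots)      # colorpanel
library(colorRamps)  # matlab.like
library(corrplot)    # corrplot
featFull <- fread("../data/synapsinR_7thA.tif.Pivots.txt.2011Features.txt",showProgress=FALSE)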
### Setting a seed and creating an index vector
### to select half of the data
set.seed(2^10)
half1 <- sample(dim(featFull)[1],dim(featFull)[1]/2)
half2 <- setdiff(1:dim(featFull)[1],half1)
feat <- featFull[half1,]
dim(feat)
# [1] 559649 144
## Setting the channel names
channel <- c('Synap_1','Synap_2','VGlut1_t1','VGlut1_t2','VGlut2','Vglut3',
'psd','glur2','nmdar1','nr2b','gad','VGAT',
'PV','Gephyr','GABAR1','GABABR','CR1','5HT1A',
'NOS','TH','VACht','Synapo','tubuli','DAPI')
## Setting the channel types
channel.type <- c('ex.pre','ex.pre','ex.pre','ex.pre','ex.pre','in.pre.small',
'ex.post','ex.post','ex.post','ex.post','in.pre','in.pre',
'in.pre','in.post','in.post','in.post','in.pre.small','other',
'ex.post','other','other','ex.post','none','none')
nchannel <- length(channel)
nfeat <- ncol(feat) / nchannel
## Creating factor variables for channel and channel type sorted properly
ffchannel <- (factor(channel.type,
levels= c("ex.pre","ex.post","in.pre","in.post","in.pre.small","other","none")
))
fchannel <- as.numeric(factor(channel.type,
levels= c("ex.pre","ex.post","in.pre","in.post","in.pre.small","other","none")
))
ford <- order(fchannel)
## Setting up colors for channel types
Syncol <- c("#197300","#5ed155","#660000","#cc0000","#ff9933","mediumblue","gold")
ccol <- Syncol[fchannel]
exType <- factor(c(rep("ex",11),rep("in",6),rep("other",7)),ordered=TRUE)
exCol <- exType
levels(exCol) <- c("#197300","#990000","mediumblue")
exCol <- as.character(exCol)
fname <- as.vector(sapply(channel,function(x) paste0(x,paste0("F",0:5))))
names(feat) <- fname
fcol <- rep(ccol, each=6)
mycol <- colorpanel(100, "purple", "black", "green")
mycol2 <- matlab.like(nchannel)
f <- lapply(1:6,function(x){seq(x,ncol(feat),by=nfeat)})
featF <- lapply(f,function(x){subset(feat,select=x)})
featF0 <- featF[[1]]
f01e3 <- 1e3*data.table(apply(X=featF0, 2, function(x){((x-min(x))/(max(x)-min(x)))}))
fs <- f01e3
### Taking log_10 on data + 1.
log1f <- log10(featF0 + 1)
slog1f <- data.table(scale(log1f, center=TRUE,scale=TRUE))
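The transformations computed above can be checked on a small toy example. The sketch below uses base R only and a made-up two-column data frame in place of featF0; the names x, minmax, fs_toy, log1f_toy and slog1f_toy are hypothetical and chosen to mirror the variables above.

```r
# Toy stand-in for featF0: two columns of non-negative values.
x <- data.frame(a = c(0, 9, 99), b = c(1, 10, 100))

# Min-max scale each column to [0, 1], then multiply by 1000.
minmax <- function(v) (v - min(v)) / (max(v) - min(v))
fs_toy <- 1e3 * as.data.frame(lapply(x, minmax))

# log10(x + 1), then centre each column and divide by its standard deviation.
log1f_toy  <- log10(x + 1)
slog1f_toy <- as.data.frame(scale(log1f_toy, center = TRUE, scale = TRUE))

range(fs_toy$a)        # smallest and largest scaled value in column a
log1f_toy$a            # log10 of 1, 10 and 100
colMeans(slog1f_toy)   # column means after centring
```

Note that the min-max step divides by max(x) - min(x), so a constant column would produce NaN values; adding one before the logarithm keeps zero intensities finite.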
We now have the following data sets:
featF0: The feature vector looking only at the integrated brightness features.
fs: The feature vector scaled between \([0,1000]\).
log1f: The feature vector, plus one, then \(\log_{10}\) is applied.
slog1f: The feature vector, plus one, \(\log_{10}\), then scaled by subtracting the mean and dividing by the sample standard deviation.
tmp <- as.numeric(table(fchannel))
corrf <- cor(featF0)[ford,ford]
corrplot(corrf,method="color",tl.col=ccol[ford], tl.cex=0.8)
tmp <- as.numeric(table(fchannel))
bford <- order(rep(fchannel,each=6))
cr <- rep(ccol, each=6)
corrf <- cor(feat)[bford,bford]
corrplot(corrf,method="color",tl.col=cr[bford],tl.cex=0.8)
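The reordering used for both correlation plots groups the columns by channel type before plotting. A minimal sketch of the same idea on made-up data, using base R only, is given here; the matrix toy and the group codes in grp are hypothetical and stand in for feat and fchannel.

```r
set.seed(1)
# Toy data: 10 observations of 4 variables.
toy <- matrix(rnorm(40), ncol = 4,
              dimnames = list(NULL, c("A", "B", "C", "D")))

# Hypothetical numeric group code per column (plays the role of fchannel).
grp <- c(2, 1, 2, 1)

# order() returns the permutation that sorts the group codes, keeping the
# original order within a group, so group 1 columns come before group 2.
ord <- order(grp)
corr_toy <- cor(toy)[ord, ord]

colnames(corr_toy)  # columns grouped by code: B, D, A, C
```

Applying the same permutation to rows and columns keeps the matrix symmetric, so variables of one type form contiguous blocks along the diagonal. For the full feature matrix, rep(fchannel, each=6) expands the per-channel code to one code per feature column before ordering.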