# Model

A stochastic block model is a statistical model for graphs that assumes
connectivity depends only on the underlying **type** of each node. We
are interested in making inferences about those node types
based on the observed graph. For this paper we extended
a nonparametric type of block model to incorporate distance-dependence
and additional feature metadata, and applied it to connectome reconstruction.

What follows is a simple and, hopefully, intuitive explanation of what the model
finds in data. For details, both neural and mathematical, see the paper.

### Going to a conference

Imagine going to a conference and watching researchers attempt to talk
to one another. We record when researcher \(e_i\) *tries to talk to*
researcher \(e_j\). Note that "tries to talk to" is not symmetric --
I might try to talk to someone, but that doesn't mean they will
talk to me (this happens all too often!). We end up
with a directed graph of the data that looks like this:

What can we learn from this graph? Are there certain patterns of
interaction, or certain types of researchers?

When we fit a stochastic block model to this data, we are learning a
hidden class or type, \(m_i\), for each node \(e_i\), with the prior
belief that that type determines connectivity. When we sort the graph,
a pattern emerges:

Our model has found four types of people at this conference.
We can see postdocs, grad students, faculty, and vendors.
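
To make the idea of "sorting the graph" concrete, here is a minimal sketch in Python with NumPy. It is not the inference code from the paper: the types are assumed to be known rather than learned, and the group sizes and the probability matrix `eta` are made up for illustration. It samples a directed graph in which the chance that \(e_i\) tries to talk to \(e_j\) depends only on their types, shuffles the nodes to mimic the raw observation, and then reorders rows and columns by type so the blocks become visible.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: 4 hidden types, 25 researchers each.
types = ["postdoc", "grad student", "faculty", "vendor"]
n_per_type = 25
m = np.repeat(np.arange(len(types)), n_per_type)  # m[i] = type of node e_i

# Made-up probabilities: eta[a, b] = P(type a tries to talk to type b).
# The matrix is not symmetric, so the resulting graph is directed.
eta = np.array([
    [0.30, 0.10, 0.60, 0.05],
    [0.10, 0.40, 0.50, 0.05],
    [0.05, 0.05, 0.30, 0.02],
    [0.40, 0.40, 0.40, 0.01],
])

# Sample the adjacency matrix: A[i, j] = 1 if e_i tries to talk to e_j.
n = m.size
p = eta[m[:, None], m[None, :]]
A = (rng.random((n, n)) < p).astype(int)
np.fill_diagonal(A, 0)  # nobody tries to talk to themselves

# Shuffle the node order to mimic the raw, unsorted observation.
perm = rng.permutation(n)
A_obs, m_obs = A[np.ix_(perm, perm)], m[perm]

# Sorting rows and columns by type reveals the block pattern.
order = np.argsort(m_obs, kind="stable")
A_sorted = A_obs[np.ix_(order, order)]
m_sorted = m_obs[order]


def block_density(A, m, k):
    """Empirical connection rate for each (source type, target type) pair."""
    dens = np.zeros((k, k))
    for a in range(k):
        for b in range(k):
            block = A[np.ix_(m == a, m == b)]
            # Exclude self-pairs from the count on the diagonal blocks.
            n_pairs = block.size - (np.sum(m == a) if a == b else 0)
            dens[a, b] = block.sum() / n_pairs
    return dens


dens = block_density(A_sorted, m_sorted, len(types))
print(np.round(dens, 2))
```

The printed block densities should land close to `eta`. In the real problem the assignment `m` is unknown, and the model has to infer it from the graph alone.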

### The importance of distance

Let's assume that the conference is massive, like the annual Society for Neuroscience
meeting at ~30,000 people. With a conference this large,
it may be the case that most people never even get near enough to each
other to try to talk.

This time, we have some explicit distance-dependence in our data. That is, we know
if \(e_i\) tried to talk to \(e_j\) and how far away \(e_i\) and \(e_j\) were. If we ignore this distance-dependence, we "overcluster", splitting the data into more groups than there really are:

But when we explicitly incorporate distance, we recover the
four true groups.
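
One plausible way to see why ignoring distance leads to overclustering is sketched below, again in Python with NumPy. Everything here is an assumption made for illustration: the one-dimensional conference floor, the two types, the matrix `eta`, and in particular the exponential falloff with `length_scale`, which is not taken from the paper. The sketch samples a graph in which type sets the peak connection rate and distance scales it down, then compares the empirical rate for the same pair of types at short and long range.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical setup: 2 types spread along a line-shaped conference floor.
n = 400
m = rng.integers(0, 2, size=n)          # hidden type of each attendee
pos = rng.uniform(0.0, 100.0, size=n)   # position (arbitrary units)
dist = np.abs(pos[:, None] - pos[None, :])

# Made-up link function: type sets the peak rate, distance scales it down.
eta = np.array([[0.8, 0.2],
                [0.3, 0.7]])
length_scale = 10.0  # assumed distance over which contact becomes unlikely
p = eta[m[:, None], m[None, :]] * np.exp(-dist / length_scale)

# A[i, j] = 1 if e_i tries to talk to e_j.
A = (rng.random((n, n)) < p).astype(int)
np.fill_diagonal(A, 0)


def rate(mask):
    """Empirical connection rate over the ordered pairs selected by mask."""
    mask = mask & ~np.eye(n, dtype=bool)  # drop self-pairs
    return A[mask].mean()


# Same pair of types, two distance regimes.
both_type0 = (m[:, None] == 0) & (m[None, :] == 0)
near_rate = rate(both_type0 & (dist < 10.0))
far_rate = rate(both_type0 & (dist >= 50.0))

print("type 0 -> type 0, near:", round(near_rate, 3))
print("type 0 -> type 0, far: ", round(far_rate, 3))
```

Nearby pairs of type 0 connect often while distant pairs of the same type almost never do. A model that sees only the graph can account for that gap only by splitting one true type into several spatially local groups, whereas a model that is given the distances can attribute the gap to distance and keep the type intact.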

As datasets grow larger and more sparse, the importance of accommodating distance
as well as other prior knowledge only grows. Read more in the paper!

You can try this example yourself via the Jupyter notebook.