Do you know what you don’t know? A gap analysis of Neuroscience Data.

My thesis adviser, a colorful spirit and one whose wisdom will be missed, used to say that undergraduate or professional students differed from graduate students in that they were asked to learn what was known about a subject, while graduate students were asked to tackle the unknown.

We in higher education are essentially seeking to find out what is not known and to start coming up with new answers. But how does one find out what is not known? Is that even possible? Don’t most graduate students and postdoctoral fellows add to a lab’s existing body of knowledge, exploring the unknown by building on the known? If this is how we work, does it create a very skewed picture of the brain? How would we even know what is really unknown?

But are we not in the omics era? Genomics, proteomics and every other “omics”? We no longer want to know about a single gene; we want to know about all of the genes, the genome of an organism. We want to account for everything of the type DNA and figure out which parts do what. In neuroscience, this tends to be more difficult, mainly because we do not have a finite list of things to account for. There are a great many species with brains, or at least ganglia, and a single human brain has billions of cells and many more connections between them. Worse, these connections are not even static, so a wiring diagram is valid for a single brain only for a short time before the brain reorganizes some of those connections.

Is there hope for an “omics” approach to neuroscience?

Well, the space is not infinite, and it has been studied for more than 100 years, so we have some ways of getting at the problem. We have a map!
Can we use this map to figure out some basic information about what we do and do not study? The short answer, at least for some things, seems to be yes!

The Neuroscience Information Framework (NIF) project has been aggregating data of various sorts that are useful to neuroscientists, along with a set of vocabularies covering all of the brain parts: the map of the nervous system. So we can start to look at which labels are used for tagging data and which are found in the literature. Are all parts of the brain represented by roughly even amounts of data or papers, or are there hot spots and cold spots?
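
Looking at which labels tag the data boils down to counting records per vocabulary term. The Python sketch below is a minimal illustration under stated assumptions, not how NIF computes it: the records, the vocabulary and the `label_counts` helper are all hypothetical. The design point worth noting is the join against the vocabulary. A term that tags no data never shows up in a plain tally, and those silent terms are exactly the gaps we are after.

```python
from collections import Counter

# Hypothetical data records: (record ID, brain-region label) pairs.
# Real NIF records carry far more metadata than this.
records = [
    ("r1", "brain"),
    ("r2", "brain"),
    ("r3", "cerebral cortex"),
    ("r4", "hippocampus"),
    ("r5", "brain"),
    ("r6", "cerebellum"),
    ("r7", "hippocampus"),
    ("r8", "oculomotor nerve root"),
]

# Hypothetical, tiny stand-in for a brain-region vocabulary.
vocabulary = [
    "brain", "cerebral cortex", "hippocampus", "cerebellum",
    "superior colliculus", "oculomotor nerve root",
]

def label_counts(records, vocabulary):
    """Count records per vocabulary term, giving unused terms an explicit 0.

    Labels that are not in the vocabulary are ignored."""
    tally = Counter(label for _, label in records)
    return {term: tally.get(term, 0) for term in vocabulary}

counts = label_counts(records, vocabulary)
ranked = sorted(counts.items(), key=lambda item: item[1], reverse=True)
print("most popular:", ranked[0])
print("never used:", [term for term, n in counts.items() if n == 0])
```

With these made-up records, “brain” comes out on top with 3 records and “superior colliculus” is the only term with none.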

Below is a heat map of data versus the canonical brain regions (a hierarchy built to resemble what one might find in a graduate-level neuroanatomy textbook).
[Figure: heat map of data records vs. canonical brain regions]

Although the heat map is hard to read (the darker the green, the more data; you can generate your own by clicking on the graph icon in NIF), there is little doubt that not all brain regions are equal: some have very little data, while others have a plethora. This raises the question: are there popular and not-so-popular brain regions?
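
A heat map like the one described above is, at heart, a table of counts (brain region by data source) mapped onto color intensity. The Python sketch below is a hypothetical illustration, not NIF's implementation: the counts and the source names `source_A` and `source_B` are made up, and the log scale is an assumption of this sketch, chosen because counts ranging from millions down to a handful would otherwise leave all but the top regions looking empty.

```python
import math

# Hypothetical record counts per (brain region, data source).
counts = {
    ("hippocampus", "source_A"): 12000,
    ("hippocampus", "source_B"): 450,
    ("cerebellum", "source_A"): 3000,
    ("cerebellum", "source_B"): 0,
    ("inferior colliculus", "source_A"): 40,
    ("inferior colliculus", "source_B"): 2,
}

def shade_levels(counts, levels=5):
    """Map each count onto 0..levels-1 on a log scale (0 = lightest, levels-1 = darkest).

    The +1 keeps zero counts finite; the log keeps one very popular
    region from washing out everything else."""
    logs = {key: math.log10(n + 1) for key, n in counts.items()}
    top = max(logs.values()) or 1.0  # avoid dividing by zero if every count is 0
    return {key: round((levels - 1) * value / top) for key, value in logs.items()}

shades = shade_levels(counts)
regions = sorted({region for region, _ in counts})
sources = sorted({source for _, source in counts})
for region in regions:
    # One text row per region; higher digits stand in for darker green.
    print(f"{region:>20}", " ".join(str(shades[(region, s)]) for s in sources))
```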

Indeed, some brain region annotations turn up in the data far more often than others, and, much like pop stars, the popular ones tend to have shorter names. The most popular data label is actually “brain”, and the least popular appears to be “oculomotor nerve root”. This tells us that most data are labeled only at the level of “brain vs. kidney”, but can we do better as neuroscientists? In fact, we can group the labels into major divisions (hindbrain, midbrain and forebrain) and add up all of the data that fall into each. Most of the data are attributed to the forebrain, home to some of the most popular brain regions such as the cerebral cortex and the hippocampus, but the hindbrain also returns a reasonable amount of data, mainly for the cerebellum. Adding up all the data labels for midbrain regions, however, leaves the awkward impression that the midbrain may be completely non-essential to brain research. Yet the midbrain appears to be essential to life, so why do neuroscientists not know much, or at least publish much, about it?

[Figure: table of brain region popularity]
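
Rolling the labels up into hindbrain, midbrain and forebrain amounts to walking each label up a part-of hierarchy until it reaches one of the three major divisions, then summing. The hierarchy, the counts and the helper names in this Python sketch are hypothetical and far smaller than the real NIF vocabulary. Records labeled only “brain” sit above the divisions, so the sketch keeps them in a separate bucket rather than forcing them into one.

```python
# Hypothetical part-of hierarchy: each region points to its parent region.
# Assumes the hierarchy contains no cycles.
PARENT = {
    "cerebral cortex": "forebrain",
    "hippocampus": "forebrain",
    "thalamus": "forebrain",
    "superior colliculus": "midbrain",
    "inferior colliculus": "midbrain",
    "cerebellum": "hindbrain",
    "cochlear nuclear complex": "hindbrain",
    "forebrain": "brain",
    "midbrain": "brain",
    "hindbrain": "brain",
}
MAJOR_DIVISIONS = ("forebrain", "midbrain", "hindbrain")

# Hypothetical record counts per label (illustrative only, not NIF's numbers).
counts_by_label = {
    "brain": 500,
    "cerebral cortex": 300,
    "hippocampus": 250,
    "thalamus": 40,
    "superior colliculus": 12,
    "inferior colliculus": 5,
    "cerebellum": 90,
    "cochlear nuclear complex": 20,
}

def major_division(label):
    """Walk up the hierarchy until a major division is reached; None if there is none."""
    node = label
    while node is not None:
        if node in MAJOR_DIVISIONS:
            return node
        node = PARENT.get(node)
    return None

def roll_up(counts):
    """Sum label counts into the three major divisions."""
    totals = {division: 0 for division in MAJOR_DIVISIONS}
    unassigned = 0  # e.g. records tagged only "brain", which sits above the divisions
    for label, n in counts.items():
        division = major_division(label)
        if division is None:
            unassigned += n
        else:
            totals[division] += n
    return totals, unassigned

totals, unassigned = roll_up(counts_by_label)
print(totals, unassigned)
```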

So it is at least partially possible to start viewing neuroscience as an “omics” science, and I for one am very excited about this possibility. One of the first things it tells us is that we lack data coverage in some regions, and that is an important finding in itself, because when we know what we don’t know, we can ask the appropriate questions. Also, if you know of someone hiding a big pile of data about the midbrain in a desk drawer, I would like to formally ask you to share it with NIF (just email us), so that we can stop thinking of the midbrain as the tissue equivalent of fly-over country.
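
One simple, admittedly arbitrary, way to turn “we don’t have data coverage in some regions” into a concrete list is to flag regions whose record count sits at least an order of magnitude below the median. The counts, the `cold_spots` helper and the one-order-of-magnitude threshold in this Python sketch are hypothetical choices for illustration, not NIF’s criteria.

```python
import math
from statistics import median

# Hypothetical record counts per region (illustrative only, not NIF's numbers).
region_counts = {
    "cerebral cortex": 300_000,
    "hippocampus": 250_000,
    "cerebellum": 90_000,
    "thalamus": 40_000,
    "superior colliculus": 1_200,
    "inferior colliculus": 800,
    "oculomotor nerve root": 0,
}

def cold_spots(counts, orders_below=1.0):
    """Return regions at least `orders_below` orders of magnitude below the
    median count, least-covered first. The +1 keeps zero counts finite."""
    mid = median(counts.values())
    threshold = math.log10(mid + 1) - orders_below
    flagged = [r for r, n in counts.items() if math.log10(n + 1) <= threshold]
    return sorted(flagged, key=lambda r: counts[r])

print(cold_spots(region_counts))
```

Comparing on a log scale against the median, rather than against the single most popular region, keeps one runaway label such as “brain” from making every other region look like a gap.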

This entry was posted by Anita Bandrowski in Technologies.
The Society for Neuroscience and its partners are not responsible for the opinions and information posted on this site by others.

About Anita Bandrowski

Dr. Bandrowski trained as a neurophysiologist at UCR and Stanford but moved into bioinformatics, working on the human genome project at Celera Inc., having seen that high-throughput science has much to teach biologists. Currently working at the Center for Research in Biological Systems at UCSD, Dr. Bandrowski advocates for and builds systems that attempt to make sense of the vast amount of information produced by biologists.

2 thoughts on “Do you know what you don’t know? A gap analysis of Neuroscience Data.”

  1. There’s something wrong with the table of Brain region popularity. Most obviously the cerebral cortex doesn’t appear; also the hippocampus is missing — and I’m skeptical that the cochlear nuclear complex is really one of the most popular regions.

    More generally I wonder how representative the NIF data are. The superior and inferior colliculi are midbrain structures, and they are significant research topics, especially the SC.

    I certainly agree though that research effort is often unbalanced. The most striking example to me is that the brain systems underlying suffering and aversion don’t get 1/10 the attention of the systems underlying pleasure and reward, even though they are equally extensive and at least equally important.

    Best regards, Bill

    • Hi Bill,
      Thanks for your response.

      Actually, I too was a little surprised that the hippocampus did not make the top 10 with 20M results, but the inferior colliculus has a total of 34K data records associated with it, making it orders of magnitude less “popular”. As you say, there is a lot of focus on some areas and not others, but another issue is that data do not accurately reflect what is in the literature. Data are relatively new to science, which makes recent topics more prevalent.
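
      Taking the two figures above at face value (and assuming they count comparable things, since one is “results” and the other “data records”), the gap works out as:

      ```latex
      \frac{20\,000\,000}{34\,000} \approx 588, \qquad \log_{10} 588 \approx 2.77
      ```

      that is, close to three orders of magnitude.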

      I invite you to look at the completeness of our data. An up-to-date list of sources can be found online, along with the update schedule showing when each source was accessed. It is important to note, however, that the number of data sources continues to grow, some sources disappear, and as a community we are struggling with questions such as: What does the data mean? What is the value of data? Should we keep data indefinitely or only for a specific time? What is the quality of the data? How can we trim false positives and account for more of the false negatives? We also struggle with ranking heterogeneous data records, an unsolved problem in computer science.

      We invite the community to use the data and help solve some of these problems, and any others they may have.

      Big data is coming, but what does big data mean?
      The gap analysis is a first attempt at making some sense of this, but I seriously doubt that it is the final word.

      By the way, if you do have some data about aversion in the midbrain, please let me know; I would love to make it public.

  2. Pingback: Do you know what you don’t know? A gap an...
