Neuroscience and psychology have advanced significantly in recent years. With neuroimaging methods such as functional magnetic resonance imaging (fMRI) and electroencephalography (EEG), we have gradually uncovered the secrets behind how our brains perceive, recognize and memorize things. However, if you’d like a detailed, neuron-level account of how the brain realizes its functions, you will be disappointed, because no one is currently able to provide one. In other words, although our cerebrum is no longer a pitch-black box, it is still a “gray” box at best, with many enigmas yet to be explained.
The black box problem also affects artificial intelligence (AI) built on artificial neural networks (ANNs), and this opacity understandably leads to practical problems (e.g., the networks we train are often unnecessarily large, costing too much computational power to train) and to concerns about AI (e.g., “will AI slip out of our control one day?”). If we could better understand how a neural network operates, people might feel more comfortable with this technology, and new AI techniques could be developed faster and more smoothly. Therefore, I believe that understanding the inner mechanisms of ANNs is a crucial topic for AI research today.
Fortunately, since ANNs are modeled on real nervous systems, we have an opportunity to cross-reference research findings from the two fields and eventually advance our understanding of both real and simulated neural networks. This is exactly what we attempt to do in the series “Neuroscience for AI Developers”. In this article, we digest studies from neuroscience and AI on a broad topic and work out a theory we call the Switch Hypothesis, which may explain how our brain and an ANN identify a certain target (such as a cat or a car). We hope that through these discussions, the mechanisms hidden inside the two black boxes will soon be revealed.
First of all, what is a "switch"?
For those who have studied ANNs, the graph above should look very familiar, since this is how an artificial neural network is usually depicted. Data such as an image is first fed into the network through the input layer. After flowing through one or more hidden layers, the processed data eventually activates the nodes in the output layer with different “intensities”. In the case above, the output nodes correspond to “cat”, “dog”, “bird” and “fish” respectively; if, say, the “bird” node gets the highest activation, then, given that the ANN is well trained, we can expect that the input image is most likely an image of a bird.
In the Switch Hypothesis, we call these output-layer nodes “switches” because recognition is implemented by “switching on” (i.e., giving a high activation value to) the proper node. Of course, this is not the most important reason for coining the term (otherwise, we would just be inventing redundant terminology). To truly understand its meaning, we must discuss the relevant studies in neuroscience.
In a real cerebrum, the “switch” of a certain target is a neuron that is maximally activated by that very target (e.g., a “switch” for oranges is a neuron that responds maximally to everything about an orange, such as its color, appearance, odor, taste and so forth). Note that we use the word “maximally” in our description; that is because a switch usually also responds, though more weakly, to targets other than its own. This phenomenon is of great importance to our hypothesis, and we’ll elaborate on it later in this article.
Although the idea is debatable, there is evidence suggesting that such “switches” exist in our wetware as well. One of the most famous pieces of evidence was reported in 2005 by Rodrigo Quian Quiroga, a neuroscientist at the University of Leicester, and his colleagues. The cells they discovered in the human medial temporal lobe, now known as concept cells or “Jennifer Aniston cells”, are famous for responding remarkably selectively to a particular person or object, such as Jennifer Aniston, Luke Skywalker, Halle Berry or the Leaning Tower of Pisa (Quiroga et al., 2005; Quiroga, 2012).
Other evidence supporting the idea of “switches” includes the “place cells” found in the hippocampus, the hippocampal neurons in mice that respond mainly to the concept of “nest” (Lin et al., 2007), and so forth.
Similarly, an ANN-like network structure should also be present in a real brain, just in a more complicated form. Figure 4 shows a hypothetical, simplified model of a cerebral network. Note that most areas in it have an actual counterpart in our cerebrum: the area that processes light bars of different orientations corresponds to the primary visual cortex; the area specialized for recognizing words, to the visual word form area (VWFA); the area for recognizing faces, to the fusiform face area (FFA); the area for recognizing body parts, to the extrastriate body area (EBA); the area for recognizing places, to the parahippocampal place area (PPA); and the area that deals with abstract concepts may correspond to the medial temporal lobe.
From the graph above, we can see that the structure of a real neural network differs from that of an artificial one in at least the following ways:
Instead of mixing all neurons together, our brain has a detailed division of labor: neurons are arranged into groups according to their function.
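To make the forward pass concrete, here is a minimal sketch in Python of a toy network with one hidden layer and the four output nodes from the graph. The weights, the ReLU hidden units and the softmax output are illustrative assumptions, not a trained model; a real network would learn its weights from data.

```python
import math

# Hypothetical toy network: 3 input features -> 2 hidden units -> 4 output nodes.
# The weights are made up for illustration; a trained network would learn them.
LABELS = ["cat", "dog", "bird", "fish"]
W_HIDDEN = [[1.0, 0.0, 1.0],
            [0.0, 1.0, 0.0]]
W_OUTPUT = [[0.0, 0.0],   # cat
            [0.5, 0.0],   # dog
            [1.5, 0.0],   # bird
            [0.0, 1.0]]   # fish


def relu(x):
    return max(0.0, x)


def softmax(logits):
    # Subtracting the maximum avoids overflow; outputs lie in [0, 1] and sum to 1.
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def forward(x):
    # Input layer -> hidden layer: weighted sums followed by ReLU.
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W_HIDDEN]
    # Hidden layer -> output layer: one "intensity" per output node.
    logits = [sum(w * h for w, h in zip(row, hidden)) for row in W_OUTPUT]
    return dict(zip(LABELS, softmax(logits)))


activations = forward([1.0, 0.0, 1.0])
prediction = max(activations, key=activations.get)
print(activations)
print("most likely:", prediction)  # most likely: bird
```

With these made-up weights, the “bird” node receives the highest activation, so the network reports “bird”. Softmax is one common choice for the output layer because it keeps every activation between 0 and 1.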
A real brain integrates information from different sensory modalities (vision, audition, olfaction, haptic perception, etc.). Most artificial neural networks, on the other hand, deal with only one modality.
Finally, in our next article, we’ll explain how a switch neuron can serve as a trigger for memory recall in a real neural network. In fact, that is the main reason behind the term “switch” (P.S., if you’re interested, please subscribe or join our free membership to be notified of future articles).
The meaning of a switch's activation
Here, we borrow the idea from artificial neural networks and assume that the same thing happens in our brain. That is:
The activation level of a switch is positively correlated with the probability that the input is the target the switch represents.
For instance, if we see an animal and the activation level of the “dog” switch in our brain becomes higher than that of the “cat” switch, then the animal is more likely to be a dog than a cat. If this is true, we can restrict a neuron’s activation to the range between 0 and 1 for both an ANN and a real neural network.
Note that in real life, the amplitude of a neuron’s discharge (i.e., its action potential) is essentially constant, regardless of how strongly we stimulate the neuron. In other words, any stimulus whose strength exceeds a certain threshold gives rise to action potentials of identical intensity. However, a neuron’s “firing rate” does change with the strength of the stimulus: the more strongly we stimulate a neuron, the more action potentials it fires in a given period of time. In this regard, the “activation level” mentioned previously actually refers to a neuron’s firing rate, not the intensity of its discharge. The following figure illustrates the idea:
As mentioned before, a switch usually responds to more than one stimulus, but it reaches its maximal activation when encountering a certain target. Furthermore, we have reason to believe that, for a particular switch neuron, the more similar an input is to the target the switch represents, the greater the switch’s activation will be. For instance, Figure 6 (b) shows how the activation level of a particular neuron in the primary visual cortex changes with differently oriented light stimuli. According to the figure, a light bar oriented 45° counterclockwise from vertical activates the neuron maximally (i.e., in our hypothesis, the neuron is regarded as the switch for the 45°-counterclockwise orientation). Moreover, the activation gradually declines as the stimulus orientation moves away from this optimal orientation, and finally reaches its minimum when the orientation becomes 45° clockwise from vertical.
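The rate-coding idea can be sketched with a leaky integrate-and-fire neuron, a standard simplified neuron model that we swap in here purely for illustration. Every spike it emits is an identical event, yet the number of spikes per second grows with the input strength. The threshold, time constant and input values below are hypothetical, in arbitrary units.

```python
def firing_rate(current, threshold=1.0, tau_ms=10.0, dt_ms=0.1, duration_ms=1000.0):
    """Firing rate (Hz) of a leaky integrate-and-fire neuron driven by a constant input.

    Each spike is an all-or-none event of fixed size; only how often spikes
    occur depends on the input strength.
    """
    v = 0.0                                  # membrane potential, resting at 0
    spikes = 0
    steps = round(duration_ms / dt_ms)
    for _ in range(steps):
        v += dt_ms * (-v + current) / tau_ms  # Euler step of dv/dt = (-v + I) / tau
        if v >= threshold:
            spikes += 1                       # emit an identical spike ...
            v = 0.0                           # ... and reset the membrane potential
    return spikes / (duration_ms / 1000.0)


for current in (0.5, 1.5, 3.0):
    print(current, firing_rate(current))
```

Inputs below the threshold (such as 0.5 here) never make the neuron fire, while stronger inputs make it fire more often. Dividing a rate by the highest rate the neuron can reach would give an activation between 0 and 1, in the spirit of the previous section.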
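The tuning curve described above can be approximated by a simple function. The sketch below assumes a raised-cosine shape (real tuning curves are often fit with Gaussians instead) with the preferred orientation at 45° counterclockwise from vertical. Because a light bar looks the same after a 180° rotation, the cosine uses twice the angle difference, which puts the minimum exactly 90° away, at 45° clockwise.

```python
import math


def tuning(orientation_deg, preferred_deg=45.0):
    """Hypothetical orientation tuning curve with activation in [0, 1].

    Angles are measured counterclockwise from vertical, so +45 means 45 degrees
    counterclockwise and -45 means 45 degrees clockwise. Orientation repeats
    every 180 degrees, hence the factor of 2 inside the cosine.
    """
    delta = math.radians(orientation_deg - preferred_deg)
    return (1.0 + math.cos(2.0 * delta)) / 2.0


for angle in (45, 22.5, 0, -22.5, -45):
    print(f"{angle:>6} deg -> activation {tuning(angle):.3f}")
```

The activation is 1 at the preferred orientation and falls smoothly to 0 at the orthogonal one, mirroring the decline shown in the figure.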
The main rules governing the weights of connections
We finally reach the most critical, but perhaps also the most controversial, part of this article: how are the weights of connections in a neural network determined? Before we begin, I must state that, because of the complexity of neural networks, whether real or artificial, it is difficult to monitor their inner dynamics thoroughly. Accordingly, there is little, if any, direct evidence that we can adduce to support our claims, and the hypotheses discussed in this section are mainly based on logical inference. Although these hypotheses may not be entirely correct, we believe they can at least point to a valid direction for understanding what happens inside a neural network.
To make things easier to understand, we’ll use the following red rectangle as an example:
As mentioned previously, our cerebral cortex has many areas that specialize in handling certain features (such as colors or visual stimuli of different orientations). In fact, the way our nervous system perceives a target (such as the red rectangle in Figure 7) is to first extract all the feature information from it (in this case: the color red, 2 vertical straight lines, 2 horizontal straight lines and 4 right-angle vertices) and send that information to the proper cerebral areas. In these areas, you can find the switches corresponding to the input features (e.g., in the primary visual cortex, some neurons are most sensitive to vertical lines and others to horizontal lines), and these switches are activated by the input. The lower-level switches then gradually converge onto higher-level switches, eventually forming a grand switch that represents the perceived target (that is, when the grand switch lights up strongly, we can confidently infer that what we perceive is the target). The resulting structure resembles the one shown in Figure 4.
In the case of the red rectangle above, we can illustrate the whole network as follows:
Note that in Figure 8, we distinguish two kinds of processing: perceptual and cognitive. The difference is that the switches involved in cognitive processing exhibit perceptual invariance (e.g., recall the “Jennifer Aniston cells” introduced earlier; these cells respond strongly to the actress’s face regardless of the viewing angle or where on the retina it is presented). The switches involved in perceptual processing, in contrast, lack such invariance, so their activation depends on how, or under what conditions, a stimulus is presented (e.g., such a switch may respond to Jennifer Aniston’s face only when it is presented in frontal view). Since perceptual processing is extremely intricate yet important, we will address it in a separate article. For now, we focus solely on cognitive processing.
So, how exactly should we assign the weights of connections in the instance above? Well, the following is the answer that we propose (P.S., we omit the perceptual-processing part):
Let’s now explain, in a top-down manner, how we derive these weights and why they are reasonable. First, we assume that there are three “sub-groups” in this network: one for processing the color of a geometrical form (containing the switch “Color”), one for its vertices (containing the switch “Vertex”), and one for its edges (containing the switch “Edge”). According to Figure 7, the red rectangle is defined by four conditions, which these sub-groups handle as follows:
Switch “Color” of the grand switch “Red Rectangle” will be maximally activated when perceiving color “red”.
Switch “Vertex” of the grand switch “Red Rectangle” will be maximally activated when perceiving “4 right angles”.
Switch “Edge” of the grand switch “Red Rectangle” will be maximally activated when perceiving “2 vertical lines” & “2 horizontal lines”.
Since the red rectangle is defined by these conditions and nothing else, we expect the grand switch “Red Rectangle” to be maximally activated (i.e., activation = 1) when the switches “Color”, “Vertex” and “Edge” in Figure 9 are all maximally switched on. Also, since none of the three is more important than the others, the reasonable weights are “1/3” each. In other words, the activation of “Red Rectangle” = (1/3 * activation of “Color”) + (1/3 * activation of “Vertex”) + (1/3 * activation of “Edge”), and “Red Rectangle” reaches its maximal response (i.e., activation = 1) when the activations of “Color”, “Vertex” and “Edge” are all equal to 1.
(P.S., Technically, the activation of the switch “Red Rectangle” should equal f((1/3 * activation of “Color”) + (1/3 * activation of “Vertex”) + (1/3 * activation of “Edge”)), where f(x) is the activation function of “Red Rectangle”. Here, we assume that f maps a maximal input to a maximal output, that is, f(1) = 1, so we omit it from the equation.)
Similarly, for the activation of the switch “Vertex” to equal 1, the switches representing “Right Angle” and the amount “4” must both be maximally activated. Since only these two conditions are necessary for activating “Vertex” and they are equally important, the reasonable weights between them and “Vertex” are “1/2”. That is, the activation of “Vertex” = (1/2 * activation of “Right Angle”) + (1/2 * activation of “4”), and “Vertex” reaches its maximal activation when the activations of “Right Angle” and “4” both equal 1.
As for “Color”, note that the hue of an object is usually represented by a combination of “Red”, “Green” and “Blue” (i.e., the three primary colors of light) in different proportions. However, since the rectangle has no color other than red, the activation of the switch “Red” alone should be sufficient to maximally activate the switch “Color”. Thus, the weight between them is set to 1.
(P.S., The situation for the switch “Edge” is a little more complicated, but it follows the same principle, so we leave out its explanation here. You can try working it out yourself to check whether you’ve grasped the idea.)
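The weighting scheme for “Red Rectangle” can be written as a few nested weighted sums. This is a minimal sketch assuming that f is a clamp to [0, 1] (which satisfies f(1) = 1). Since the article leaves the “Edge” sub-network as an exercise, its activation is simply passed in as an input. The function and parameter names are hypothetical.

```python
def clamp(x):
    # Stand-in for the activation function f: keeps values in [0, 1] and satisfies f(1) = 1.
    return min(1.0, max(0.0, x))


def red_rectangle(red, right_angle, four, edge):
    """Activation of the grand switch "Red Rectangle" (all inputs in [0, 1]).

    `edge` is passed in directly because the article leaves the "Edge"
    sub-network as an exercise.
    """
    color = clamp(1.0 * red)                         # "Red" alone suffices: weight 1
    vertex = clamp(0.5 * right_angle + 0.5 * four)   # two required inputs: weight 1/2 each
    return clamp(color / 3 + vertex / 3 + edge / 3)  # three required inputs: weight 1/3 each


print(red_rectangle(red=1, right_angle=1, four=1, edge=1))  # every condition met
print(red_rectangle(red=0, right_angle=1, four=1, edge=1))  # right shape, wrong color
```

Only when every condition is met does the grand switch reach 1; a rectangle of the wrong color still activates it to 2/3, which is why a switch “usually responds to more than one certain target”.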
So far, we’ve introduced the basic idea of how to assign weights to the connections in a neural network. However, things can be more complicated than previously discussed since the following two conditions are distinct and can affect how the weights are assigned:
To activate the switch, feature 1 “and” 2 must be activated together.
To activate the switch, feature 1 “or” 2 must be activated.
The rule described above covers the first condition, but what about the second? We’ll use an apple as an example. Let’s assume that an apple is defined by only two features: its shape, which is round, and its color, which can be either red or green. This can be illustrated by the following graph:
Note that since the switches “Red” and “Green” do not have to be activated together to maximally activate the switch “Color” in the case of an apple, the reasonable weights here are “1” instead of “1/2”. In other words, the switch “Apple” reaches its maximal activation if either “Round” and “Red” or “Round” and “Green” are maximally activated.
All in all, we can summarize the description above into two simple rules:
If one must trigger n equally important subsidiary switches simultaneously to maximally activate a grand switch, the weights between the grand switch and its subsidiaries should be “1/n”.
If one can maximally activate a grand switch by triggering any x of its n subsidiary switches (x < n), the weights between the grand switch and its subsidiaries should be “1/x”.
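The two rules can be captured in a small helper, shown here with the apple example from Figure 10. This is a sketch under assumptions: all subsidiaries of a switch are treated as equally important, and a clamp to [0, 1] stands in for the activation function f. The names `rule_weight`, `grand_switch` and `apple` are hypothetical.

```python
def rule_weight(n, x=None):
    """Weight from each of n equally important subsidiary switches to a grand switch.

    "And" case (all n required): 1/n.  "Any x of n" case (x < n): 1/x.
    """
    if x is None:
        return 1.0 / n
    if not 1 <= x < n:
        raise ValueError("x must satisfy 1 <= x < n")
    return 1.0 / x


def grand_switch(activations, weight):
    # Weighted sum passed through a clamp standing in for f, so the result stays in [0, 1].
    return min(1.0, max(0.0, weight * sum(activations)))


def apple(round_, red, green):
    color = grand_switch([red, green], rule_weight(2, x=1))  # red OR green: weight 1
    return grand_switch([round_, color], rule_weight(2))     # round AND color: weight 1/2


print(apple(round_=1, red=1, green=0))  # red apple
print(apple(round_=1, red=0, green=1))  # green apple
print(apple(round_=1, red=0, green=0))  # round, but neither apple color
```

The clamp also matters for the “or” case: if both “Red” and “Green” were active at once, the weighted sum feeding “Color” would be 2, and the clamp keeps the activation at its maximum of 1.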
At the end of this section, we must emphasize again that our hypothesis about how connection weights are assigned is based on logical inference, and the mechanism by which the brain would arrive at such weights, if it exists, remains unknown. That being said, we believe the proposed framework is logically sound, and some empirical studies may support the idea (however, since explaining them would take up a lot of space, we will leave this topic to a future article).
The other factors that affect the weights of connections
Besides the principles introduced in the previous section, there are other independent factors capable of shaping the weights of connections. The following list includes some important ones:
Frequency of co-appearance:
So far, we have presumed that all subsidiary switches are equally important in activating the grand switch. However, this may not always hold in reality, and one reason why one or a few subsidiary switches can matter more than the others is the frequency of co-appearance.
Let’s use the example shown in Figure 10 again. There, we assumed that both switches “Red” and “Green” are connected to “Color” with a weight of 1, and the premise for this is that we see red apples about as often as green ones. However, if red apples are actually more prevalent than green apples, we should expect the weight between “Color” and “Red” to become larger than the weight between “Color” and “Green”, because the switch “Color” will be activated together with “Red” more often than with “Green” (here we invoke the famous Hebb’s rule: “cells that fire together, wire together”).
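The effect of co-appearance frequency can be illustrated with a plain Hebbian update, Δw = η · pre · post. In this sketch, the “Color” switch is assumed to fire on every observation, and after each update the two weights are rescaled to keep a fixed sum. The rescaling is our own assumption to prevent unbounded growth (loosely analogous to the synaptic scaling studied by Desai et al., 2002), and the viewing history, with red apples seen 70% of the time, is hypothetical.

```python
def hebbian_color_weights(observations, eta=0.01, total=2.0):
    """Hebbian learning (dw = eta * pre * post) with weights rescaled to a fixed sum.

    Each observation names the color switch that fired ("red" or "green").
    The postsynaptic "Color" switch is assumed to fire on every observation.
    """
    w = {"red": total / 2, "green": total / 2}   # equal weights to start
    for seen in observations:
        post = 1.0                                # "Color" fires every time
        for color in w:
            pre = 1.0 if color == seen else 0.0
            w[color] += eta * pre * post          # fire together, wire together
        scale = total / sum(w.values())           # assumed normalization step
        for color in w:
            w[color] *= scale
    return w


# Hypothetical viewing history: 7 red apples for every 3 green ones.
history = (["red"] * 7 + ["green"] * 3) * 50
weights = hebbian_color_weights(history)
print(weights)
```

Starting from equal weights, the more frequently co-activated “Red” ends up with the larger weight onto “Color”.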
Neurons’ firing rates & patterns:
There is evidence that a neuron’s firing rate (i.e., its activation level) affects the connection strength between two neurons (Dan & Poo, 2006; Bear et al., 2007, pp. 716-717). Before discussing these effects further, however, we must introduce some terminology.
Two neurons are connected through synapses, structures that allow one neuron to pass a signal on to the next cell. The neuron sending the signal is called the presynaptic neuron, and the one receiving it is called the postsynaptic neuron. The connection strength we are talking about here is actually the strength of these synapses, which can be reinforced by a mechanism known as long-term potentiation (LTP) and weakened by another known as long-term depression (LTD). The strengthening of synaptic transmission through LTP is widely believed to be a basis of learning and memory.
With this background in mind, we now present two elementary principles that scientists have found regarding how connection strength changes with neuronal activation:
When the presynaptic neuron is active and the postsynaptic neuron is “strongly activated” by other inputs at the same moment, the synapse becomes stronger (P.S., this is essentially what Hebb’s rule is all about).
When the presynaptic neuron is active and the postsynaptic neuron is only “weakly activated” by other inputs at the same moment, the strength of the synapse becomes weaker (P.S., “neurons that fire out of sync lose their link”).
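The two principles above resemble the Bienenstock–Cooper–Munro (BCM) learning rule, in which a synapse is potentiated when postsynaptic activity exceeds a threshold θ and depressed when it falls below it. The sketch keeps θ fixed for simplicity (in the full BCM theory the threshold slides with the neuron’s recent activity), and the learning rate and threshold values are illustrative assumptions.

```python
def bcm_update(w, pre, post, theta=0.5, eta=0.1):
    """One BCM-style update: dw = eta * pre * post * (post - theta).

    post > theta (strong postsynaptic activation) -> potentiation (LTP-like);
    0 < post < theta (weak activation)            -> depression (LTD-like);
    pre == 0 (silent presynaptic neuron)           -> no change.
    """
    return w + eta * pre * post * (post - theta)


w0 = 0.5
print(bcm_update(w0, pre=1.0, post=0.9))  # strongly driven postsynaptic neuron
print(bcm_update(w0, pre=1.0, post=0.2))  # weakly driven postsynaptic neuron
print(bcm_update(w0, pre=0.0, post=0.9))  # silent presynaptic neuron
```

The sign of the change is set entirely by whether the postsynaptic activation lies above or below θ, which is the distinction the two principles draw.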
However, this relationship can be more complicated than described above, since the timing of neuronal firing can also modify connection strength (see spike-timing-dependent plasticity and precise-spike-driven synaptic plasticity). If you’re interested in learning more about the link between neuronal activation and synaptic strength, please watch for our upcoming updates; we will cover this topic in another article.
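For the timing dependence, a common textbook form of spike-timing-dependent plasticity uses an exponential window: a presynaptic spike shortly before a postsynaptic spike strengthens the synapse, the reverse order weakens it, and the effect fades as the spikes move apart in time (see Caporale & Dan, 2008). The amplitudes and time constants below are illustrative values, not measurements.

```python
import math


def stdp_dw(delta_t_ms, a_plus=0.01, a_minus=0.012, tau_plus_ms=20.0, tau_minus_ms=20.0):
    """Pair-based STDP weight change for a single pre/post spike pair.

    delta_t_ms = t_post - t_pre. Pre before post (delta_t > 0) -> potentiation;
    post before pre (delta_t < 0) -> depression. Both decay with |delta_t|.
    """
    if delta_t_ms > 0:
        return a_plus * math.exp(-delta_t_ms / tau_plus_ms)
    if delta_t_ms < 0:
        return -a_minus * math.exp(delta_t_ms / tau_minus_ms)
    return 0.0


for dt in (-40, -10, 10, 40):
    print(dt, round(stdp_dw(dt), 5))
```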
How a neural network recognizes things & winner-take-all principle
In this section, we use a concrete example to demonstrate how the Switch Hypothesis explains the way a neural network recognizes things, which involves a principle known as the winner-take-all principle.
Consider the following neural network, and suppose that all neurons in it share the same activation function, f(x), where x is the input, f(x) is monotonically increasing (i.e., if x1 < x2, then f(x1) ≤ f(x2)), and 0 ≤ f(x) ≤ 1:
Condition 1: Only switch “Round” is activated
If only the switch “Round” is active, both “Soccer” and “Basketball” are switched on at the same level of activation (i.e., f(activation of “Round” * 1/4)). In this case, the network cannot tell whether the input is a soccer ball or a basketball.
Empirically, this conclusion makes sense. If we only know that something is round, it is impossible to decide whether that “something” is a soccer ball or a basketball, since both are round. Instead, “soccer ball” and “basketball” become two candidate interpretations of the input, and the system may search for more information before reaching a definite conclusion.
Condition 2: Only switch “Surface Pattern 2” is activated
Once again, only one switch is active in Condition 2. However, the result is totally different this time.
Because the switch “Surface Pattern 2” is connected only to the switch “Basketball”, activating it switches on the neuron “Basketball” and nothing else. That is, the neural network interprets the input as a basketball. Note that although this activation is small (only f(activation of “Surface Pattern 2” * 1/4)), “Basketball” is the clear winner in this condition, so it is still selected as the interpretation according to the winner-take-all principle. In other words, the neural network cares more about relative activation levels than absolute ones.
Condition 3: Switches “Black”, “White”, “Round” and “Surface Pattern 1” are activated
In the final condition, all features related to the switch “Soccer” are activated, giving “Soccer” the maximal activation level, and the input is therefore interpreted as a soccer ball.
Note that in this situation, the switch “Basketball” is still activated, albeit more weakly, because it shares some features with “Soccer”. However, due to the winner-take-all principle, “Soccer”, not “Basketball”, is selected as the final interpretation.
The model above can also help us elucidate a psychological phenomenon called “priming”: our response to a stimulus is influenced by a related stimulus we were previously exposed to. For instance, the image of a basketball is recognized faster after the image of a soccer ball than after the image of a lake. Figure 14 shows why: when the picture of a soccer ball is presented, not only is the switch “Soccer” activated, but “Basketball” is also activated to a certain degree, because some subsidiary switches of “Soccer” are also connected to “Basketball” (i.e., they share some characteristics). And since the switch “Basketball” has been “pre-activated”, it is only natural that its activation reaches a higher level faster when it is stimulated.
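The three conditions can be replayed in a short simulation. Since the network figure is not reproduced here, the wiring is partly an assumption: “Soccer” is connected to “Black”, “White”, “Round” and “Surface Pattern 1” as described in Condition 3, while “Basketball” is given four inputs (consistent with its 1/4 weight in Condition 2), of which “Orange” and “Black” are hypothetical stand-ins for the features not named in the text. A clamp to [0, 1] plays the role of f, and a tie returns `None`, meaning the network is still undecided.

```python
# Partly hypothetical wiring: each switch has four equally weighted inputs.
CONNECTIONS = {
    "Soccer": ["Black", "White", "Round", "Surface Pattern 1"],
    "Basketball": ["Orange", "Black", "Round", "Surface Pattern 2"],  # "Orange", "Black" assumed
}


def f(x):
    # Monotonically increasing (in the text's sense) with range [0, 1]: a clamp.
    return min(1.0, max(0.0, x))


def switch_activations(active_features):
    out = {}
    for switch, features in CONNECTIONS.items():
        weight = 1.0 / len(features)  # four required inputs -> weight 1/4 each
        out[switch] = f(sum(weight for feat in features if feat in active_features))
    return out


def winner_take_all(activations):
    # Pick the single most active switch; return None on a tie or if nothing is active.
    best = max(activations.values())
    winners = [name for name, a in activations.items() if a == best]
    return winners[0] if len(winners) == 1 and best > 0 else None


print(winner_take_all(switch_activations({"Round"})))              # Condition 1
print(winner_take_all(switch_activations({"Surface Pattern 2"})))  # Condition 2
print(winner_take_all(switch_activations(
    {"Black", "White", "Round", "Surface Pattern 1"})))            # Condition 3
```

Under these assumptions, Condition 1 ends in a tie (0.25 each), Condition 2 selects “Basketball” with an activation of only 0.25, and Condition 3 selects “Soccer” (1.0) over a partially active “Basketball” (0.5).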
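Priming can be sketched with a leaky accumulator: a switch’s activation relaxes toward its input a little at each time step, and recognition happens once the activation crosses a threshold. In this hypothetical sketch, a residual activation of 0.5 stands in for the pre-activation left behind by a related stimulus such as a soccer ball; the threshold and rate are arbitrary.

```python
def steps_to_recognize(drive, start=0.0, threshold=0.9, rate=0.2, max_steps=1000):
    """Hypothetical leaky accumulator: count steps until activation crosses threshold.

    Each step the activation moves a fraction `rate` of the way toward `drive`;
    `start` is residual activation left over from a previous, related stimulus.
    """
    a = start
    for step in range(1, max_steps + 1):
        a += rate * (drive - a)
        if a >= threshold:
            return step
    return None  # never recognized within max_steps


unprimed = steps_to_recognize(drive=1.0, start=0.0)  # basketball after a lake
primed = steps_to_recognize(drive=1.0, start=0.5)    # basketball after a soccer ball
print(unprimed, primed)
```

The pre-activated switch crosses the threshold in fewer steps than the unprimed one, which corresponds to the faster recognition described above.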
What is the advantage of using a “deep” network?
Since the appearance of AlexNet, deep learning has become one of the most prevalent approaches in AI. By increasing the number of hidden layers in a neural network (hence the word “deep”), an ANN may achieve notably better performance than networks with only one hidden layer.
In this section, we use our hypothesis to explain why deep networks are needed. Before getting into the details, allow us to state our conclusion first: the causal relationships a deep neural network can handle are far more complicated than those a shallow one can handle. To illustrate this, let’s turn to a simple example: what advantage do we gain when a two-layer neural network is upgraded to a three-layer one?
Assume that, to trigger the maximal activation level of a switch called Switch A, the following two independent conditions must both be true at the same time (i.e., (1) AND (2) = TRUE):
(1) Features 1, 2 and 3 are all activated simultaneously.
(2) Features 4 and 5 are both activated simultaneously.
In a two-layer, oversimplified neural network like the one shown in Figure 15, whose connection weights are assigned according to the rules introduced earlier, Switch A as described cannot be realized: it can be turned on simply by activating features 2, 3 and 4 while features 1 and 5 remain inactive, which violates both condition (1) and condition (2). The estimated activation of Switch A in this case is (1*1/3) + (1*1/3) + (1*1/2) = 7/6, a value larger than 1!
A three-layer neural network, on the other hand, does a notably better job of realizing the required Switch A. As Figure 16 shows, if only features 2, 3 and 4 are activated, the estimated activation of Switch A is only (1*1/2*1/3) + (1*1/2*1/3) + (1*1/2*1/2) = 7/12, which is significantly lower than the value given by the two-layer network.
With this case in mind, let’s look at what a four-layer neural network can do. The following is an example:
Note that two conditions must both be true to trigger the maximal activation of Switch B in the example above. First, “f1 and f2 are activated simultaneously” and “f3 and f4 are activated simultaneously” must occur concurrently. Second, “f5, f6 and f7 are activated simultaneously” and “f8 and f9 are activated simultaneously” must occur concurrently.
This may look extremely intricate, but we believe it happens a lot in real life (in fact, real-life situations may be even more complex). The following is a simplified example that fits the network shown in Figure 17:
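The arithmetic for Switch A can be checked with exact fractions. The sketch follows the weights described for Figures 15 and 16: in the three-layer version, two hypothetical intermediate switches, `cond1` and `cond2`, stand for conditions (1) and (2), and Switch A combines them with the 1/2 rule.

```python
from fractions import Fraction as F


def two_layer(x):
    # Features 1-3 use the 1/3 rule and features 4-5 the 1/2 rule, all wired straight to Switch A.
    return F(1, 3) * (x[1] + x[2] + x[3]) + F(1, 2) * (x[4] + x[5])


def three_layer(x):
    # Intermediate switches enforce conditions (1) and (2) separately ...
    cond1 = F(1, 3) * (x[1] + x[2] + x[3])
    cond2 = F(1, 2) * (x[4] + x[5])
    # ... and Switch A requires both of them (1/2 rule).
    return F(1, 2) * cond1 + F(1, 2) * cond2


partial = {1: 0, 2: 1, 3: 1, 4: 1, 5: 0}  # only features 2, 3 and 4 active
full = {k: 1 for k in range(1, 6)}

print(two_layer(partial), three_layer(partial))  # 7/6 7/12
print(three_layer(full))                         # 1
```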
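Because every connection in this example follows the “and” rule, Switch B can be evaluated with a small recursive function: a list of children means “all of these are required”, so each child receives a weight of 1/n. This is a sketch under that assumption, with Switch B encoded as nested lists mirroring the two conditions above; `evaluate` and `SWITCH_B` are hypothetical names.

```python
from fractions import Fraction


def evaluate(node, active):
    """Activation of a switch built only from "and"-style (1/n) connections.

    A node is either a feature name (a leaf) or a list of child nodes that are
    all required; each child then receives weight 1/len(children).
    """
    if isinstance(node, str):
        return Fraction(1) if node in active else Fraction(0)
    weight = Fraction(1, len(node))
    return sum((weight * evaluate(child, active) for child in node), Fraction(0))


# Switch B: two top-level conditions, each made of two groups of features.
SWITCH_B = [
    [["f1", "f2"], ["f3", "f4"]],
    [["f5", "f6", "f7"], ["f8", "f9"]],
]

everything = {f"f{i}" for i in range(1, 10)}
print(evaluate(SWITCH_B, everything))                # 1
print(evaluate(SWITCH_B, {"f1", "f3", "f5", "f8"}))  # one feature from every group
```

Activating one feature from every group (f1, f3, f5 and f8) leaves Switch B at 11/24, less than half its maximal activation, so the deeper structure does not mistake a scattered, partial match for the full pattern.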
Resolving the conflict between “grandma cells” & “population coding”
In the last section of this article, we’d like to use our hypothesis to resolve the conflict between two seemingly contradictory hypotheses on how our brains recognize things.
The first hypothesis boldly suggests that all the information we need to recognize a target is stored in one (or a very few) neurons. Such a neuron is sometimes called a “grandma cell”, meaning that there should be a neuron in your brain that represents your grandmother alone. Although it may sound like utter BS, we have already discussed several empirical studies that support this notion, which you can revisit in the earlier section “First of all, what is a switch?”. The other hypothesis, called population coding, assumes that information about a target is encoded by the activation pattern of a group of neurons rather than by just one or a few.
The Switch Hypothesis, however, suggests a way to combine these ideas: neither “grandma cells” nor “population coding” captures the whole picture of what happens inside our brain. More specifically, in the lower layers of a neural network, neurons do carry out population coding to process and record information about the input target. In higher layers, however, the neurons gradually converge according to certain rules, eventually forming “switches” that are maximally activated by a particular stimulus.
The main point here is that a switch neuron does NOT record information per se; it is nothing but a higher-level “switch” that lets our brain conveniently manipulate, as a group, the lower-level neurons that participate in the population coding of a target. And switches function exactly like ordinary switches: when they are turned on, a mental image of, or a response to, the target they represent is triggered, and the neural network accordingly knows how to interpret the input or what to do with it.
Fearing things we don’t understand or can’t control is one of the most natural human reactions. Therefore, I think unraveling the secrets inside the black box of artificial intelligence is a crucial and necessary topic for research in the field.
The Switch Hypothesis described in this article offers a possible conjecture about how artificial and real neural networks operate. Of course, our discussion still has many gaps, but we hope it can serve as a seed for future research on the inner mechanisms of neural networks. Through such research, we may not only increase our understanding of the brain, but also gain the knowledge to build more powerful yet controllable AI for more advanced applications.
Finally, due to limited space, there are many important problems related to the hypothesis that we cannot address in this article (e.g., “how can a continuous quantity be encoded with discrete neurons?”, “what are the detailed mechanisms behind perceptual processing?”, “what are the keys to building an artificial general intelligence?”, etc.). If you are interested in them, please subscribe or join our free membership to be notified of our future updates.
Footnote: If you have any questions or comments about this article, please don’t hesitate to contact us at the following e-mail: firstname.lastname@example.org. Thank you so much for reading, and have a wonderful day.
Bear, M. F., Connors, B. W., & Paradiso, M. A. (Eds.). (2007). Neuroscience (Vol. 2). Lippincott Williams & Wilkins.
Caporale, N., & Dan, Y. (2008). Spike timing–dependent plasticity: a Hebbian learning rule. Annual Review of Neuroscience, 31, 25-46.
Dan, Y., & Poo, M. M. (2006). Spike timing-dependent plasticity: from synapse to perception. Physiological Reviews, 86(3), 1033-1048.
Desai, N. S., Cudmore, R. H., Nelson, S. B., & Turrigiano, G. G. (2002). Critical periods for experience-dependent synaptic scaling in visual cortex. Nature Neuroscience, 5(8), 783.
Graupner, M., Wallisch, P., & Ostojic, S. (2016). Natural firing patterns imply low sensitivity of synaptic plasticity to spike timing compared with firing rate. Journal of Neuroscience, 36(44), 11238-11258.
Lin, L., Chen, G., Kuang, H., Wang, D., & Tsien, J. Z. (2007). Neural encoding of the concept of nest in the mouse brain. Proceedings of the National Academy of Sciences, 104(14), 6066-6071.
Ocker, G. K., Hu, Y., Buice, M. A., Doiron, B., Josić, K., Rosenbaum, R., & Shea-Brown, E. (2017). From the statistics of connectivity to the statistics of spike times in neuronal networks. Current Opinion in Neurobiology, 46, 109-119.
Quiroga, R. Q., Reddy, L., Kreiman, G., Koch, C., & Fried, I. (2005). Invariant visual representation by single neurons in the human brain. Nature, 435(7045), 1102.
Quiroga, R. Q., Kreiman, G., Koch, C., & Fried, I. (2008). Sparse but not ‘grandmother-cell’ coding in the medial temporal lobe. Trends in Cognitive Sciences, 12(3), 87-91.
Quiroga, R. Q. (2012). Concept cells: the building blocks of declarative memory functions. Nature Reviews Neuroscience, 13(8), 587.