Cogn Comput (2014) 6:145–157 DOI 10.1007/s12559-013-9240-1
Decoding Word Information from Spatiotemporal Activity of Sensory Neurons
Kazuhisa Fujita · Yusuke Hara · Youichi Suzukawa · Yoshiki Kashimori
Received: 28 April 2013 / Accepted: 27 November 2013 / Published online: 8 December 2013
© Springer Science+Business Media New York 2013
Abstract Spatiotemporal activity of neurons is ubiquitous in sensory coding in the CNS. It is a fundamental problem for sensory perception to understand how sensory information is decoded from this spatiotemporal activity; however, little is known about the decoding mechanism. To address this issue, we consider the auditory system as a model system exhibiting spatiotemporal activity. We present here a model of the auditory cortex that performs hierarchical processing of auditory information. The model consists of three layers of two-dimensional networks. The first layer represents the auditory stimulus as spatiotemporal activity of neurons. The second layer consists of feature-detecting neurons, which extract the features of phonemes and their overlaps from the spatiotemporal activity of the first layer. The third layer combines the information of the sound features encoded by the second layer and decodes word information about the sound stimulus as a temporal sequence of attractors. Using the model, we show how the information of phonemes and words emerges in the hierarchical processing of the auditory cortex. We also show that the overlap between phonemes plays a crucial role in linking the attractors of phonemes. The present study may provide a clue for understanding the mechanism by which word information is decoded from spatiotemporal activity of neurons.

Keywords: Decoding mechanism · Spatiotemporal activity · Word information · Auditory system · Neural model

K. Fujita: Tsuyama National College of Technology, 654-1 Numa, Tsuyama, Okayama 708-8506, Japan; e-mail: [email protected]
K. Fujita · Y. Kashimori: Department of Engineering Science, University of Electro-Communications, Chofu, Tokyo 182-8585, Japan
Y. Hara · Y. Suzukawa · Y. Kashimori: Graduate School of Information Systems, University of Electro-Communications, Chofu, Tokyo 182-8585, Japan
Introduction

Spatiotemporal patterns of action potentials are ubiquitous in sensory coding in the CNS. Experimental studies have shown the importance of spatiotemporal activity in the information coding of sensory systems such as the olfactory [27], somatosensory [1], visual [34], and electrosensory [15] systems. Sensory perception requires the binding of sensory features extracted from the spatiotemporal activity of sensory neurons. Thus, it is a fundamental problem for sensory perception to understand how sensory information is encoded and then decoded by sensory networks. Several efforts have been devoted to understanding how a temporal input pattern is represented by a neuronal population and how downstream systems decode the information in it, as seen in studies on liquid-state machines [30], echo-state machines [22], and state-dependent networks [5, 31]. Previous studies have also demonstrated that spatiotemporal activity can be decoded by one or more read-out neurons with synaptic learning [2, 21, 24, 28, 32, 34, 41]. In most studies of these networks, however, Poisson and chaotic signals have been used as the temporal inputs; temporal inputs evoked by sensory stimuli have not been considered. Thus, it remains unclear how the features of a sensory stimulus are extracted from the activity of a neuronal population and how those features are combined to represent a sensory object.
To address this issue, we are concerned with auditory processing, because spatiotemporal processing is especially prominent in the auditory system. For example, both the spatial (spectral) and the temporal structure of animal vocalizations, from frog calls to bird song to monkey calls, are important features for auditory processing [31]. Similarly, speech is rich in spatiotemporal structure, and removing either spatial or temporal information impairs speech recognition [11, 39]. Given the importance of temporal information for the processing of natural stimuli, it is not surprising that neural responses often depend strongly on temporal features of stimuli [8, 9, 46]. Furthermore, experimental studies have shown that sound stimuli evoke spatiotemporal activity in the primary auditory cortex (A1) [16, 17, 20, 44, 45]. In the present study, we present a network model that performs hierarchical processing of auditory information. The model captures only an essential function of auditory processing, whereas realistic auditory processing involves more complex processing of words and speech. The model consists of three layers of two-dimensional networks. The first layer represents the spectrotemporal property of a sound stimulus as spatiotemporal activity of sensory neurons. Feature-detecting neurons in the second layer then extract the salient stimulus features, namely the features of phonemes and their overlaps, from this spatiotemporal activity. In the third layer, the sound features of the phonemes and their overlaps are combined by dynamical attractors that are linked in a temporal order relevant to the sound features, thereby producing a unified meaning of the sound stimulus. Using the model, we show how word information emerges in the hierarchical processing of the auditory cortex. We also show that the overlap between phonemes, encoded in the second layer, plays a crucial role in linking the attractors of phonemes.
Model

Overview of Our Model

We present a neural network model of the auditory cortex for hierarchical processing of auditory information, which consists of three networks with feedforward connections between them, as shown in Fig. 1a. The first layer receives a sound stimulus that has a spectrotemporal property and encodes it into spatiotemporal activity of neurons on the tonotopic map of the first layer; this layer is called the feature-representation (FR) layer. The spatiotemporal aspect of the neuronal activity evoked in the FR layer is then detected by feature-detecting (FD) neurons in the second layer, named the FD layer. FD neurons extract the sound features of phonemes and their overlaps from a sequence of spatial activities, each being the activity of FR neurons evoked over one short time period. The sound features are then represented as dynamical attractors in the third layer, named the feature-binding (FB) layer, and are combined into a temporal sequence of the attractors, leading to the decoding of word information about the sound stimulus.

Model of FR Layer

As the model of the FR layer, we used the primary auditory cortex model of Yamaguchi et al. [48], which reproduces the propagating waves of A1 activity observed experimentally in guinea pigs. As shown in Fig. 1b, the network is a two-dimensional array of population units with two axes: the tonotopic axis (T-axis), representing the frequency selectivity of the input, and the propagation axis (P-axis), along which the excitatory wave propagates. Each unit of the FR network consists of a pair of an excitatory and an inhibitory neuron, the E- and I-neuron, as shown in Fig. 1c. The network has a tonotopic map, in which the E-neuron at site (i, j) (the (i, j)th E-neuron) receives excitatory inputs from the (i − 1, j − 1)th, (i, j − 1)th, and (i + 1, j − 1)th E-neurons (see the arrows in Fig. 1b). The excitatory connections between E-neurons facilitate the propagation of neuronal activity across the FR network, whereas the inhibitory connection from the I-neuron within each unit suppresses the activity of the E-neuron, confining the neuronal activity to a local area of the network. The balance between the excitatory and inhibitory connections of the E- and I-neurons stabilizes the spatiotemporal activity of FR neurons into a stationary propagating pattern. The mathematical description of the FR network model is given in "Appendix."

Model of FD Layer
The FD layer consists of a two-dimensional array of FD neurons, containing N_FD × N_FD neurons, as shown in Fig. 2a. FD neurons extract the sound features from the spatiotemporal activity evoked in the FR network, based on spatial snapshots of the FR activity taken every short time period. FD neurons are formed in the following way. The spatiotemporal activity of FR neurons is reduced to an assembly of spatial patterns, one per short time period, represented by a sequence of spatial patterns at times T_m (m = 1, 2, …, N): P_{T_1}, P_{T_2}, P_{T_3}, …, P_{T_N}. The time interval between the T_m was chosen so that the spatiotemporal pattern of FR neurons does not change significantly within one interval. The interval was set at 10 ms, roughly corresponding to the time constant of the neuronal membrane potential. This means a coarse-grained
Fig. 1 a The network model of auditory cortex. The model consists of three layers: the feature-representation (FR) layer, the feature-detecting (FD) layer, and the feature-binding (FB) layer. A, B, and C in the FB layer indicate dynamical attractors. Each of the three layers is a two-dimensional array of neurons. Auditory information is hierarchically processed along the pathway from the FR to the FB layer via the FD layer. The three layers are connected feedforwardly with each other. b, c The network model of the FR layer. b Two-dimensional network model. The circles indicate the population units
of excitatory neurons; the population units of inhibitory neurons are omitted. The excitatory connections among excitatory neurons are denoted by solid lines. Excitatory neurons receive external inputs at the left edge of the network. The P- and T-axes are the propagation and tonotopic axes, respectively. c The model of a single population unit, consisting of a pair of excitatory (E) and inhibitory (I) neurons. The open and filled circles denote excitatory and inhibitory connections, respectively
Fig. 2 a Feature extraction of FD neurons from spatiotemporal activity of the FR layer. The spatiotemporal activity of the FR layer is reduced to the spatial patterns at times T1, T2, and so on, which are regarded as averaged patterns over each short time period. The decomposition into spatial patterns enables FD neurons to extract the features from the spatiotemporal activity of FR neurons, resulting in the representation of sound features in the FD layer. Activated FD neurons are depicted as filled circles. b, c Dynamical linkage of the attractors representing the words "abc" (b) and "cba" (c). The attractors of the words consist of the attractors of phonemes, or A, B,
and C, and those of their overlaps, or AB, BC, and so on, which are linked in the temporal order of the phonemes and overlaps in each word. The framework of a word is determined by the attractors of the phonemes, while the temporal correlations of the phonemes are determined by the attractors of the overlaps. d The learning rule used. The learning rule is based on STDP learning, where Δt = t_post − t_pre, and t_pre and t_post are the spike times of the presynaptic and postsynaptic activities, respectively
process for the spatiotemporal activity of the FR layer. Using these spatial patterns and the learning rule of Kohonen's self-organizing map [26], FD neurons sensitive to the spatial patterns were formed in the FD layer. To give the FD neurons their feature-detection ability, the sequence of P_{T_m} (m = 1, 2, …, N) was repeatedly applied to the FD network. The strength of the synaptic connection from the FR neuron at site (i, j) to the winner FD neuron at site (k*, l*) is updated by

Δw(ij, k*l*; FR, FD; t) = k_1 (X^m_ij − w(ij, k*l*; FR, FD; t)), (m = 1, …, N),   (1)

where

(k*, l*) = arg min_{kl} √( Σ_ij (X^m_ij − w(ij, kl; FR, FD; t))² ),   (2)

k_1 is the learning rate, and X^m_ij is the output of the FR neuron at site (i, j) in the pattern P_{T_m}, given by Eq. (10) in "Appendix." The output of FD neurons is 1 for the winner neuron and 0 otherwise. The parameter values used are N_FD = 40 and k_1 = 0.1.

The stimuli of the words "abcabc" and "cbacba" were used in the learning. The doubled representation is needed to form the FD neurons that respond to the overlap features: with a single presentation of each word, the overlap features /ac/ and /ca/ would not be formed. In childhood, we learn a word by pronouncing it several times; additionally, frequent switching between the two words during presentation enables the FD network to learn the two words equally. Hence, we chose the doubled representation, with each word repeated for a few cycles. Each word stimulus was applied to the FR network over the time range 0–620 ms, in the way described later in the "Representation of Sound Stimuli" section. The FR activities evoked by the two words were reduced to two sets of spatial patterns, each being a temporal sequence of snapshots of the FR activity evoked by one word stimulus, taken every 10 ms. As inputs to the FD layer, we used these two sets, each containing 64 spatial patterns, applied alternately to the FD layer. The learning was performed for 12,400 steps. After the learning, we confirmed that the two word stimuli to the FR network successfully activated the FD neurons detecting the features of the phonemes and the overlaps involved in these words.

Model of FB Layer

The FB layer consists of a two-dimensional array of FB neurons, with N^C_FB neurons per column and N^R_FB neurons per row. Each FB neuron was modeled with the
leaky integrate-and-fire (LIF) neuron model, in which the neuron fires with a certain probability. The membrane potential of the (i, j)th FB neuron, V^FB_ij, is given by

τ_FB dV^FB_ij/dt = −V^FB_ij + Σ_kl w(kl, ij; FB, FB; t) S_kl(t) + Σ_mn w(mn, ij; FD, FB; t) S_mn(t) + I_ij + n_ij(t),   (3)

where τ_FB is the time constant, and w(kl, ij; FB, FB; t) and w(mn, ij; FD, FB; t) are the strengths of the synaptic connections from the (k, l)th FB neuron and from the (m, n)th FD neuron, respectively, to the (i, j)th FB neuron. The external input is I_ij. The noise signal n_ij(t) takes a random value in the range (−n_0, n_0). The FB neuron fires with the probability p_ij, given by

p_ij = 1 / (1 + exp(−g(V^FB_ij − V^FB_h))),   (4)

where g is a parameter determining the steepness of the probability function, and V^FB_h is the threshold value. The spike of an FB neuron has a refractory period of 3 ms. The synaptic input evoked by the output of the (k, l)th FB neuron is given by a sum of α-functions,

S_kl(t) = Σ_{m=0}^{M} ((t − t_{kl,m})/τ_α) exp(−(t − t_{kl,m})/τ_α),   (5)

where τ_α is the time constant, t_{kl,m} is the firing time of the mth spike of the (k, l)th FB neuron, and M is the number of spikes generated by the neuron up to the latest firing. The firing time of an FD neuron is taken as the time at which its output is 1. The parameter values used are as follows: N^C_FB = 10, N^R_FB = 9, τ_FB = 30 ms, g = 14, V^FB_h = 8 mV, τ_α = 1 ms, and n_0 = 5 mV.

We embedded two types of memories, corresponding to the two words "abc" and "cba," in the FB network. Each memory is represented by a temporal sequence of six attractors, as shown in Fig. 2b, c. The attractor of the word "abc" is constructed from six attractors: those of the phonemes /a/, /b/, and /c/, termed A, B, and C, and those of the overlaps /ab/, /bc/, and /ca/, termed AB, BC, and CA, respectively. The embedded patterns of the six attractors are activity patterns of the FB network, each containing ten activated neurons in one column with activity value 1, all other neurons being silent with activity value 0. The embedded patterns were chosen to be mutually orthogonal: the inner product of any pair of embedded pattern vectors is 0. The attractor of the word "cba," shown in Fig. 2c, shares the phoneme attractors with that of the word "abc" but has different overlap attractors, because the temporal
order of the phonemes is reversed between the two words. Thus, nine embedded patterns were used to represent the attractors of the phonemes and overlaps involved in the two words. To form the attractors in the FB network, we used supervised learning, in which the sound stimuli were applied to the FR layer concurrently with an input to the FB layer activating the corresponding attractor. Generally, vocalization makes us learn various words; the input to the FB layer, acting as a teacher signal, may be a top-down signal reflecting the behavior of producing a speech sound relevant to the applied stimulus. The input to FB neurons was set at 40 mV. The synaptic weights between FR and FD neurons were fixed at the values determined by Kohonen's learning rule during the learning of the attractors. The synaptic weights between FB neurons were initially set to values near zero and then updated depending on the activity of FB neurons. In the learning period, the stimuli of the two words were applied alternately to the FR network with a cycle of 3,000 ms, and the learning was performed for 60 s. The attractors of the two words were formed with the spike-timing-dependent plasticity (STDP) learning rule shown in Fig. 2d, which depends on the timing between presynaptic and postsynaptic activities. A presynaptic spike of the (k, l)th neuron at time t_pre and a postsynaptic spike of the (i, j)th neuron at time t_post modify the synaptic weight w(kl, ij; FB, FB; t) by

w(kl, ij; FB, FB; t) → w(kl, ij; FB, FB; t) + F(Δt),

where Δt = t_post − t_pre. The STDP window function F(Δt) is given by

F(Δt) = A_+ exp(−Δt/τ_s)   (Δt ≥ ε)
F(Δt) = A_+ exp(−ε/τ_s)    (−ε ≤ Δt < ε)   (6)
F(Δt) = −A_− exp(Δt/τ_s)   (Δt < −ε),

where τ_s is the time constant of the exponential decay, which determines the effective time range of the STDP window function. The synaptic weights are updated by this learning rule every time a postsynaptic neuron receives a spike.
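The piecewise window of Eq. (6) is compact enough to state directly in code. The following is a minimal sketch, not the authors' implementation; the parameter values are those reported for the model, and the function name is ours.

```python
import math

# Sketch of the STDP window F(Δt) of Eq. (6), using the model's
# reported parameter values: A+ = A- = 0.005, ε = 0.25 ms, τs = 30 ms.
A_PLUS = 0.005
A_MINUS = 0.005
EPS = 0.25      # ms
TAU_S = 30.0    # ms

def stdp_window(dt):
    """Weight change F(Δt) for Δt = t_post - t_pre (in ms)."""
    if dt >= EPS:                     # pre leads post: potentiation
        return A_PLUS * math.exp(-dt / TAU_S)
    elif dt >= -EPS:                  # near-coincident spikes
        return A_PLUS * math.exp(-EPS / TAU_S)
    else:                             # post leads pre: depression
        return -A_MINUS * math.exp(dt / TAU_S)
```

A synapse is then updated as w += stdp_window(t_post − t_pre) each time a postsynaptic spike arrives.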
The parameter values used are as follows: A_+ = 0.005, A_− = 0.005, ε = 0.25 ms, and τ_s = 30 ms.

Learning of Synaptic Connections Between FD and FB Neurons

To learn the weights of the synaptic connections from FD to FB neurons, the sound stimuli of the words "abc" and "cba" were sequentially and repeatedly applied to the FR network. These stimuli activated the FD neurons detecting the features of phonemes and their overlaps. Simultaneously, the inputs corresponding to the embedded patterns were applied to the FB network in accordance with the FR activities evoked by the phonemes and overlaps of the two words. The weight of the synaptic connection from the (i, j)th
FD neuron to the (k, l)th FB neuron, w(ij, kl; FD, FB; t), is determined by Hebbian learning, given by

τ_w dw(ij, kl; FD, FB; t)/dt = −w(ij, kl; FD, FB; t) + k_2 (1 − w(ij, kl; FD, FB; t)/w_m) S^FD_ij(t) S^FB_kl(t),   (7)

where τ_w is the time constant and k_2 is the learning rate. The synaptic weight tends toward the maximum value w_m as the learning proceeds. The parameter values used are: τ_w = 10 s, k_2 = 0.001, and w_m = 10.

Representation of Sound Stimuli

We used two sound stimuli to the FR layer, corresponding to the words "abc" and "cba," as shown in Fig. 3a, c. The sound stimulus of the word "abc" consists of the phonemes /a/, /b/, and /c/ and their overlaps /ab/, /bc/, and /ca/, in the temporal order /a/ → /ab/ → /b/ → /bc/ → /c/ → /ca/ → /a/, as shown in Fig. 3a. The letters of the word "abc" are abstract labels and do not represent actual sounds. It is difficult to know the temporal property of the cortical activity evoked by natural word stimuli, because our model does not include models of the cochlea and subcortical areas. Thus, we gave each phoneme of the word a simple spectrotemporal property: a pure tone with a single frequency and constant intensity. The magnitude of the input was set at 1.0, the same value as used in the model of Yamaguchi et al. [48]. For simplicity, the frequencies of the three phonemes were set to those at i = 5, 10, and 15 on the T-axis, which is divided into equal frequency steps. The duration of the phonemes was set at 120 ms and that of the overlaps at 20 ms. The sound stimuli of the overlaps were given by co-stimulation of the two neighboring phonemes. The duration of each word is about 300 ms, consistent with the experimental result that natural words consisting of three letters evoke sonograms with durations of hundreds of milliseconds [38]. The sound stimulus of the word "cba" has sound components similar to those of the word "abc," consisting of the phonemes /a/, /b/, and /c/ and the overlaps /cb/, /ba/, and /ac/, as shown in Fig. 3c.
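As a rough illustration of how such a word stimulus can be assembled, the sketch below builds the input for the word "abc" as a time-by-frequency array. The 1-ms time grid, the 20-channel T-axis, and the treatment of each overlap as a separate 20-ms segment with both tones on are our assumptions; only the durations, the T-axis positions (i = 5, 10, 15), and the unit amplitude come from the text.

```python
import numpy as np

# Hedged sketch: word stimulus "abc" as a (time, frequency-channel)
# array for the FR layer. Grid, channel count, and the separate
# overlap segments are assumptions, not the paper's implementation.
N_T = 20                              # T-axis channels (assumed)
FREQ = {"a": 5, "b": 10, "c": 15}     # T-axis positions (paper)
PHONEME_MS, OVERLAP_MS = 120, 20      # durations in ms (paper)
AMPLITUDE = 1.0                       # input magnitude (paper)

def word_stimulus(word):
    """Return an array (n_steps, N_T); rows are 1-ms input snapshots."""
    segments = []
    for k, ph in enumerate(word):
        seg = np.zeros((PHONEME_MS, N_T))
        seg[:, FREQ[ph]] = AMPLITUDE          # pure tone of the phoneme
        segments.append(seg)
        if k + 1 < len(word):                 # overlap with next phoneme:
            ov = np.zeros((OVERLAP_MS, N_T))  # co-stimulation of both tones
            ov[:, FREQ[ph]] = AMPLITUDE
            ov[:, FREQ[word[k + 1]]] = AMPLITUDE
            segments.append(ov)
    return np.vstack(segments)

stim = word_stimulus("abc")   # 3 phonemes + 2 overlaps -> 400 rows
```

The cyclic /ca/ overlap of the repeated presentation ("abcabc") is omitted here; repeating the word and inserting the wrap-around overlap would follow the same pattern.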
Parameter Values

The parameter values for the FR network were based on those used in the auditory cortex network model of Yamaguchi et al. [48], given in "Appendix." The parameter values for the LIF model and the synaptic input were chosen on the basis of cortical pyramidal neurons in the
Fig. 3 a, c The sound stimuli to the FR network: the sound stimuli of the words "abc" (a) and "cba" (c), each given by a temporal sequence of constant inputs. The input duration of the phonemes was set at 120 ms and that of the overlaps at 20 ms. The overlaps were generated by co-stimulation of two adjacent phonemes. b, d Neural responses of the FR layer to the sound stimuli of the words "abc" and "cba." The panels show snapshots of the spatiotemporal activities of FR neurons at different times. Each stimulus and its evoked snapshots are paired as (a)–(b) and (c)–(d). Excitatory and inhibitory waves are depicted as white and gray clusters, respectively
living animal [25]. The parameter values for the STDP learning were based on those used by Song et al. [40].
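For concreteness, the FB-neuron dynamics of Eqs. (3)–(5) can be sketched as a simple Euler integration. This is an illustrative reading of the equations, not the authors' code: the two network sums are collapsed into a single syn_input argument, and the 1-ms integration step is an assumption.

```python
import math, random

# Minimal sketch of Eqs. (3)-(5): a leaky integrator with
# probabilistic firing and alpha-function synaptic traces.
TAU_FB = 30.0   # membrane time constant, ms (paper)
G = 14.0        # steepness of the firing probability (paper)
V_TH = 8.0      # threshold, mV (paper)
TAU_A = 1.0     # alpha-function time constant, ms (paper)
N0 = 5.0        # noise amplitude, mV (paper)
DT = 1.0        # Euler step, ms (assumed)

def alpha_trace(t, spike_times):
    """S(t) of Eq. (5): summed alpha functions over past spikes."""
    return sum((t - ts) / TAU_A * math.exp(-(t - ts) / TAU_A)
               for ts in spike_times if ts <= t)

def step(v, syn_input, ext_input):
    """One Euler step of Eq. (3); returns (new_v, fired)."""
    noise = random.uniform(-N0, N0)              # n_ij(t)
    v = v + DT * (-v + syn_input + ext_input + noise) / TAU_FB
    p = 1.0 / (1.0 + math.exp(-G * (v - V_TH)))  # Eq. (4)
    return v, random.random() < p
```

In a full simulation, syn_input would be the weighted sum of alpha_trace values over presynaptic FB and FD neurons, and a 3-ms refractory period would gate the spike draw.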
Results

Spatiotemporal Activities of FR Neurons in Response to Sound Stimuli

Figure 3b, d illustrates the activities of FR excitatory and inhibitory neurons in response to the words "abc" and "cba," respectively, at different times. In the FR network, the stimulus of each word evoked a spatiotemporal activity of FR neurons consisting of excitatory areas followed by inhibitory areas. The stimuli of the two words evoked different spatiotemporal activities, reflecting the spectrotemporal structures of the word stimuli. The overlap between phonemes exhibited the coexistence of excitatory waves propagating along two different frequency axes (e.g., the panel at t = 120 ms in Fig. 3b). The role of the excitatory connections between FR neurons is to facilitate the propagation of the excitatory wave, whereas the role of inhibition in the network is to suppress the activity of E-neurons at the relevant timing. If the inhibition suppressed the activity of E-neurons before they reached the firing threshold, it would prevent the propagation of the excitatory wave. Conversely, if the inhibition suppressed the activity of E-neurons with a long delay after their activation, it would prevent the localization of the excitatory wave. The relevant timing of the inhibition is determined by the strength of the inhibitory synapse and is also related to the activation rate of the E-neuron. Thus, the balance in synaptic strength between excitatory and inhibitory connections enables stable propagation of excitatory activity across the FR network.

Extraction of Sound Features by FD Neurons

Figure 4a, b illustrates the spatial activities of the FD neurons detecting the features of two phonemes: /a/ of the word "abc" and /a/ of the word "cba." The inputs of the two phonemes to the FR network activated the FD neurons depicted as filled circles in Fig. 4a, b. Each FD neuron responded to a spatial activity of FR neurons within one short time period. Figure 4c, d shows the spatial activities of the FD neurons detecting the features of two overlaps: /ab/ of the word "abc" and /ba/ of the word "cba." These figures were depicted in the same way as Fig. 4a, b. The sound features of phonemes and overlaps are thus represented by populations of FD neurons. Notably, the FD neurons shown in Fig. 4a, b exhibited almost the same spatial pattern in response to the phoneme /a/ of the two words. In contrast, the FD neurons shown in Fig. 4c, d showed quite different patterns in response to the two overlaps. To evaluate the similarity between the spatial patterns of FD activities, we calculated the Euclidean distance between a pair of FD activity patterns, in which the activity pattern takes the value 1 for activated FD neurons and 0 for inactivated ones. The similarity of two FD activities was evaluated with the relative distance D/D_m, where D and D_m are the Euclidean distance between the two FD
activities and the maximum Euclidean distance, respectively. The maximum Euclidean distance was defined as the distance obtained when the two activities have completely different patterns with the same number of activated neurons. The result is shown in Fig. 4e. The relative distance between the FD activities shown in Fig. 4a, b (the /a/–/a/ pair in Fig. 4e) was below 20 %, indicating that the FD activities elicited by the two phonemes are quite similar. In contrast, the relative distance between the FD activity patterns shown in Fig. 4c, d (the /ab/–/ba/ pair in Fig. 4e) was 100 %, indicating that the two FD activities have quite different patterns. This was also the case for the pairs of the other phonemes and their overlaps, as shown in Fig. 4e. The result indicates that the phonemes involved in both words are represented by intrinsic spatial patterns of FD activity, independently of the temporal order of the phonemes, while the overlaps are represented by largely different patterns of
Fig. 4 a–d Spatial activities of FD neurons in response to phonemes and their overlaps. The panels show the spatial activities of FD neurons in the two-dimensional array of the FD layer, evoked by two phonemes and two overlaps: /a/ of the word "abc" (a), /a/ of the word "cba" (b), /ab/ of the word "abc" (c), and /ba/ of the word "cba" (d). These patterns were depicted by overwriting the FD neurons activated at different times during the stimulation of the phonemes and overlaps. e Relative distance between two spatial activities of FD neurons. The relative distance was defined as the ratio D/Dm, where D is the Euclidean distance between the spatial activities evoked by the two words and Dm is the maximum distance. The Euclidean distance was calculated using FD activity patterns in which an activated FD neuron has the value 1 and an inactivated one the value 0. The maximum distance is the distance obtained when the two FD activity patterns are completely different while having the same number of activated neurons. Two spatial activities approach the same pattern as the relative distance approaches 0
FD activity, depending on the temporal order. This suggests that the extraction of the overlap features, in addition to the features of the phonemes themselves, plays an important role in discriminating between word stimuli. The function of the FD layer is thus to extract salient features, such as phonemes and overlaps, from the spatiotemporal activity of FR neurons.

Binding of Sound Features in FB Layer

Figure 5a illustrates the temporal variation of the network state of the FB layer during stimulation of the FR layer with the words "abc" and "cba." The learning was run for 60 s. The attractors representing the phonemes and their overlaps, and the temporal correlations among the attractors, were formed by the STDP learning given by Eq. (6). The Hamming distance between the network state and the embedded pattern of an attractor was used to determine
Fig. 5 a, b Temporal variations of the network state of the FB layer in response to the stimuli of the words abc and cba. a A point was marked on the attractor in which the network state stayed. BG indicates the background state, in which only noisy signals are applied to the FR layer. When the Hamming distance between the network state and the embedded pattern of an attractor was within an error of 20 %, the network state was regarded as staying in that attractor. b Expanded plot of the upper panel before and during the stimulation of the word abc. The stimulus allows the network state to stay in the attractors representing the word abc, exhibiting the temporal order A → AB → B → BC → C → CA, seen in the time period marked P. The temporal order of the attractors reflects the temporal order of the phonemes and overlaps in the word abc. c Time courses of synaptic weights during the learning of the words abc and cba. The labels "X → Y" (X = A, AB; Y = A, B, AB, BA) denote the synaptic weights, each being the weight of the synaptic connection from an activated FB neuron in attractor X to one in attractor Y. The temporal orders A → AB and AB → B, consistent with the temporal orders of the phonemes and overlaps in the word abc, lead to increased synaptic weights, while the irrelevant temporal orders A → BA and AB → A lead to decreased weights
whether the network state stayed in an attractor. In calculating the Hamming distance, we used the pattern of the network state in which a neuron has the value 1 when it fired and 0 when it was silent. The network state was regarded as staying in an attractor when the Hamming distance was within an error of 20 %. The information about phonemes and their overlaps was combined into a temporal sequence of attractors, enabling the FB network to represent the information of the two words. In the
background state ("BG" in Fig. 5a), where the FR network received no sound signal but only weak noise, the network state itinerated among the attractors of the two words. In contrast, stimulation of the FR network with the two words allowed the FB network state to stay in the attractors of these words, each consisting of the attractors shown in Fig. 2b, c. The transition to the attractors indicates the feature binding of the word objects, which bears on important aspects of word perception. After the stimulation of the word "abc" or "cba," the network state returned to the background state. Figure 5b shows a magnification of the network state shown in Fig. 5a before and during the stimulation of the word "abc." The stimulus of the word "abc" changed the network state from the itinerant state to the attractor state representing the word. As seen in the time period marked "P," the attractor of the word "abc" had the temporal sequence A → AB → B → BC → C → CA, reflecting the temporal order of the phonemes and their overlaps in the word "abc." Similarly, the stimulation of the word "cba" evoked a temporal sequence of attractors whose temporal order was consistent with that of the phonemes and overlaps in the word "cba" (not shown). Furthermore, the attractors of the phonemes were more stable than those of the overlaps, as seen in the difference in the time period during which the network state stayed in each attractor (Fig. 5b). The difference in stability comes from the difference in the learning conditions of the phonemes and overlaps: the phonemes are learned more frequently than the overlaps, because the two words have the phonemes in common. Figure 5c shows the temporal variations of the synaptic weights of the FB neurons that encode the phonemes, /a/
Fig. 6 a Temporal variation of the FB network state during the stimulation of an incomplete word. The incomplete word lacks the phoneme /c/ and the overlap /ca/ of the word abc. The network state is depicted in the same way as in Fig. 5a. b–d Effect of phoneme overlaps on attractor representation. The FB network was trained with word stimuli lacking the overlaps involved in the words abc and cba, termed abc(-) and cba(-), respectively. b Temporal variation of the FB network state before and during the stimulation of the word abc(-). c Time courses of synaptic weights during learning of the stimuli abc(-) and cba(-). The labels "X → Y" (X = A, B; Y = B, A) are denoted in the same way as in Fig. 5c. d Relative threshold inputs required for the transition between attractors. The relative inputs to the FR layer are given as the ratio of the threshold input for word "X" (X = abc, abc(-)) to that for the word abc
and /b/, and their overlaps, /ab/ and /ba/, during the learning of the words "abc" and "cba." Learning the word "abc" increases the weights of the synaptic connections from the FB neuron encoding the sound /X/ to that encoding the sound /Y/ ((X, Y) = (a, ab), (ab, b)), because these temporal orders of attractors are consistent with those of the phonemes and overlaps involved in the word "abc," as shown in Fig. 2b. In contrast, the synaptic weights for (X, Y) = (a, ba) and (ab, a) are decreased, because their temporal orders are irrelevant to the word "abc." Thus the learning increases only the synaptic weights of the FB neurons that are activated in the temporal order of the phonemes and overlaps in the two words. Figure 6a shows the response of the FB network to an incomplete word stimulus lacking parts of the phonemes and overlaps of the word "abc." The stimulus lacked the phoneme /c/ and the overlap /ca/. Despite the absence of input for these phonetic parts, the network state evoked by the incomplete word stimulus partially stayed in the attractors C and CA, corresponding to the attractors of the lacking
phoneme and overlap, although the frequency of staying in these attractors was low compared with that of the other attractors. This indicates that the FB network is capable of complementing the information of the missing phoneme and overlap. The complementation ability is due to the temporal linkage of attractors. The sound representation by attractors enables the prediction of temporal correlations between phonemes and may play a crucial role in the robust perception of various sounds.

A Role of Overlaps in Temporal Binding of Phoneme Attractors in FB Layer

To investigate the role of overlaps in the temporal binding of phoneme attractors in the FB layer, we examined the dynamic properties of attractors representing words without overlaps. Figure 6b illustrates the temporal variation of the network state of the FB layer after learning the words lacking the overlaps between phonemes in the two words "abc" and "cba." The words
without the overlaps of the two words are termed "abc(-)" and "cba(-)," respectively. The attractors did not appear in the background state, because the network could not form the overlap attractors that temporally link the phoneme attractors. The synaptic weights between FB neurons exhibited no significant changes during the learning of the words without overlaps, as shown in Fig. 6c. This is due to the STDP learning. Learning the word "abc(-)" increases the synaptic weights relevant to the temporal correlation of the phonemes, /a/ → /b/, while learning the word "cba(-)" decreases them because of the reversed correlation, /b/ → /a/. The two changes in the synaptic weights cancel each other, resulting in no significant change at these synapses. Similarly, the cancellation also occurs in the synaptic changes elicited by the temporal orders of the other phonemes, (/b/ → /c/, /c/ → /b/) and (/c/ → /a/, /a/ → /c/). Consequently, without the overlaps the attractors of the phonemes cannot be strongly linked. This requires a larger threshold input to the FR layer to elicit the transition from one attractor to another in the FB layer; the threshold is about 1.6 times larger than the threshold input after learning the normal words, "abc" and "cba," as shown in Fig. 6d. Thus, the encoding of overlaps strengthens the temporal correlations of phonemes and enables the efficient linkage of phoneme attractors to generate word information.
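The cancellation follows directly from the asymmetric STDP window. A minimal pair-based sketch (the learning rates and time constant here are illustrative assumptions, not the paper's parameters):

```python
import math

# Pair-based STDP: potentiation when pre fires before post, depression otherwise.
A_PLUS, A_MINUS, TAU = 0.1, 0.1, 20.0  # assumed learning rates and window (ms)

def stdp_dw(t_pre, t_post):
    """Weight change for a single pre/post spike pair."""
    dt = t_post - t_pre
    if dt > 0:   # pre before post: potentiate
        return A_PLUS * math.exp(-dt / TAU)
    else:        # post before pre: depress
        return -A_MINUS * math.exp(dt / TAU)

# Synapse from the neuron encoding /a/ to the neuron encoding /b/.
w = 0.5
# Learning "abc(-)": /a/ fires 10 ms before /b/, so the synapse is potentiated.
w += stdp_dw(t_pre=0.0, t_post=10.0)
# Learning "cba(-)": /b/ fires 10 ms before /a/, an equal and opposite depression.
w += stdp_dw(t_pre=10.0, t_post=0.0)

print(round(w, 6))  # 0.5: the net weight change cancels
```

Because the potentiation from the /a/ → /b/ pairings during "abc(-)" is matched by an equal depression from the /b/ → /a/ pairings during "cba(-)", the weight returns to its initial value, which is the cancellation seen in Fig. 6c.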
Discussion

We have presented a network model for hierarchical information processing in the auditory cortex, consisting of three layers: the FR, FD, and FB layers. These layers encode different aspects of auditory features along the pathway from the FR layer to the FB layer via the FD layer, enabling the system to decode word information in the FB layer. The model of hierarchical auditory processing provides useful insight into how word information emerges from the spatiotemporal activity of cortical neurons. Different approaches to encoding and decoding the temporal information of a stimulus have been explored, as seen in studies of liquid-state machines [30], echo-state networks [22], and state-dependent networks [5, 31]. These approaches have also demonstrated that the temporal information of a stimulus is encoded into a spatiotemporal activity of neurons in an encoding stage, and that this activity is then decoded by one or more read-out neurons through synaptic learning [2, 21, 24, 28, 32, 34, 41]. It remains unclear, however, how sensory information is decoded from temporal inputs, because inputs corresponding to sensory stimuli have not been considered in most studies of these networks. Our model argues that in
the FD layer, the stimulus features of phonemes and their overlaps, which have physical meaning as feature elements of the sound stimulus, are extracted from the spatiotemporal activity of the FR layer; these feature elements are then combined into a temporal sequence of attractors to represent word information about the sound stimulus. This decoding strategy may provide a clue to understanding how information about auditory objects is constructed in the auditory cortex. It may also help clarify the neural mechanisms of other sensory modalities in which spatiotemporal activities of sensory neurons appear, such as olfactory [27] and somatosensory coding [1].

There are several models of speech perception. The TRACE II model [33] consists of a very large number of units organized into three levels: the feature, phoneme, and word levels. The tempo model [19] extracts lexical features at different time scales using nested oscillations of brain rhythms. The model of lexical access [42] is based on the extraction of a sequence of feature bundles from acoustic cues and the matching of these feature bundles against the items of a lexicon. Each of these models performs hierarchical processing in which word features are extracted from a sound signal and then combined to represent a word, a strategy shared with our model. The critical difference between our model and the existing models is the extraction of phoneme overlaps and their attractor representation. The extraction of the overlaps is responsible for the temporal linking of phoneme attractors and enables our model to efficiently represent words that share the same phonemes in a different order. In contrast, the existing models do not show how words partially sharing common phonemes are temporally linked.
Our model also offers a concrete mechanism by which word information is processed in the neural circuits involved in word perception, providing insight into the neural mechanism underlying word perception. On the other hand, a limitation of our model is that the number of words it can represent is limited by the complex dynamics of attractors in the FB layer. In contrast, the existing models are capable of processing a large number of words because of their artificial processing of words. A memory system storing a lexicon, added to our model as an upper layer above the FB layer, might improve its word-processing ability, because it could control the attractor dynamics of the FB network via the interaction between the memory system and the FB network. We have proposed a neural mechanism by which word information is decoded by hierarchical processing. Hierarchical processing is well known in visual systems: visual objects [36, 43] and faces [13] are represented by hierarchical processing in areas beyond the primary visual
area. With respect to auditory perception, there is evidence that sound stimuli are hierarchically encoded. A recent functional magnetic resonance imaging study demonstrated that a preference for complex sounds emerges in the human auditory cortex in a hierarchical manner [10]. A recent study using direct electrode recordings also showed that the neural response patterns of the posterior superior temporal gyrus are strongly organized along phonetic categories [6]. These studies suggest that several regions beyond A1 could encode abstract, linguistic information in speech. Our model may help clarify the neural mechanism underlying the representation of auditory objects in the auditory cortex.

Our model points to the importance of the overlap features in determining the temporal order of phonemes. The cues of successive units of speech frequently overlap in time; there are no separable packets of information in the spectrogram like the separate feature bundles that make up letters in printed words. Psychoacoustic studies have demonstrated the importance of phoneme overlaps in word perception [14, 29]. The effect of phoneme overlaps on speech perception is also seen in a classic psychoacoustic phenomenon called "auditory stream segregation" [3, 12]. In this phenomenon, a sequence of high- and low-frequency tones alternating in a pattern "ABAB" is presented, and the subjects' sound perception depends on the presentation rate of the tones, suggesting that the overlaps between the tones strongly influence sound perception. Furthermore, several models of speech perception have units representing a feature corresponding to phoneme overlap. The TRACE II model [33] includes units detecting phoneme overlaps in its phoneme layer, each a copy of a phoneme detector shifted by a certain time step. The tempo model [19] has a network for detecting an "acoustic edge," or a boundary between phones.
Further experimental studies are needed to investigate the neural coding of the overlaps in the auditory cortex. We have addressed the problem of how the two words "abc" and "cba," sharing the same phonemes in a different order, are represented in the auditory cortex. Our model can also correctly recognize other word stimuli, for example, words sharing a phoneme sequence. Consider the learning of the words "abc" and "dabce," which share the sequence "abc." Learning the word "dabce" would result in the formation of the attractors of the phonemes /d/ and /e/ and of the overlaps /da/, /ce/, and /ed/, in addition to the attractors of the word "abc." The overlap attractors, linking the phoneme attractors, can determine to which phoneme attractor the FB network state should go, even though the two words share the same phoneme sequence. Furthermore, the formation of the attractors representing the complete word "dabce" would also exhibit a complement effect in the response of the FB network to the incomplete word "dabc" missing the letter "e." The
complement effect comes from the dynamic properties of the background state of the FB network, in which the attractors of the phonemes and overlaps involved in the complete word are linked in the relevant temporal order. In our model, we assumed, for simplicity, that each of the phonemes /a/, /b/, and /c/ was a pure tone with a single frequency and a constant intensity. This is a substantial simplifying assumption for the inputs to the FR network. Natural sounds such as speech would evoke complex spatiotemporal activities of FR neurons, because they are composed of multiple frequency components whose intensities vary over time. Even if the spatiotemporal activity of the FR layer becomes more complex, the decoding mechanism of auditory information presented here would not change, because any complex spatiotemporal activity of FR neurons is reduced to a temporal sequence of spatial patterns in order to extract the sound features in the FD layer. FD neurons sensitive to the reduced spatial patterns can easily be formed with Kohonen's learning rule. As a result, FD neurons are able to extract the sound features from any complex FR activity, and once they do, the FB layer is capable of combining them as dynamical attractors. Therefore, using more complex inputs such as natural sounds would not change the essential roles of the FD and FB layers in the processing of sound stimuli. To form the attractors of phonemes and their overlaps in the FB network, we adopted the STDP learning rule. STDP-like learning rules have been observed in a variety of species [18] throughout the auditory pathway, including the brainstem [47] and cortical areas [37]. STDP in the dorsal cochlear nucleus appears to follow Hebbian and anti-Hebbian patterns in a cell-specific manner [47]. Several studies have also supported STDP in the auditory cortex [7, 23].
Hence, it is likely that the auditory system exploits STDP for processing sensory and memory information. Our model was constructed from three feedforwardly connected networks and did not include feedback connections between them. Virtually all sensory systems have both feedforward and feedback pathways. However, as a first step toward understanding the mechanism of auditory processing, it is necessary to understand how auditory information is processed feedforwardly in the auditory system. The feedforward model provides a basis for studying feedback processing of auditory information. A recent study in the visual system has argued for the importance of predictive coding generated by information flowing in both directions [35], and feedback in auditory systems may play an important role in the cocktail party problem [4]. Further study is needed to investigate how more efficient processing of word information could be achieved by information processing in both directions.
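The formation of FD neurons by Kohonen's learning rule, mentioned in the discussion above, can be sketched in a few lines. This is a minimal illustration rather than the paper's implementation: the map size, learning rate, and Gaussian neighborhood below are our assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy FD map: 10 units, each with a weight vector over 20 FR "channels".
n_units, n_channels = 10, 20
W = rng.random((n_units, n_channels))

def kohonen_step(W, x, eta=0.2, sigma=1.0):
    """One Kohonen update: move the best-matching unit (and neighbors) toward x."""
    bmu = np.argmin(np.linalg.norm(W - x, axis=1))   # best-matching unit
    d = np.abs(np.arange(len(W)) - bmu)              # map distance to the BMU
    h = np.exp(-(d ** 2) / (2 * sigma ** 2))         # neighborhood function
    return W + eta * h[:, None] * (x - W)

# Repeated presentation of one spatial pattern tunes some unit to it.
x = rng.random(n_channels)
for _ in range(50):
    W = kohonen_step(W, x)
best = np.min(np.linalg.norm(W - x, axis=1))
print(best < 1e-3)  # True: the best-matching unit has converged to the pattern
```

Repeatedly presenting a spatial activity pattern tunes the best-matching unit (and, more weakly, its neighbors) to that pattern, which is how FD neurons selective for the reduced FR patterns could arise.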
Appendix: Model of FR Layer

As the model of the FR layer, we used a simple network model of wave propagation in the auditory cortex proposed by Yamaguchi et al. [48]. The network consists of a two-dimensional array of population units, as shown in Fig. 1b. Each unit consists of a pair of excitatory and inhibitory neurons, as shown in Fig. 1c. The network has two axes: the tonotopic axis (T-axis), which represents the frequency selectivity of the input, and the propagation axis (P-axis), along which the excitatory wave propagates. Each population unit is indexed by ij, where i and j denote the position on the T- and P-axis, respectively (i = 1, ..., N_T and j = 1, ..., N_P). The time evolution of the excitatory and inhibitory neurons in the (i, j)th unit is given by

\tau_E \frac{dx_{ij}}{dt} = -x_{ij} - a P(y_{ij}) + \kappa P(x_{ij}) + C + \sum_{mn} w_{ij,mn} P(x_{mn}) + \delta_j I_i - H_{ij},   (8)

\tau_I \frac{dy_{ij}}{dt} = -y_{ij} + b P(x_{ij}) - b_0,   (9)

where x_{ij} and y_{ij} are the internal variables of the excitatory and inhibitory neurons in the (i, j)th unit, \tau_E and \tau_I are the time constants, and a, b, \kappa, C, and b_0 are constant parameters. The output function P is given by

P(z) = \frac{1}{2} \left( \tanh(k_z (z - \mu_z)) + 1 \right),   (10)

where k_z is the gradient parameter and \mu_z is the threshold value. The external input to the jth neuron in the ith frequency band is given by \delta_j I_i, where the factor \delta_j equals 1 if j = 1 and 0 otherwise, and I_i is the magnitude of the input to the ith neuron along the T-axis. The synaptic weight from the (m, n)th unit to the (i, j)th unit in the FR layer, w_{ij,mn}, is given by

w_{ij,mn} = W_0 (I_i + I_m) + w_0, \quad \text{for } i - 1 \le m \le i + 1 \text{ and } n = j - 1,   (11)

where w_0 is the synaptic weight in the absence of external input and W_0 is a constant parameter. The inputs I_i and I_m increase the synaptic weights of the connections between the ith and mth frequency bands; thus, neurons in the ith frequency band are characterized by the common input I_i. The external input is applied at the left edge of the P-axis, while neurons in the other columns receive no direct input, only input elicited by instantaneous synaptic modulation. The last term on the right-hand side of Eq. (8) represents the lateral inhibition between neighboring neurons, given by

H_{ij} = \sum_{q \in Q} h P(x_{qj}),   (12)

where the range of competition is Q = \{q \mid q = i - 1, i + 1\} and h is the magnitude of the connections. The parameter values used are as follows: \tau_E = 30 ms, \tau_I = 30 ms, a = 8.0, b = 8.0, b_0 = 1.0, \kappa = 4.0, C = 1.6, h = 0.025, W_0 = 2.0, w_0 = 0.1, k_z = 9.0, \mu_z = 2.0, I_i = 1.0, I_m = 1.0, N_P = 50, and N_T = 20.
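The FR-layer dynamics of Eqs. (8)-(12) can be integrated numerically. A minimal forward-Euler sketch in Python follows; the integration step, the periodic boundary handling (np.roll), and the single-tone input protocol are our assumptions, since the paper does not specify its integration scheme:

```python
import numpy as np

# Forward-Euler sketch of the FR-layer dynamics, Eqs. (8)-(12), with the
# parameter values listed in the Appendix.
NT, NP = 20, 50
tauE, tauI = 30.0, 30.0                 # ms
a, b, b0, kappa, C = 8.0, 8.0, 1.0, 4.0, 1.6
h, W0, w0 = 0.025, 2.0, 0.1
kz, mu_z = 9.0, 2.0
dt = 0.5                                # ms, assumed integration step

def P(z):
    """Output function, Eq. (10)."""
    return 0.5 * (np.tanh(kz * (z - mu_z)) + 1.0)

x = np.zeros((NT, NP))                  # excitatory internal variables
y = np.zeros((NT, NP))                  # inhibitory internal variables
I = np.zeros(NT)
I[10] = 1.0                             # pure tone: input to one frequency band

for _ in range(2000):                   # 1000 ms of simulated time
    Px, Py = P(x), P(y)
    # Recurrent drive with the weights of Eq. (11): unit (i, j) receives from
    # units (m, j-1) with m in {i-1, i, i+1} (periodic in i, by assumption).
    rec = np.zeros((NT, NP))
    for off in (-1, 0, 1):
        Im = np.roll(I, -off)                    # I_m aligned with index i
        src = np.roll(Px, -off, axis=0)          # P(x_{m,j}) aligned with i
        rec[:, 1:] += (W0 * (I + Im) + w0)[:, None] * src[:, :-1]
    # Lateral inhibition, Eq. (12): columns i-1 and i+1 on the T-axis.
    H = h * (np.roll(Px, 1, axis=0) + np.roll(Px, -1, axis=0))
    ext = np.zeros((NT, NP))
    ext[:, 0] = I                                # delta_j I_i: left edge only
    x = x + (dt / tauE) * (-x - a * Py + kappa * Px + C + rec + ext - H)
    y = y + (dt / tauI) * (-y + b * Px - b0)

print(x.shape)  # (20, 50)
```

With a tone applied at one frequency band of the left edge, activity propagates along the P-axis as in the wave-propagation model, while the lateral inhibition of Eq. (12) sharpens the response along the T-axis.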
References

1. Ahissar E, Zacksenhouse M. Temporal and spatial coding in the rat vibrissal system. Prog Brain Res. 2001;130:75–87.
2. Barak O, Tsodyks M. Recognition by variance: learning rules for spatiotemporal patterns. Neural Comput. 2006;18:2343–58.
3. Bregman AS, Campbell J. Primary auditory stream segregation and perception of order in rapid sequences of tones. J Exp Psychol. 1971;89:244–9.
4. Bregman AS. Auditory scene analysis: the perceptual organization of sound. Cambridge: A Bradford Book; 1994.
5. Buonomano DV, Merzenich MM. Temporal information transformed into a spatial code by a neural network with realistic properties. Science. 1995;267:1028–30.
6. Chang EF, Rieger JW, Johnson K, Berger MS, Barbaro NM, Knight RT. Categorical speech representation in human superior temporal gyrus. Nat Neurosci. 2010;13:1428–32.
7. Dahmen JC, Hartley DEH, King AJ. Stimulus-timing-dependent plasticity of cortical frequency representation. J Neurosci. 2008;28:13629–39.
8. DeAngelis GC, Ohzawa I, Freeman RD. Spatiotemporal organization of simple-cell receptive fields in the cat's striate cortex. II. Linearity of temporal and spatial summation. J Neurophysiol. 1993;69:1118–35.
9. deCharms R, Blake D, Merzenich M. Optimizing sound features for cortical neurons. Science. 1998;280:1439–43.
10. DeWitt I, Rauschecker JP. Phoneme and word recognition in the auditory ventral stream. Proc Natl Acad Sci USA. 2012;109:E505–14.
11. Drullman R. Temporal envelope and fine structure cues for speech intelligibility. J Acoust Soc Am. 1995;97:585–92.
12. Fishman YI, Reser DH, Arezzo JC, Steinschneider M. Neural correlates of auditory stream segregation in primary auditory cortex of the awake monkey. Hear Res. 2001;151:167–87.
13. Freiwald WA, Tsao DY. Functional compartmentalization and viewpoint generalization within the macaque face-processing system. Science. 2010;330:845–51.
14. Fowler CA. Segmentation of coarticulated speech in perception. Percept Psychophys. 1984;36:359–68.
15. Fujita K, Kashimori Y, Kambara T. Spatiotemporal burst coding for extracting features of spatiotemporally varying stimuli. Biol Cybern. 2007;97:293–305.
16. Fukunishi K, Murai N, Uno H. Dynamic characteristics of the auditory cortex of guinea pigs observed with multichannel optical recording. Biol Cybern. 1992;67:501–9.
17. Fukunishi K, Murai N. Temporal coding in the guinea-pig auditory cortex as revealed by optical imaging and its pattern-time-series analysis. Biol Cybern. 1995;72:463–73.
18. Gerstner W, Kempter R, Hemmen JL, Wagner H. A neuronal learning rule for sub-millisecond temporal coding. Nature. 1996;383:76–81.
19. Ghitza O. Linking speech perception and neurophysiology: speech decoding guided by cascaded oscillators locked to the input rhythm. Front Psychol. 2011;2:130.
20. Horikawa J, Hosokawa Y, Kubota M, Nasu M, Taniguchi I. Optical imaging of spatiotemporal patterns of glutamatergic excitation and GABAergic inhibition in the guinea-pig auditory cortex in vivo. J Physiol. 1996;497:629–38.
21. Izhikevich EM. Solving the distal reward problem through linkage of STDP and dopamine signaling. Cereb Cortex. 2007;17:2443–52.
22. Jaeger H, Haas H. Harnessing nonlinearity: predicting chaotic systems and saving energy in wireless communication. Science. 2004;304:78–80.
23. Karmarkar UR, Najarian MT, Buonomano DV. Mechanisms and significance of spike-timing dependent plasticity. Biol Cybern. 2002;87:373–82.
24. Knüsel P, Wyss R, König P, Verschure PFMJ. Decoding a temporal population code. Neural Comput. 2004;16:2079–100.
25. Koch C. Biophysics of computation: information processing in single neurons (Computational Neuroscience). 1st ed. Oxford: Oxford University Press; 1998.
26. Kohonen T. Self-organizing maps. 3rd ed. Berlin: Springer; 2001.
27. Laurent G. Olfactory network dynamics and the coding of multidimensional signals. Nat Rev Neurosci. 2002;3:884–95.
28. Legenstein R, Pecevski D, Maass W. A learning theory for reward-modulated spike-timing-dependent plasticity with application to biofeedback. PLoS Comput Biol. 2008;4:e1000180.
29. Liberman AM. The grammars of speech and language. Cogn Psychol. 1970;1:301–23.
30. Maass W, Natschläger T, Markram H. Real-time computing without stable states: a new framework for neural computation based on perturbations. Neural Comput. 2002;14:2531–60.
31. Mauk MD, Buonomano DV. The neural basis of temporal processing. Annu Rev Neurosci. 2004;27:307–40.
32. Mazor O, Laurent G. Transient dynamics versus fixed points in odor representations by locust antennal lobe projection neurons. Neuron. 2005;48:661–73.
33. McClelland JL, Elman JL. The TRACE model of speech perception. Cogn Psychol. 1986;18:1–86.
34. Nikolic D, Häusler S, Singer W, Maass W. Temporal dynamics of information content carried by neurons in the primary visual cortex. In: Schölkopf B, Platt JC, Hoffman T, editors. Advances in neural information processing systems, vol. 19. Vancouver: MIT Press; 2007. p. 1041–8.
35. Rao RPN, Ballard DH. Predictive coding in the visual cortex: a functional interpretation of some extra-classical receptive-field effects. Nat Neurosci. 1999;2:79–87.
36. Riesenhuber M, Poggio T. Models of object recognition. Nat Neurosci. 1999;3:1199–203.
37. Schnupp JWH, Hall TM, Kokelaar RF, Ahmed B. Plasticity of temporal pattern codes for vocalization stimuli in primary auditory cortex. J Neurosci. 2006;26:4785–95.
38. Schnupp J, Nelken I, King A. Auditory neuroscience: making sense of sound. Cambridge: The MIT Press; 2011.
39. Shannon RV, Zeng F-G, Kamath V, Wygonski J, Ekelid M. Speech recognition with primarily temporal cues. Science. 1995;270:303–4.
40. Song S, Miller KD, Abbott LF. Competitive Hebbian learning through spike-timing-dependent synaptic plasticity. Nat Neurosci. 2000;3:919–26.
41. Sprekeler H, Michaelis C, Wiskott L. Slowness: an objective for spike-timing-dependent plasticity? PLoS Comput Biol. 2007;3:e112.
42. Stevens KN. Toward a model for lexical access based on acoustic landmarks and distinctive features. J Acoust Soc Am. 2002;111:1872–91.
43. Tanaka K. Inferotemporal cortex and object vision. Annu Rev Neurosci. 1996;19:109–39.
44. Taniguchi I, Horikawa J, Moriyama T, Nasu M. Spatio-temporal pattern of frequency representation in the auditory cortex of guinea pigs. Neurosci Lett. 1992;146:37–40.
45. Taniguchi I, Nasu M. Spatio-temporal representation of sound intensity in the guinea pig auditory cortex observed by optical recording. Neurosci Lett. 1993;151:178–81.
46. Theunissen FE, Sen K, Doupe AJ. Spectral-temporal receptive fields of nonlinear auditory neurons obtained using natural sounds. J Neurosci. 2000;20:2315–31.
47. Tzounopoulos T, Kim Y, Oertel D, Trussell LO. Cell-specific, spike timing-dependent plasticities in the dorsal cochlear nucleus. Nat Neurosci. 2004;7:719–25.
48. Yamaguchi Y, Horikawa J, Taniguchi I. Neural dynamics of vocal processing in the auditory cortex. In: Poznanski RR, editor. Biophysical neural networks. New York: Mary Ann Liebert; 2001. p. 343–62.