Neurons And Language: A Response to Poeppel and Embick
Some Preliminary Thoughts on Roles for Individual Neurons in Language
Jo Edwards 20.11.2013
Poeppel and Embick (2005) pose two challenges for analysis of language in neurobiological terms: that of matching levels of functional grain and that of ontological commensurability. I propose here that the individual neuron, a complex dynamic connection between combinatorial input possibilities and numerous output ramifications, is the crucial level of grain for heuristic purposes. I argue that in the absence of true signal strings in the brain all neuronal output signals must have stand-alone predicative meaning. Individual neurons with reinforced feedback connections are patterns of dynamic disposition that can provide the basis for grasped concepts. Grammatical constructs can then be seen, ontologically, as instances of physical patterns (sounds, gestures or marks), perceived and produced through more or less standard sensory and motor resources, that lead to sequences of predicative signaling events. Internally, grammar is then a matter of regularities in instances of operation of cellular dispositions in the form of signal sending and receiving rather than of static patterns. Typically, grammatical constructs entrain signaling events that lead to combination in terms of co-arrival at the dendritic trees of (each of many) individual neurons, giving rise to scenarios or ideas. The dynamic rules for this signal sending and combining may not have changed much in the acquisition of language. Animals without language can use these events in a complex hierarchy to generate at least some ideas of the sort that we describe using linguistic propositions or predications and the animal can judge the truth of the inferences they entail. 
However, animals other than Homo sapiens cannot make use of external grammatical constructs (speech or text) to generate ideas because this can only be motivated in the context of an ability to grasp ‘event’ or ‘episode’ concepts derived from a process of sensory differentiation purely in terms of repeating sequence, apparently unique to humans. This faculty for differentiation in repeating sequence may allow language for two separate reasons. Firstly, without other-than-now event concepts to refer to, and no reinforcement of association of a word with a contextual relation within an alternative scenario, the default is to reinforce association with a single object or action in the present, since context is already apparent. Secondly, the allocation of concept status to events allows language to be treated as a sequence of sign events rather than (as for birdsong) one extended sign. The suggestion is made that human language reflects a change in cellular connection patterns that allows high-level sensory material to be differentiated so that it can be allocated to one of many successive points in a sequence (not just a binary before or after status) with no required link to a spatial position. The likely mechanism may involve types of pathway in hippocampus and entorhinal cortex previously used at a lower level for spatial navigation and chaining of associated motor routines. This change can help to explain not only language but also the concept distinction between ‘mental’ and ‘physical’, the sense of self, and secondary to that, theory of mind, and mathematical and musical abilities.
Neuronal outputs have predicative meaning
A search for a neurological explanation for a faculty of language must ultimately explain what it is that carries meaning in a brain. The unit of signaling currency in a brain is the action potential or spike output of an individual neuron. The significance of these spikes is often considered to lie in ensembles or patterns. Yet, there is extensive evidence indicating that individual spikes, from primary sensory neuron to temporal lobe, can give a reliable indication of a sensory stimulus. The aim of this text is to take seriously the idea that individual spikes must be the carriers of meaning in the brain and to explore the implications for language. I will consider some further reasons for taking this approach in due course but, rather than give an exhaustive defense, I will focus on examining its explanatory potential.
Human natural languages and computer languages use strings within which there are elements with different syntactic roles co-defined for producers and receivers. Elements can encode start, stop, data or command and positions in strings can be assigned roles by rule. However (with arguable exceptions in early auditory pathways) it is likely that as a rule the brain does not use strings. It is neither serial nor parallel, but employs constant divergence and convergence of signals, with no globally fixed timing frames. I do not think strings can operate in this architecture. Nor does there appear to be any plausible means in this context of ‘distributing symbols’ amongst simultaneous signals so that the parts do not themselves have meaning – in the sense that words can be broken up into meaningless letters (distinct from variable redundancy or specificity). This implies that individual signals must have stand-alone meaning.
Even basic sensory knowledge requires complex inferences from inputs under different conditions (of eye or body movement for instance) and inference requires some form of logic, which needs premises, not just concepts. So individual signals must have stand-alone propositional or predicative meaning. Without sentential structure, or a pragmatic context that substitutes for this (as where ‘yes’ means that the proposition raised by a question is true), words have no informational content. Within the brain, the equivalent of pragmatic context for otherwise identical signals comes from the spatiotemporal relation of the signal to other events in the brain and sense organs (and, indirectly, the world).
The local propositional content of a cell output spike can be contingent in a complex way on a range of concurrent brain events - the brain may be, for instance, effectively ‘in a state of asking who is x’. This might seem to get round needing a propositional meaning for the spike itself but I do not think it does. Thus, a putative Jennifer Aniston cell spike will not mean ‘JA’ but ‘JA plays role x’. In fact, I think there are reasons for thinking that the system may at minimum have two cells, encoding ‘JA plays role x’ and ‘JA plays role y’. This may not be essential but what will be essential is that, perhaps by temporary gating, spikes can be made to indicate JA in at least two roles (as a minimum something like agent and patient, and in other cases, feature and ground or ‘this’ and ‘that’). A wider range of roles may be specifiable directly or by reassigning x and y in subroutines.
This predicative nature of spikes is needed to solve von der Malsburg’s (1981) feature binding problem for object perception. We can have no computational account of binding ‘triangle’ with ‘top’ and ‘square’ with ‘bottom’ unless we start with ‘triangle is x’ : ‘square is y’ : ‘x is at top’ : ‘y is at bottom’ and then ‘infer’ that triangle is at the top. We have no causal account of deriving a new semantic value from prior values simply by putting concepts in a row. Predicative meaning in this sense is essential to basic non-linguistic sensory processing. This may come as a surprise but I think it is implicit in the effects in visual perception studied by people like Purves and Lotto (2003). The ‘rawest’ of our perceptions are heavily modulated by interpretation as features of inferred objects.
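The triangle and square example can be written out as a toy inference. This is only an illustrative sketch, not a model of any neural mechanism: the triple format and the names `signals` and `infer_locations` are assumptions introduced here. It shows that the conclusion ‘triangle is at the top’ follows only because each signal is a predication that shares a role (x or y) with another.

```python
# Hypothetical illustration: each "signal" is a stand-alone predication,
# written as a (subject, relation, value) triple.
signals = [
    ("triangle", "is", "x"),
    ("square", "is", "y"),
    ("x", "at", "top"),
    ("y", "at", "bottom"),
]

def infer_locations(signals):
    """Combine 'concept is role' with 'role at place' into 'concept at place'."""
    role_of = {subj: val for subj, rel, val in signals if rel == "is"}
    place_of = {subj: val for subj, rel, val in signals if rel == "at"}
    # The inference step: a shared role links a concept to a location.
    return {concept: place_of[role]
            for concept, role in role_of.items()
            if role in place_of}

print(infer_locations(signals))
```

A bare list of concepts such as ‘triangle, top, square, bottom’ carries no roles, so there is nothing for the linking step to work on.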
It is debated whether the brain uses ‘symbols’. In the predicative role suggested, spikes cannot be symbols in the sense used for elements of language strings. Nevertheless, they might be considered symbols in a modified sense applicable in an internal context, in that they would have interpretable meaning and relate combinatorially and systematically where they co-arrive as inputs.
An implication of the above is that at each subsequent feedforward step each spike ‘makes a statement’ about the subset of possible ways the world might be but, importantly, the subsetting options, or ‘propositional space’ will change with each step. In a crude analogy, all the ways the world might be can be considered a ‘sheet of spacetime paper’ covered in Venn diagram circles. The first propositional space is then the space of all circles. The second is the space of all primary intersections, the third, of all secondary intersections. You can still have a code that covers the whole sheet but in a completely different language. Since spacetime is four-dimensional and time is directional (implying non-commutative relations) options for propositional spaces become rich very quickly. So we may start with all the ways the world could be described by each of a million optic nerve cells and move through ways it can be described in basic patterns like lines and shapes, to the ways it can be described using perhaps 50,000 word-like concepts.
I think it is fair to assume that most cortical neurons compute something like an intersection, or perhaps differential, from input signals using something akin to syllogistic logic. Two or more inputs generate an output. That might suggest that each step entails ‘pruning’ of the number of signals. However, for any set of n ‘premises’ there are a large number of ways you can generate a ‘conclusion’ if the premises are sent via branches to several sites, as they are in a brain – maybe at least n-shriek (n factorial). Some of these will be the sorts of invariance-related conclusions Horace Barlow (2009) has highlighted. A human brain wants the option to draw a very wide range of conclusions, relevant to different survival situations, so it has more cells higher up, not fewer. But the brain also wants to be able to devote highest-level syllogisms to making the inferences from lower level data most relevant to the current situation. So it wants ways of filtering and triaging what gets through each level.
The role of synchrony
Spike synchronization may be seen as a technical issue, but I think it merits discussion because it provides a means of explaining at the individual cell level, how concepts used in language are differentiated from a mass of incoming data.
In a computer gate there is arguably only one fixed ‘inference’ from any one integrative event because the event always consists of two premises (each 0 or 1) and a fixed rule about what conclusion is to be drawn from each combination (0 or 1). In the brain, integrative events occur in dendritic trees and there may be 50,000 input channels. This means that a conclusion can be drawn from any of a very large number of combinations of premises. Moreover, if response parameters are variable then the number of premises required to trigger a conclusion can vary widely. The system is under pressure to respond as fast as possible with the most relevant conclusions possible. The brain is then in a situation somewhat like a ‘University Challenge’ game contestant. The contestant has to judge from an ongoing string of statements from the quizmaster what type of inference is to be made and answer immediately they think it is clear what inference is wanted. To compete successfully they need to risk answering before a critical premise has been given. The analogy is crude but the key point is that the set of propositions encoded in signals passed along to the next level of analysis may be critically dependent on the time window of post-synaptic integration leading to firing.
Recent work (e.g. Tiesinga and Sejnowski (2009)) suggests that synchronization of rhythmic upstream cell firing with receptivity/refractoriness of receiving cells may be exploited by the brain in at least two ways – to select which inferences are most relevant to current needs and also to triage data into separate roles to allow more sophisticated inferences to be made. The role of synchrony has been contentious but I think largely because it has often been couched in ways that are hard to interpret in causal or computational terms. Thus synchrony has been suggested as ‘binding’ or ‘integrating’ spikes or making a pattern ‘rise above noise’ or even to ‘shine a spotlight on a representation’ or confer some phenomenal qualities to a spike pattern. What more recent analysis indicates is that it is not the synchronicity of spikes per se that achieves computational objectives, such as selection of salient material and von der Malsburg’s ‘feature binding’, but the matching of spike timing to the receptivity of receiving cells.
As an illustration, it may be that receiving cells will fire once they have received a small number of inputs – say 20. Moreover, once a small number of cells, perhaps 1%, have fired, these may send inhibitory signals to other cells in the same bank that ‘write off’ their inputs and induce a refractory period. Upstream cells are then more likely to have their outputs contribute to the next level of firing either if they fire at a high rate, or if they are firing cyclically at just the right time after each refractory period to be one of the 20 that count because they are synchronized to target cells. In this setting synchrony does not confer any qualitative properties; it merely increases the statistical likelihood of certain inferences being drawn from certain premises. Since synchronies may be set up both with upstream and downstream cells it is perhaps not surprising that synchrony does not correlate perfectly with functional analyses. Nevertheless, if the active time window is narrow the effect may be potent, acting like selecting a narrow depth of field on a camera by opening the aperture.
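The statistical point can be illustrated with a toy simulation. It is a deliberately crude sketch under stated assumptions, not a biophysical model: the function name `synchrony_share`, the 25 ms cycle, the 2 ms jitter and the group sizes are all hypothetical values chosen for illustration, and the inhibitory ‘write off’ is reduced to ignoring every input after the first 20 in each cycle. Both groups of upstream cells fire once per cycle, so any difference comes from timing rather than rate.

```python
import random

def synchrony_share(n_sync=30, n_async=30, period_ms=25.0,
                    sync_jitter_ms=2.0, threshold=20,
                    n_cycles=200, seed=0):
    """Fraction of the inputs 'that count' supplied by synchronized cells.

    All parameter values are illustrative assumptions.
    """
    rng = random.Random(seed)
    counted_sync = 0
    counted_total = 0
    for _ in range(n_cycles):
        # Time 0 is the end of the receiving bank's refractory period.
        # Synchronized cells fire shortly after it; the others fire at an
        # arbitrary point in the cycle.
        spikes = [(rng.uniform(0.0, sync_jitter_ms), "sync")
                  for _ in range(n_sync)]
        spikes += [(rng.uniform(0.0, period_ms), "async")
                   for _ in range(n_async)]
        spikes.sort()
        # Only the first `threshold` arrivals contribute to firing;
        # later inputs in the cycle are written off.
        for _, kind in spikes[:threshold]:
            counted_total += 1
            counted_sync += (kind == "sync")
    return counted_sync / counted_total

print(synchrony_share())
```

With these settings the synchronized group supplies most of the counted inputs even though it fires no faster than the other group, which is the sense in which synchrony raises the likelihood of particular premises being used.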
On this basis, cells sending signals encoding simple perceptual elements will compete for triggering higher concepts. You may synchronise ‘horse recognition’ cells to inputs from visual line receptors when trying to decide whether an animal on a dark night is a horse or a cow. You may sense you are not quite getting the detail of grey contours. But if a neighing sound reaches the auditory system a rapid burst of firing can compete successfully and you sense it is a horse and the grey contours slip from attention (as an image focused on apple blossom can allow a glint of sunlight to show through from behind).
Such a system could work initially like an object-triggered automatic focus in the sense that the synchrony might be tripped by lower level cells firing rapidly in response to a new feature. Later, top down influence would stabilise this or shift to something more like a manual setting, perhaps deliberately listening when vision had failed to solve the identification, which might involve either a decoupling of synchrony, an imposition of a different rate on cyclical responsiveness or even a shift of phase.
The possibility that the brain might make active use of phase shifts in relation to synchrony has been raised by observations on the hippocampus where there is evidence of synchronization of spikes at one rate of cycling that precesses in relation to a different rhythmic cycle. This raises the possibility that signals can be ‘triaged’ according to an imposed phase relation. Spikes propagating along axons with branches to all cells in the bank could, if divided into n groups each offset in phase, selectively contribute to activation of one of n sub-banks if these were also offset in phase of responsiveness. Signals from, for instance, different parts of the visual field, could be selectively allowed to influence firing of different sub-banks of cells, in such a way that the input could be sorted into ‘here, near distance, middle distance and far’ or maybe up to about seven separate objects. Precession of the phase relation may allow near distance to become here, middle distance to become near etc.
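The triage-by-phase idea can be sketched as follows. This is a minimal illustration, assuming four sub-banks whose receptive windows divide one cycle into equal quarters; the names `LABELS`, `sub_bank_for` and `triage`, and the example inputs, are hypothetical. Phase is expressed as a fraction of a cycle, and precession is modelled as a shift of the phase relation between inputs and sub-banks.

```python
LABELS = ["here", "near", "middle", "far"]  # roles named in the text

def sub_bank_for(spike_phase):
    """Index of the sub-bank whose receptive window contains this phase.

    Phase is a fraction of the cycle; each sub-bank is assumed to be
    receptive for an equal share of it.
    """
    return int((spike_phase % 1.0) * len(LABELS))

def triage(inputs, precession=0.0):
    """Sort input sources into sub-banks by phase.

    inputs: dict mapping a source name to its firing phase.
    precession: shift of the phase relation, as a fraction of a cycle.
    """
    banks = {label: [] for label in LABELS}
    for source, phase in inputs.items():
        banks[LABELS[sub_bank_for(phase - precession)]].append(source)
    return banks

scene = {"tree": 0.375, "gate": 0.625, "hill": 0.875}
print(triage(scene))                   # tree is near, gate middle, hill far
print(triage(scene, precession=0.25))  # tree is here, gate near, hill middle
```

In this sketch an input already at ‘here’ would wrap round to ‘far’ on the next shift; that is an artefact of the modular arithmetic, not a claim about the hippocampus.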
This mechanism would explain how feature binding can make use of an attribution to ‘role x’. A stimulus from a red sensitive cone can encode ‘there is red at this point’ through its initial connections but at some point this information needs to be usable to infer that the colour belongs to a particular object. We want to be able to combine something like ‘x is red’ and ‘x is a triangle’ while linking other data to y or z. As indicated above, synchronisation can achieve this. As discussed later, this triaging of information to roles may be of crucial importance to the subsequent emergence of language.
An upshot of all this is that higher processing can extract whatever abstract relations seem most relevant, at a potential cost of making errors. Triaging with assignation to multiple roles, even in the simple sense of figure and ground, would seem to be a crucial factor in building a higher inference repertoire. The system might infer ‘there are four tokens of the x type’. As long as the nature of the x type has also been inferred elsewhere this proposition can be forwarded for use in a variety of further inferences without losing specificity. The useful information is held in a new proposition space but is still there.
The idea that propositional spaces change with feedforward steps also overcomes a potential difficulty with capacity that arises if we see pathways as simply ‘passing on data’. If in V1 a signal means ‘position 127/238 in the visual field is red’ and meaning is just ‘passed on’ then you need to reserve a lot of degrees of freedom still at the highest level to specify all such possibilities with one signal. On the other hand it might be reasonable to allocate a single signal at high level to ‘x is the old red chair’. But, with meaning being predicative, it would still be possible to encode ‘there is red at co-ordinate 127/238’, if relevant, by signal combinations: it might need 250 signals (bits) in a code with the density of English as encoded in binary bytes. What this means, I think, is that although at high level propositional space may involve word-like concepts the synchrony system can allow these to capture the meanings of a wide variety of low level signals. In fact it may only be at the high level that the meaning is locally explicit. Back in V1 there is not even a need to fix meaning in a multimodal context as being about colour or position – just two variables.
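The figure of about 250 bits can be checked with simple arithmetic. The sketch below assumes one fixed-width character code per character of the English phrase. The choice of 7-bit and 8-bit widths is an assumption made here, since the text does not specify the encoding.

```python
# Back-of-envelope check: bits needed to spell out the phrase
# character by character in a fixed-width code (an assumed encoding).
message = "there is red at co-ordinate 127/238"

n_chars = len(message)   # 35 characters, counting spaces and punctuation
bits_7 = n_chars * 7     # 7-bit ASCII: 245 bits
bits_8 = n_chars * 8     # 8-bit bytes: 280 bits

print(n_chars, bits_7, bits_8)
```

Both widths give a figure of the same order as the estimate in the text, which is all the argument requires.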
In relation to this, I would question the suggestion by Prinz (2012) that our percepts are of ‘intermediate level’. Prinz seems to discount the possibility that at the top end of the inference tree whatever senses the idea ‘x is a horse’ could also get a mass of detail with it that seems like ‘raw visual data’ together with ‘intermediate data’ like viewpoint specificity. These may not be in the same signal form as lower down but just as in 1000 words one can describe seeing a horse in terms of how surprising it is or what the day is like or exactly how far away it is, so one can also have room to describe its coat in detail and still start with the words ‘The horse I see is … … >1000’. Prinz says he is following Marr but I wonder if Marr would have been happy with the interpretation. I think the problem may be the lingering assumption that percepts are something like maps or analogues of the world. If all spikes have predicative meaning I think that has to be discarded completely.
This analysis does not, however, exclude the possibility that cells in, say, V4 can pass signals directly to the top level, since lower level meanings can be accommodated by a higher level propositional space. I see no reason why laws of linguistic propositional space should not accommodate premises that have gone through fewer abstraction steps than others, although presumably there would be constraints. You can put smiley faces in text. I suspect something like that happens. (Apart from anything this would be fertile ground for metaphor and art. Art might be something that gives us an idea by an unusual route – as a picture bypasses the normal stage of binocular comparison, a sculpture suggests a higher concept but in the context of unusual ‘raw data’ and so on.)
In summary, the key points from this section are:
1. The role of synchrony is not to synchronise spikes at a particular level but to synchronise upstream cell output with downstream receptivity. This may indirectly achieve the former condition but only as a byproduct.
2. This role for synchrony makes sense if individual cell outputs have predicative meaning. If not, I am doubtful what synchrony would achieve.
3. Synchrony may allow selective role attribution for use in high-level inference.
The individual cell as grain
Before going on, I will make some further general comments about the justification for treating the individual neuron as the functional grain.
To my mind the only computational units in brains are cells. As soon as you start talking about events in groups of cells I think any attempt to link meaning to causal dynamics falls apart. Groups of cells cannot be treated as computational units in this scheme. If 100 cells are firing, the combined pattern of firing can only have a meaning (in causal computational terms) if all the axons converge on at least one site. Points of convergence in the brain are single cells. Branches from every axon are likely to converge on each of maybe 10,000 cells, so things seem ‘distributed’ but not in any sense that ‘meaning is spread out’. Meaning, whether as output or input, always relates to individual cells. To have a physical basis, meaning has to operate locally, as all physics does, even if getting to the right place involves divergence and convergence over a vast area.
Meaning may also seem to be distributed because the brain will make inferences in a pragmatic first past the post way. Science moves forward by lots of groups working on the same problem. As soon as one group has published all the others move on from there. It may seem that there is ‘teamwork’ or ‘mass action’ but in reality each individual unit is doing its best in the light of what it knows is going on. There may seem to be redundancy, but it can be useful in unexpected ways. If quasi-redundancy includes a spectrum of sensitivities and specificities for each situation (like Venn diagram circles of several sizes) and there is competition for inference saliency you get the ‘best fit’ out of a scenario with the minimum number of passes. Thus, there is no suggestion that the sequencing of a gene can be done jointly by two PCR machines 3,000 miles apart. One machine might do one end and the other the other but there is, in all the sciences based on physics, no situation in which causality is ‘shared’ by events distant from each other. You cannot draw an inference from premises in different places just as you cannot get an intersection between two circles at different places in a Venn diagram. If x=y and y=z are in one place and z=a and a=b are in another you cannot conclude that x=b, even if the sites are only fifty microns apart in the same cortical column. Map-like representations could perhaps be ‘smeared out’ but predicative ones cannot. Apart from anything there seems to be no way in which an existing premise could be ‘distributed around’ other than simply by being copied in the sense of being sent to several sites. No sub-component would be a premise. Thus my view is that any concept of ‘distributed representation’ that implies ‘distributed computation’ is simply incompatible with science.
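The point about premises held in different places can be made concrete with a toy example. This is a sketch only: the function `entails` and the two premise lists are hypothetical, and a ‘site’ is modelled as nothing more than the set of equalities available to one inference step. The check uses a standard union-find closure over the equalities present at that site.

```python
def entails(premises, a, b):
    """True if a = b follows from the equalities available at ONE site."""
    parent = {}

    def find(v):
        # Standard union-find lookup with path halving.
        parent.setdefault(v, v)
        while parent[v] != v:
            parent[v] = parent[parent[v]]
            v = parent[v]
        return v

    for left, right in premises:
        parent[find(left)] = find(right)
    return find(a) == find(b)

site_1 = [("x", "y"), ("y", "z")]   # x=y and y=z arrive here
site_2 = [("z", "a"), ("a", "b")]   # z=a and a=b arrive elsewhere

print(entails(site_1, "x", "b"))            # False
print(entails(site_2, "x", "b"))            # False
print(entails(site_1 + site_2, "x", "b"))   # True
```

Only when copies of all four premises are sent to a single site does the conclusion x=b become available, which corresponds to convergence on one dendritic tree.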
(There is a finer grained problem, relevant to phenomenology and philosophy of mind, of how events distributed over a single dendritic tree can function as if ‘one meaningful event’ but at the operational level I am discussing this is not a problem. All PSPs for a given cell influence axon firing in a way that can be treated as a single event in computational and behavioural terms.)
The irony of von der Malsburg’s synchrony theory is that, as Shadlen and Movshon (1999) point out, he presented it as a theory of ‘distributed binding’ although the theory actually pre-assumes binding to be at the point of post-synaptic integration, locally, downstream, in a dendritic tree. It is now clear how the theory can work in a non-distributed way. Any move away from this single cell analysis to my mind serves merely as a way of avoiding falsification by making the model too complex to test. What gets hidden in such moves is that the computational rationale becomes bogus.
Another problem I see with the idea of distributed representation working by mass action with ‘smeared out meaning’ is that it would be too slow. When visual cortex extracts a face from raw data we want to subject the data to a thousand simultaneous ‘syllogisms’ each drawing on different stored data – one for each person we know: x is like this, JA is like this, ME is not, KS is not … therefore x is JA. If there is one distributed representation, in the sense of a set of signals that is ‘presented’ in one interaction the situation is hopeless. We would need a thousand sequential presentations.
The idea of distributed meaning appears to have been given credence by the existence of connectionist models that can ‘recognise’, or correctly categorise, complex inputs using networks with nodes that do not appear to signify any recognizable concept. My knowledge of these models is limited. However, I suspect that, firstly, these nodes do in fact signify some specific subset of possible input patterns. The subset may bear no relation to any subset we might consider useful to allocate a concept to, being based on a complex arbitrary collection of co-contingencies. The firing of the node has specific meaning but not one we would attribute general usefulness to. Secondly, I am doubtful that these models actually infer dynamic patterns in the world in the way that brains do. As indicated in the next section, all our knowledge of the world is based on comparisons that allow us to infer dispositional dynamic patterns. We may call this ‘recognizing objects’ but it is a more complex business. It almost certainly involves constant cross-checking of multiple inference routes so that inferences can be judged ‘true’. To achieve this all but the lowest level spike meanings are likely to be of a form that allows systematic comparison with other meanings – couched in terms of ‘space’ and ‘time’. Opaque local subsetting routines then become useless.
In summary, I believe that meaning can only be considered in terms of individual neuronal events. This may seem to make things complicated but it looks as if we are on the threshold of having techniques that might begin to make that complexity tractable. What we need is to find ways to extract general principles that can be applied to the way individual neuronal events with semantic content relate. At present, I would agree with Poeppel and Embick (2005) that there are false assumptions both about the level of grain needed to match meaning to neurology and the ontological relation between the two.
Resources for grammatical events at a neuronal level
A key question for biolinguistics is where grammar comes into any model of brain function. What follows is a first attempt to address this. It will be oversimplified and disjointed and will need time and input to clarify. However, I would like to see what I can muster. I will suggest that grammar is the set of rules that govern the brain events entrained in response to linguistic signs and strings. To begin with I shall explore how events of this type are entrained in the generation of thoughts or ideas irrespective of stimulus, whether that be an observation of an external event or a piece of language about an event. I shall then move, step by step, into specific issues of language. An underlying theme is that much of any ‘universal’ grammar is likely to be present pre-linguistically and that the human faculty of language may not be critically related to acquisition of new grammatical machinery.
In early sensory pathways the predicative meaning of a signal may simply indicate that a sensory stimulus is at a site, or even just an intersection of two possibility sets, otherwise undefined at that point. Later on there is a need for identifying ‘objects’ and for feature binding that will entail triage to roles x, y, and so on. At higher levels still, concepts made use of in linguistic propositions are linked to roles that include agent and patient and the dative role, as in ‘John gives a book to Mary’. Further subtleties of role are reflected in mass, count, and animate nouns and intransitive, transitive and goal-directed verbs etc. In order for us to have the thoughts that are engendered by linguistic propositions it seems that signals signifying that concepts have been attributed these roles need to be combined in certain neural events. This combination can occur directly as a result of observing a world event, but also as a result of hearing an assertion about it.
A key problem for a biological account of grammar is the nature of a predication. Predication may be considered unique to language but the mere physical juxtaposition of words is of no interest. What are of interest are the brain events entrained that ‘carry meaning’. Language is traditionally considered compositional in the sense that the meaning of sentences is made up of the meanings of words. However words on their own do not have the sort of informing meanings that sentences do. In my view the components of a sentence are not words but words in positions in dynamic sequence, because predication is a dynamic, not a mereological matter. Grammar is the structure of events, not of map-like ‘representations’. Each word when in position will tend to entrain one or more signals with an atomic predicative meaning in the brain, which, with luck, will be fed in together to entrain a thought, idea or scenario more or less describable by the sentence. This coming together could entrain a further atomic predicative event corresponding to the ‘proposition’ given by the sentence but, as I will discuss, things may be more complicated.
In this model a linguistic predication is a juxtaposition of words that leads to signals with atomic predicative meaning combining to form a propositional idea. I suspect that all signals in sensory pathways are destined to combine their atomic predicative meanings in a similar way so in that sense predication can be considered an instance of a more general non-linguistic phenomenon. All that may be different about language-based predication is that signals with predicative meaning relating to e.g. objects are entrained by words in positions rather than features of the object itself.
The default assumption is that a predication involves the production of signals that combine in the only way available in the brain: convergent arrival at the dendritic trees of one or more neurons. This is the same mechanism that has allowed the inference of the presence of objects and other aspects of the world in perception. I think this assumption must be right. It raises some practical problems of combination and of reference. However, by providing a specific biophysical venue for such problems I think it may point the way to identifying the brain machinery that makes language possible.
I will start by considering the concepts that correspond, more or less, to words for ‘objects’ and ‘actions’. My thought is that the existence of these classes of concept and the propositional structures they can engage in reflect two complementary aspects of brain physics. The first is the different architectures of the antecedent inferential integration steps that give rise to cell output signals that mean ‘John plays role x’ or ‘role x is loving’ or ‘Mary plays role y’. The second is the way in which these signals are integrated in receiving cells that host an ‘idea’ of ‘John loves Mary’ as a result of combining of these signals. These correspond in some way to ‘backward looking meaning’ or reference, in the sense of correlating with a referent aspect of the world and ‘forward-looking meaning’ or ‘meaning to’, in the sense of how the idea seems to its recipient. Many people have difficulty with the idea that meaning-to should be to cells, rather than a ‘person’. However, the arguments given above about having to analyse the brain’s inferential mechanisms in terms of single cells should apply equally in this case. It may be possible to bypass this question and simply consider the analysis of grammar in terms of ‘backward-looking meaning’ but my own position is that accepting that cells are the recipients of meaning sheds light on a much wider range of questions, and without raising any major difficulties other than unfamiliarity.
Emphasizing the predicative and inferential nature of sensory signals and their processing, and the complexity of the pathways involved, overcomes the problem of misrepresentation said to be raised by more simplistic ‘causal’ accounts of reference (Dretske, 1986). If every step is an inference that allocates as best it can within an internally defined ‘propositional space’ and inferences are made as fast as possible, on minimum evidence, the risk of misidentification of referent seems implicit. The fact that the brain can make a range of inferences from a sensory input, all equally valid, but based on different modes of sub-setting possibilities means that Frege’s ‘sense’ is also entailed by the process. I do not think that claims for the existence of a non-causal relation of ‘reference’ about which there can be a fact of the matter, independent of causal connections, have any value in a naturalistic account; they are likely to give rise to all sorts of unnecessary circular arguments. Equally, I see teleosemantic theories (Millikan 1984) as misconstruing the causal principles of Darwinian evolution.
The integration pathways that give rise to signals dealing with concepts like John or loves will be complex but it may be possible to outline the territory. My assumption is that meaning relates to what we can infer about the possible ways the world might be constituted by dynamic patterns – subsets of all possible ways. The excitation of a retinal cone can only signify that the subset of possible ways the world may be is that which includes the fact that ‘this retinal cone has been excited’. However, the fact that this cone always connects through certain pathways to a bank of cells that receives signals from other cones through other pathways means that the brain can infer from changing patterns of inputs to these cells that it can allocate the label ‘red’ to this signal and also a particular site in the visual field. With time the brain can learn, at first implicitly and then explicitly, that this signal, in the absence of similar excitation of green and blue cones at the same point, indicates that subset of ways the world might be that includes the arrival of some photons of wavelength about 700nm and that this entails, fairly reliably, that some pattern of dynamic (causal relational) dispositions that include preferentially reflecting, transmitting, diffracting or emitting this wavelength is operating at particular vertical and horizontal angles to the central line of vision. With more information the brain can infer with a high level of confidence that the pattern of dynamic dispositions is of the sort colloquially known as a tomato on the table.
I am talking in terms of dynamic dispositions because I think it is important for the whole approach to think in terms of dynamic relations based on patterns of causal disposition, along the lines of Shoemaker (1980) or Ladyman (2007), rather than talk of ‘seeing objects’, because I think the structure of inferential pathways in the brain is all about constancies and inconstancies in inferred dispositional patterns (Barlow 2009), even if these get labeled as ‘objects’. An object taking a count noun or name is a stable type of dispositional pattern with short-term constancy, or predictability, in spatial relation to the subject but not necessarily long term. It is a pattern of disposition that is conceived as operating as many times as encountered (and in between). It always has one location. A mass noun, in contrast, refers to a dispositional pattern that operates as often as encountered but has no unique spatial location. These are crude definitions but they indicate that we can expect the inferential processes going on in sensory pathways, akin to Hubel and Wiesel cell function, to be stacked up in such a way as to make different sorts of comparisons, or differentiations, in space and in different aspects of time. Early stages will build up a discrimination between dispositional patterns in various parts of sensory fields and later pathways will draw on memory connections established by learning that allot the current dispositions either to objects (of unique place) or materials (no unique place).
An important aspect of a dynamic approach is that time has two quite distinct aspects that need to be considered separately. Time is a metric, like space, and differentiation in metric time can allow inference of speed of motion, vibration or flashing etc. – the temporal ‘shape’ of things. Our concept of time also includes a concept of directional sequence, which is what makes time asymmetrical. Differentiation on the basis of sequence is what allows us to infer ‘cause’, which presumably underpins roles like agent and patient, even if their attribution in natural language may sometimes seem questionable, as in ‘I see the chair’. Since inference itself is dynamic, space, metric time and sequence will all be involved in more or less all inferences but in different ways that can allow conclusions predominantly about one or another aspect.
I can give no detailed account of the nature of the pathway patterns needed but my suspicion is that they can be predicted from simply considering the mathematics of the sorts of spatio-temporal comparisons that need to be made. We have well understood examples in retinal ganglion cells and useful circumstantial evidence from line-detecting cells in occipital cortex and entorhinal grid cells and studies of animal navigation. The simplest model is of sequential banks of cells that integrate on a linear summative basis (‘integrate and fire’) that by making use of inhibitory signals can identify patterns of change in space and time. More sophisticated non-linear integrating mechanisms that allow dendrites to make direct use of temporal relations have been suggested for some time and the recent paper from Häusser’s group (Smith et al., 2013) looks to have identified an instance. It is now understood that an important aspect of the way we infer objects from visual input is by comparison of information from the retina with information from eye movement. Right from the start the visual field is re-classified in terms of constancies of differential in space and time.
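The ‘simplest model’ mentioned above can be illustrated with a toy sketch. It is not a model of any real circuit: the function names, weights and threshold are assumptions chosen for illustration. Each cell in a bank sums its inputs linearly, is inhibited by its neighbour's input, and fires only where the input changes across space.

```python
def integrate_and_fire(inputs, weights, threshold):
    """Linear summation of weighted inputs; returns 1 (fire) at or above threshold."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total >= threshold else 0


def edge_bank(row, threshold=1.0):
    """One bank of cells (hypothetical wiring): each cell is excited by the
    input at its own position and inhibited by the input to its right."""
    return [
        integrate_and_fire((row[i], row[i + 1]), (1.0, -1.0), threshold)
        for i in range(len(row) - 1)
    ]


uniform = [1, 1, 1, 1, 1]   # no change across the field
step = [1, 1, 1, 0, 0]      # a boundary between positions 2 and 3

print(edge_bank(uniform))   # [0, 0, 0, 0]
print(edge_bank(step))      # [0, 0, 1, 0]
```

A uniform field produces no output, while a boundary produces a single spike at the position of the change. Stacking such banks, or feeding them successive time steps instead of neighbouring positions, would give differentiation in time on the same principle.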
Concepts covered by verbs might seem very different but if thought of in terms of patterns of dynamic disposition they may not be. It is just that the dynamic dispositions are being considered individually on their own merits rather than as parts of the clusters of dispositions that nouns tend to cover for objects and materials. Tensed verbs also cover particular instances of operation of dispositions rather than dispositions that might be manifest many times. Untensed verbs (gerunds) behave more like mass nouns. An object whose identity relates to a past instance of operation of a dynamic disposition may acquire a verb-like count noun, as in a painting. Inferring a conclusion that could be expressed by using a verb would be expected to involve integrations of early-level propositions rather differently across space and time, implying different connection patterns, but making common use of basic inferential routines in different combinations.
Although we have quite good evidence for cells whose outputs specify a noun-type concept we do not have such clear evidence for verb-type concepts. This may or may not be significant. It is also unclear how these would handle tense. Nevertheless, it seems plausible that such cells exist.
What may be important to stress is the difference between the structure of cellular connectivity that establishes the disposition for a cell to fire when a particular verbal concept comes to mind and the act of making use of that disposition – in effect the difference between having grasped a verb concept and an act of using it in a predication. The referent for a verb may be a single instance of operation of a disposition (John killed Bill.) but a predication is itself an instance of operation of a ‘verbal disposition’. It is an event. The dynamic nature of verb-type concepts should not be confused with the dynamic use that a verb is put to in a predication.
Another background issue is that the brain’s mechanisms for inferring dispositional patterns by differentiating in space and time must draw on two different options. One depends on mathematical relations, presumably handled by some self-calibrating comparison system using statistical weightings of associations of signals to extract precise quantitative spatial and temporal inferences that can be fed into motor acts like placing a thumb across a cello string to less than millimeter accuracy from mid air.
The other mechanism depends on associations based on qualitative and often arbitrary associations, as in naming, or extrapolation from conventions, as when expecting a shop to be open by mid-morning. These ‘spatial’ and ‘verbal’ inferences may be thought of as right and left hemisphere-related but the comparable architecture suggests that this is simplistic. One might expect spatial and verbal inferences to employ very different architectures, levels of redundancy, selectivity etc. Verbs may relate more often to quantitative processing, but not always.
The relationships between these inference systems may be important. There is some evidence for them functioning independently, as ‘where’ and ‘what’ systems mediated by dorsal and ventral streams. However, in most contexts they probably function in tandem. When they do, qualitative inferences would appear to predominate at higher linguistic levels, with quantitative inferences contributing at lower levels. However, if there is uncertainty about the reliability of a qualitative inference, evaluation of its truth may most often involve reference back to congruence with lower level quantitative inferences. It seems likely that this sort of truth evaluation will occur across all levels of inference to some extent, rather than being a specifically linguistic function.
To summarise, the referent meaning of all our concepts can be traced back to differentiations in time and space. In simple terms everything is a matter of differentiating the dispositional patterns we call electrons, manifest either via photons (light) or phonons (sound and coherent objects). All sensory modalities just detect these basic patterns in one way or another. Early inferences tend to be presented in terms of ‘modalities’ such as colour or smell, but higher-level inferences transcend these. Although I have suggested that cellular output signals have predicative meaning from the start in perception, questions about language mechanisms take on a more specific relevance once these meanings start to involve combination of the sorts of concepts that equate to commonly used nouns and verbs. To maintain an approach based on inferences it looks as if we need to postulate the existence of concept or ‘gnostic’ cells whose job it is to signify, more or less, a familiar concept or to proffer that concept as a relatum in a proposition.
We have evidence for individual cells that indicate recognition of an instance of such a concept by their output, the most familiar being the ‘Jennifer Aniston cell’, whose output is triggered either by her image or her name. There are important caveats here, in particular in terms of redundancy, but I do not think they detract from the main thrust of the argument. These outputs may trigger ‘reflex’ behaviors directly, but if they are to take part in a compositional and systematic way in the production of new considered thoughts or statements about those thoughts, then we have to consider what sort of inferences are instantiated by the outputs of these cells meeting up with outputs from other cells and generating further spikes in downstream cells. The default would seem to be that the next lot of cells will be sentence cells. However, at least at first this looks unpromising. There might be enough neurons in a brain to allocate one to every considered inference we ever make, or sentence we might utter, if we make one at a time. However, apart from the fact that this would leave few cells to do much else, it seems highly implausible that this would be advantageous, and it is unclear how it could even be made use of.
To unpick this problem it is useful to review the requirements for concept cells in a bit more detail. I have suggested that all cells have outputs that signify propositions about dispositional patterns, so we are not expecting gnostic cell outputs to correspond to single words. But we may consider a cell as corresponding to a concept that we give a single word name, with some basic role predicated on it. So an output from a Theaetetus cell may mean ‘Theaetetus is in role x.’ An output from another cell might be ‘Role x is to sit.’ Role x may seem invisible or implicit and perhaps redundant but I suspect it cannot be if we are to have combinatoriality and systematicity and the sort of complex multiple role-invoking predication we can achieve. How many roles we need is moot but I suspect it is more than 2 but not very large: maybe 5 to 8. (Seven is the magic number for short-term memory capacity.) When it comes to producing and listening to language itself, because words can be given roles by position, the role label can seem to be omitted. But in languages like Japanese, Latin and Bantu, and less so in English, the role may be retained as a suffix or case form.
One reason why roles may seem invisible is that they may effectively disappear once a signal has arrived at its destination. For a signal to indicate something is occupying role x it must arrive at a site that is only used for role x occupiers, to couple only to signals that are role x specifiers. The role-assigning predication is simply a reflection of a pattern of signal co-arrival. Without co-arrival the signals would not relate. Thus, as for variables in logic, it does not matter whether role x is designated x or y or z, since this designation disappears once it has served its purpose of bringing signals (or logical propositions) into relation. There are more complex questions to address about exactly how this might work but I do not think they affect the central argument.
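The point that a role label only serves to bring signals together, and then drops out, can be made concrete with a small sketch. Everything in it (the tuple layout, the `co_arrive` function, the ‘occupier’ and ‘specifier’ tags) is a hypothetical illustration, not a claim about actual wiring.

```python
from collections import defaultdict


def co_arrive(signals):
    """Deliver each signal to the site reserved for its role and return the
    pairings formed there. The role label itself does not appear in the result."""
    sites = defaultdict(lambda: {"occupier": [], "specifier": []})
    for kind, role, content in signals:
        sites[role][kind].append(content)
    return {
        (occupier, specifier)
        for site in sites.values()
        for occupier in site["occupier"]
        for specifier in site["specifier"]
    }


original = [("occupier", "x", "Theaetetus"), ("specifier", "x", "sits")]
relabelled = [("occupier", "z", "Theaetetus"), ("specifier", "z", "sits")]
mismatched = [("occupier", "x", "Theaetetus"), ("specifier", "y", "sits")]

print(co_arrive(original))     # {('Theaetetus', 'sits')}
print(co_arrive(relabelled))   # {('Theaetetus', 'sits')}
print(co_arrive(mismatched))   # set()
```

Renaming the role from x to z changes nothing in the outcome, as with a bound variable in logic, whereas signals sent to different sites never relate at all.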
So in one sense the building blocks that can provide a resource for generating ‘grammatical events’ are the dispositional patterns provided by the way that gnostic, or concept, cells come with antecedent connection patterns that have allowed inference of particular types of differentials in space and time. Put another way, grammatical events are made possible by the architecture of ramification of axons that determines the way signals are fed into the next bank of cells at each step during inferential procedures. If rules of signal integration in dendrites are complex, as the recent study from Häusser’s group (Smith et al., 2013; doi:10.1038/nature12600) suggests, then this will be important too.
An important aspect of gnostic cell connections relevant at this point is the mechanism that determines that a particular cell (or cells) is allocated to a specific learned concept such as John, or a place or even an episode. The allocation appears to occur more or less instantaneously in many cases when we ‘grasp a concept’. The most plausible explanation seems to be that we have large numbers of ‘unallocated’ gnostic cells (perhaps ten million) and that when a new concept is learned one or more cells rapidly acquire reinforced feedback connections to the cells that fed into them at that point in time. The cells ‘selected’ may be those whose response patterns most closely accord with sensory signals coming in at the time out of a population that is programmed to be heterogeneous in responsiveness, as immune cells are. I use the term ‘mordant loop’ for this immediate reinforcement, by analogy with a chemical mordant that fixes a dye immediately into a cloth. Although it is not known how this would be achieved it is not difficult to envisage a suitable connection arrangement, which probably involves at least one interneuron step (Trehub (1991) has explored models of this sort). How reinforcement can be achieved more or less immediately is also not understood, but our capacity to retain very brief memorable appearances seems to require this, whatever model is constructed. So when we are introduced to John and receive sensory signals about various features, one or more cells reinforce their input connections from cells encoding those features such that they will reliably be activated by that group of features. A degree of plasticity will allow features to be added or deleted. A more complex loop system can also allow the name John to be linked to the gnostic cell by reinforced connections (Trehub (1991) also addresses this).
An important implication of this mechanism is that the same feedback pathways that were used to ‘fix’ the strength of input connections can later be used to reactivate these feature cells in response to any signal that activates the gnostic cell. So hearing John’s name will lead to activation of cells encoding his facial or bodily features. The upshot is that when lower level information is used to infer higher level information the lower level information need not be ‘lost’. The trail of reference does not go cold. This may be crucial to our ability to sense concurrently both the identity of an object and its constituent features.
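The allocation and reactivation described here can be caricatured in a few lines. This is only a sketch under strong assumptions: the `GnosticCell` class, the numeric ‘tuning’ weights and the one-step fixing of connections are all invented for illustration, and nothing is implied about how immediate reinforcement is achieved biologically.

```python
class GnosticCell:
    """Hypothetical concept cell; names and fields are illustrative only."""

    def __init__(self, tuning):
        self.tuning = tuning    # innate responsiveness: feature -> weight
        self.features = None    # reinforced connections, fixed on allocation
        self.name = None

    def match(self, features):
        # How closely the innate tuning accords with the incoming features.
        return sum(self.tuning.get(f, 0.0) for f in features)


def grasp(pool, features, name):
    """Allocate the best-matching unallocated cell and fix its connections at once."""
    free = [cell for cell in pool if cell.features is None]
    cell = max(free, key=lambda c: c.match(features))
    cell.features = frozenset(features)
    cell.name = name
    return cell


def recognise(pool, features):
    """Forward direction: names of allocated cells whose features are all present."""
    present = set(features)
    return [c.name for c in pool if c.features is not None and c.features <= present]


def recall(pool, name):
    """Feedback direction: a name reactivates the features that fed the cell."""
    for c in pool:
        if c.name == name:
            return set(c.features)
    return set()


pool = [GnosticCell({"tall": 0.9, "beard": 0.2}),
        GnosticCell({"short": 0.8, "beard": 0.7})]
john = grasp(pool, {"tall", "beard"}, "John")

print(recognise(pool, {"tall", "beard", "hat"}))  # ['John']
print(recall(pool, "John") == {"tall", "beard"})  # True
```

The same stored connections serve in both directions: features select the cell, and the cell's name brings the features back, so the lower-level information is not lost once the higher-level inference has been drawn.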
Although I think it remains an open question, I suspect that this mechanism of retention of information via a mordant loop has an important implication for the way roles are assigned. If roles are assigned purely temporarily by time-dependent gating during synchronized firing, as described on page 5, then it is difficult to see how this role allocation could be retrieved by activating a mordant loop, at a later date. This is perhaps the most significant reason for suggesting that gnostic cells may have fixed role allocations at least in terms of e.g. agent or patient. (A way of avoiding reduplicating gnostic cells might seem to be for each of these to be associated with a small number of ‘private’ interneurons, inserted into connection paths such that communication with the gnostic cell was always through a ‘role-specific path’. Arguably, however, these interneurons are now the role-specific gnostic cells they were invoked to obviate.) Fodor and Pylyshyn (1988) argue that there are too many roles to handle propositions like John loves Mary and Jill loves Bill. However, I think that parsing a linguistic structure may involve ‘online triage’ that allows role reduplication in the way that visual perception can handle several objects.
The suggestion that the brain stores not just one instance of a concept but enough instances to deal with a small pragmatically defined range of roles may seem ‘inelegant’. However, the genetic code is well suited to this sort of reduplication and the immune system provides a number of close analogies. If the equivalent of a gnostic cell is a B lymphocyte clone committed to producing a specific antibody (the immune system ‘conceives’ a microbial protein by making a specific antibody to it) we find that the immune system can produce eight subclones, each with a different functional role determined by the class-specific tail end of the antibody molecules produced: IgA, IgM, IgG1, IgG2 etc.
Difficulties with understanding how the brain handles complex meanings often hinge around these sorts of reduplication issues. A related problem is that of ‘twoness’: how a concept can appear twice in a ‘mental representation’ if the concept is encoded at one locus. I think the solution to many such problems is partly to be pragmatic, as indicated above, and partly to emphasise the dynamic, event-based nature of the way the brain handles meanings. If a ‘mental representation’ is an event and that event involves a sequence of signals of the sort known as ‘chaining’, commonly found in motor routines, then there is no particular problem with a concept locus being visited twice if more than one cell in the chain is linked to it (as a branch off the chaining path). A stereotyped motor routine like cutting bread often involves multiple visits to subroutines. As I will indicate later, I suspect that chaining of the sort essential to motor routines may be of special relevance to human thought and language.
So how do outputs from gnostic cells give us the sorts of thoughts that can be entrained by sentences, whether or not these are entrained by observing the world or from hearing about it? On a set theoretic basis one would expect the conjunction of two concepts signifying patterns of dynamic disposition or instances of operation in time of those patterns to give rise to an intersection – a subset of possible ways the world can be that is defined by this intersection. Since nouns and verbs tend to ‘cut the dynamic possibilities at different joints’ the implication derived from combining the two as ‘premises’ can be straightforward. On this basis ‘Theaetetus sits.’ tells us that the way the world is includes a dynamic dispositional pattern now being instantiated that involves a certain configuration of the human body and that the body concerned has certain long-term dispositions we recognize as unique to Theaetetus.
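The set-theoretic reading can be shown with a deliberately tiny ‘world’. The two people, two postures and the variable names below are assumptions made purely for illustration; each premise picks out a subset of possible ways the world might be, and their conjunction is the intersection.

```python
from itertools import product

# Every possible way this toy world might be: (occupier of role x, what role x is).
worlds = set(product(["Theaetetus", "Socrates"], ["sits", "stands"]))

# Premise from one cell: 'Theaetetus is in role x'.
theaetetus_in_x = {w for w in worlds if w[0] == "Theaetetus"}

# Premise from another cell: 'role x is to sit'.
x_is_to_sit = {w for w in worlds if w[1] == "sits"}

# Conjunction of the two premises: the intersection of the two subsets.
conclusion = theaetetus_in_x & x_is_to_sit

print(len(worlds), len(theaetetus_in_x), len(x_is_to_sit))  # 4 2 2
print(conclusion)                                           # {('Theaetetus', 'sits')}
```

Each premise alone halves the possibilities along a different ‘joint’; together they leave a single way the world can be.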
If this is what is entrained in linguistic ‘Merge’ there may be a sense in which the components are associated as members of a set of dynamic relational elements but what seems more interesting is the way they relate, which would seem to be some form of intersection of dynamic possibilities. However, being derived from different differentials in space and time we would not be expecting the laws of commutation and association to be Boolean, since the sequential component of time is directional. Everything being based on dynamics we should probably expect the non-commutative relations of matrix or operator algebra as seen in physics. In other words, if encoded in strings, order will matter. Word order convention for any one language can be arbitrary, but there must be a convention that relates it to appropriate dynamic relations. But unless non-commutative roles are given by suffixes an order rule will be needed. On the other hand, within the brain, where neural signaling events have a fixed spatial direction but are not strings we would expect non-commutativity to be reflected in the spatial arrangement of connections.
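The contrast between Boolean and operator-style combination can be checked directly. The sketch below uses two arbitrary 2x2 matrices (a quarter-turn rotation and a horizontal stretch) chosen only as a familiar example of non-commuting operations; they are not meant to stand for any particular neural or linguistic operation.

```python
def matmul(a, b):
    """Product of two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]


# Boolean-style combination: intersection of sets does not care about order.
p, q = {1, 2}, {2, 3}
print(p & q == q & p)                 # True

# Operator-style combination: order matters.
rotate = [[0, -1], [1, 0]]            # quarter turn anticlockwise
stretch = [[2, 0], [0, 1]]            # double the horizontal axis

stretch_then_rotate = matmul(rotate, stretch)
rotate_then_stretch = matmul(stretch, rotate)

print(stretch_then_rotate)            # [[0, -1], [2, 0]]
print(rotate_then_stretch)            # [[0, -2], [1, 0]]
```

With set intersection the two orders agree; with the matrices they do not, which is the sense in which an order convention (or an explicit role marker) becomes necessary once the combination is encoded in a string.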
I have suggested that in the brain there are no strings. However, the proposed triaging of sensory data by synchronization raises a further issue that may be relevant to language parsing and production. It is likely that triaging in visual pathways is salience-led in the sense that a large, contrasting feature may get allocated to ‘object 1’ with decreasingly salient features being allocated to ‘objects 2, 3, 4…’. However, when signals are being sorted in a higher propositional space it becomes appropriate to triage on more complex characteristics that imply roles such as agent and patient. The mosquito takes precedence over the arm. In generating the sort of scenario that might be covered by a sentence some sort of ‘label-reallocation’ process is likely to go on. This may be of direct relevance to the phenomenon of movement in linguistic structures, that allows ordering by role to be converted to ordering by salience as in ‘It’s Mary that John loves.’ It seems likely that for an idea of the sort that at least any mammal might instantiate quite sophisticated rule-based rearranging machinery will be in operation, whether or not language is involved.
But what would this conjunction of two or more cell outputs be? As indicated before, this is where I think the idea that the brain has ‘distributed representations’, maybe a bit like maps, misses the point. The simultaneous firing of two cells is in itself of no interest. We are looking not for simultaneity but dynamic conjunction – co-determination of some outcome. We are not so much looking for a map as a reading of a map. A map in a rucksack is no use on Ben Nevis. The only thing a reading can be is integration of these two signals in the dendritic trees of each of a number of downstream neurons. The meanings of the signals then become premises for a further inferential process.
Yet something is odd here because we may not expect these next-along cells to have any commitment to this thought in particular. The conjunction of concepts involved does not seem like the sort of thing for which a mordant loop will be set up so that we ‘grasp’ it for future use. How does the output carry any useful information? We seem to want to ‘cash out’ Theaetetus in role x and sitting being role x as the way the world is, Theaetetus sits, but it does not seem as if we are going to be able to make use of this thought, encoded in a signal with no fixed assignation. So far we have relied on premises for further inferences being fixed concepts paired with roles – rather like assignation of arguments to functions. Now we may need something different.
We also seem to have a further problem if we consider anything more than this simplest bipartite statement ‘Theaetetus sits.’ What about John loves Mary? It seems likely that the inferential pathways that determine reference to the state of the world almost always need binary junctions in the sense that a cell cannot usefully derive an inference directly from three classes of input each with a different dynamic role. I am not sure why but I suspect it may be difficult to avoid ambiguity without unrealistic resources. It may in fact be computationally impossible, a bit like solving for n variables with n-1 simultaneous equations. In this sense I suspect that the binary structure of Chomskyan syntactic trees reflects, indirectly, a binary structure to the inference pathways that define reference. If we are thinking that John loves Mary that thought will be contributed to by one or more cells whose outputs indicate John is in role x (agent) and these cells will have antecedent connections of the sort that underlie inference of a dispositional pattern that is a discrete object, animate and, being human, with a permanent name. There will also be a contribution from cells signifying that loving is in the action role and these cells will have appropriate upstream connections for inference of a different sort of dispositional pattern. And so on for Mary for patient role y. We have three roles entailed in the input.
So there seems to be a problem here we did not have for Theaetetus sits. We want all the components of John loves Mary to be usable en bloc as an idea but whereas it seems plausible that cell outputs in a brain are allocated to ‘John is in role x’ or ‘loving is role x’ it seems much less likely that a cell should be allocated to ‘loving Mary is role x’. We seem to want all elements to ‘merge’ in Chomsky’s terms, yet we want the logical work to be piecemeal and bipartite.
This problem might be illusory. We do not know much about the range of concepts individual cells are ascribed to in learning, but we have some clues. There is quite good evidence for ‘person cells’. The situation for ‘place cells’ is perhaps more interesting. Cells in the hippocampus known as place cells do not signify a specific place. Rather their firing indicates that ‘given that I am in environment E, I am at the place that plays role x in E’. Perhaps x is the first place I come to in E or the place where I usually perform some activity in E. In addition there are, in another brain area, ‘grid cells’ whose firing seems to indicate ‘I am at one of a set of sites x within the boundaries of this environment that are separated from each other in a hexagonal grid by n metres.’ What the existence of these cells seems to suggest is that cells can be allocated temporarily to a combination of a grasped concept and a defined role.
Nevertheless, although this scenario may give clues to mechanism, it does not deal with the desire for some sort of ‘one stop merge’ venue.
The more general problem may be illusory for another reason. It may be that at the highest level of thoughts that seem to have linguistic structure the inferences being made change their nature radically, with the binary syllogistic inferences considered above being worked out elsewhere. An interesting question is waiting in the wings here: where does a chain of inferences go at the ‘highest level’? The answer would seem to be that it must ‘tunnel back’ into lower levels of inference through feedback loops reminiscent of the mordant loops that probably form the basis of individual concepts.
Although it seems unlikely that we would have individual cells committed long term to an output meaning Theaetetus sits, let alone John loves Mary, to instantiate that meaning at all it seems that some cells will need to be allocated to having that as ‘the state of the world presented in their input’. Otherwise it would seem that no downstream behaviour could include the report of ‘having this thought’ since nowhere was the total proposition John loves Mary instantiated. Signs for John loving would not join up with signs for Mary. We seem to need this coming together to get ‘meaning to’, not just in terms of ‘sensing the meaning’, but to be able to generate a report.
Where I have so far assumed that the output of a given cell would have a determinate meaning traceable back through inferences to experience of the world, in this new case it seems that we want a cell that can transiently indicate a meaning, of which there will later be no trace. Moreover, to cope with, if not all possible ways the world might be, at least all the thoughts ‘dreamt of in our philosophy’ with even a million cells of a functional type it looks very much as if we want some cells that can host a whole range of different meanings as inputs. In some sense the buck seems to have stopped and we are wanting different sorts of meaning as output from such cells. This might not be so surprising since once we are into motor function it is unclear that meaning, at least assertive rather than imperative, exists. If there is meaning in motor signals it would appear to refer to dispositions within the brain itself, rather than in the outside world. As I shall come to, I think this may be very relevant but the shift in meaning at the sensory-motor interface may be quite complex.
Scenario-handling and minute-keeping cells
I can think of two answers to the dilemma posed above. Both suggest that it was an error to think that the default inference from Theaetetus is in role x and role x is to sit would be that Theaetetus sits. At previous levels of inference the assumption has been that at each level the propositional space, and the terms in which it might be encoded, will change. X is round and x is red may lead to x is a tomato. Nor are we expecting to draw a single inference at any level, but many, which can then compete for salience. So our combined input may lead to an inference that the person in role x is to partake in the meal, or that Theaetetus is in the room, or that it is now polite to sit oneself, or a hundred and one other things. A wide range of cells with different response characteristics can have in their input the scenario that Theaetetus sits but respond with completely different inferences that may variably depend on some or all components of the scenario. A response to an input scenario of John loves Mary might be that Mary will (hopefully) be happy, on the basis of being loved, by whoever.
The simplest sort of response to a scenario is perhaps a motor response, where the inference perhaps no longer refers to dispositional patterns in the world but to one’s own internal dispositions. Other responses, such as that Mary will be happy, occupy a more complex situation between perception and action that will often involve hypothetical scenarios – we think of Mary smiling.
This first solution to the problem suggests that the process of inference goes on much the same, just shifting into ‘motor mode’ either immediately or more gradually. This gives some justification to the idea that there is no watershed where scenarios or ideas ‘arrive’, that the same sort of processing continues right through. However, I think it is a mistake to interpret this as meaning that there is no locus where the scenario is the input. The suggestion here is that there are a large number of cells whose inputs constitute the scenario.
There is a caveat here, nevertheless. If these cells are responding on a more or less integrate and fire basis then the question arises as to whether those cells whose responses only depend on certain aspects of the scenario (Mary being loved) will actually retain any input synapses for the remaining content (John is in role x). It seems a little precarious to attribute the experience of a scenario to a collection of cells, many of which may be able to do without much of the detail. Indeed if inferences can only really cope with two role attributions at a time it all looks rather fragile. The situation might do for a frog snapping at moving black dots in its visual field, to whom we might not feel the need to attribute an experience of a scenario of ‘there goes a nice fly’. It seems less satisfactory as an account of a cat inspecting a new visitor to the house from a cushion on the sofa. This provides a reason for considering a second solution to the problem.
For standard inferential steps we can think of an output as being the conclusion of the premises of a syllogism. However, there are reasons to propose a cell type with a very different function: simply to determine whether to induce the cells currently feeding it with signals to fire again. (The mechanism may include a temporary version of that invoked for the formation of mordant loops.) Such a feedback ‘repeat’-command cell type seems likely to occur at sites where salient material is being held in frame (and also fed into long term memory in some cases). It might be argued that the gnostic cells indicating salient role-concept pairings could just have a default repeat mode that could be overridden by competition from other bottom-up signals. However, at least in mammals, it seems likely that whether or not salient material is retained as the focus of attention is dependent on inputs from the other sort of scenario-handling cells indicated above and that some sort of co-ordinated ongoing regulation of what material is to be allowed to come up from lower levels is going on. A cat will keep the idea of a mouse in frame, even if it has hidden from sight, if it has inferred that it is worth chasing. As I will come to later it may be that for non-human animals this reflects the maintenance of a frame tied to a spatial scenario, rather than an ‘episode’ scenario.
An interesting aspect of this ‘minute-keeping’ type of cell is that it can have, indirectly, as combinatorially large a number of output meanings as it has inputs. It can mean ‘keep A,B,C… in frame’ or ‘keep P,Q,R… in frame’. Such cells would allow mammals to entertain and, for us, report on, a vast combinatorial range of scenarios with a limited (if large) repertoire of fixed gnostic cells. The cell draws no specific inference, except perhaps that, if it has an input component that simply switches it on or off, its inference, when firing, may be that the previous data are still relevant and when not firing, that they are not.
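The ‘repeat-command’ behaviour described here can be illustrated with a toy sketch. It is only an illustration under stated assumptions, not a claim about real neurons: the class names GnosticCell and MinuteKeepingCell, the gate_on switch and the step method are all hypothetical. The point it shows is that one cell with a single on/off decision can ‘mean’ whichever combination of feeder concepts happens to be active.

```python
from dataclasses import dataclass


@dataclass
class GnosticCell:
    """A feeder cell standing for one fixed concept (hypothetical toy model)."""
    concept: str
    firing: bool = False


@dataclass
class MinuteKeepingCell:
    """Draws no inference of its own; only decides whether its feeders fire again."""
    gate_on: bool = True  # assumed on/off input: True = "previous data still relevant"

    def step(self, feeders):
        """Return the concepts kept in frame, updating feeder firing in place."""
        active = [cell for cell in feeders if cell.firing]
        if not self.gate_on:
            # Gate off: the currently active feeders are not told to fire again.
            for cell in active:
                cell.firing = False
            return []
        # Gate on: whatever happens to be feeding the cell is kept in frame.
        return [cell.concept for cell in active]


# A small fixed repertoire; any subset of it can be held in frame by the same cell.
repertoire = [GnosticCell(name) for name in "ABCPQR"]
keeper = MinuteKeepingCell()
```

With n feeder cells the same keeper can hold any of 2^n subsets in frame, which is the combinatorial range referred to above.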
So it may be that when asked ‘what are you thinking about’ it is copies of the signals that form the inputs to these ‘minute-keeping’ cells, sent via axonal branches from gnostic feeder cells to a speech production module, that determine the response. Note that this model suggests that a report of a thought is in a strange sense not a direct report but a report based on the feedback message from minute-keeping cells to the gnostic layer: ‘tell the speech centre to describe what you just told me’.
In this respect I think it may be useful to consider a further complexity to the simple concept of feedforward or feedback pathways. My suspicion is that there is a genuine hierarchy of cell banks that deal with increasingly abstracted and language-like concepts – with maybe about six levels. At the ‘top’ of this there are minute-keeping cells that always pass material back for repeated processing at lower levels. At the penultimate level, and perhaps to a lesser extent at lower levels, I also envisage connections to banks of ‘scenario-handling’ cells that do most of the ‘thought-crunching’ at that level and feed back onto that level in the way that minute-keeping cells feed back to the top level.
It may be of note that both gnostic cells and minute-keeping cells have been described as associated with specific feedback loop mechanisms. The gnostic cells have long-lasting mordant loops that form the very basis of their ‘concept-tracking’ function. Minute-keeping cells have feedback loops to whatever happens to be feeding them signals at the time that either do or do not encourage those cells to fire again and keep these concepts in mind. In comparison, we might consider these ‘whiteboard loops’, as easily erased as they are made up. But there may be further complexity to this, in that what is in current thought, what is immediately retrievable as being a thought just passed and what is reasonably easily retrievable as a thought earlier in the day are all categories we recognize. Moreover, our long-term memories seem to ‘bud off’ from these short-term storage systems. Perhaps mordant and whiteboard loops are two ends of a spectrum of one mechanism. Perhaps minute-keeping cells are just unallocated gnostic cells that never quite get committed on that occasion. I shall explore these issues further in the context of the evolution of language.
A final thought about the way minute-keeping cells might reflect the content of current thought is that here, and perhaps throughout the inferential process, there may be an important asymmetry in the way we look at inputs and outputs. I think it may be true to say that although the individual cell output signals that are due to contribute to a cell input will be in a predicative form, the fact of their co-arrival as input may generate a pattern that is not so simply described as predicative in the sense that relates to being truth evaluable. I suspect that the input to the cell is not a combination of propositions so much as a scenario or idea. A set of premises is not itself a single conclusion. Many conclusions can be inferred from a set of premises. So the combined arrival of a set of signals each with predicative meaning is not the same as any inferred proposition that might be indicated by an output. Thus it is not itself propositional – which may help explain why perception does not immediately seem to be propositional. Strangely, it seems that in the brain predicative or propositional meanings may be atomic and their molecular combinations into ideas non-propositional, rather the reverse of how things appear in language.
In this context there is a technical issue about the term ‘propositional attitude’. As I understand it a proposition is a means of conveying meaning that has an internal dynamic relation that may be deemed true or false. So propositions are truth bearers, as conventionally conceived. However, I am skeptical about the idea that our beliefs and desires are attitudes to these means of conveying meaning. I am inclined to think that our beliefs are directed at the ideas that propositions engender so I think the term ‘propositional attitude’ is misleading.
This raises the question of how something can be evaluated for truth. I think the answer is that this is a different task, handled in a special way, and I shall deal with this in the next section.
In summary, I think there may be reason to think that the cells whose input signifies the ‘content of current thought’ may be in some sense at the top of an inferential tree and not directly involved in further inference of the sort found in sensory pathways. Moreover, I think there are reasons to think that inasmuch as our train of thought follows a string of inferences these are likely to be based on single pass or multiple pass integrations of signals (encoding components of the same content sent to minute-keeping cells) formed by branches of axons from lower level gnostic cells that feed into other cell types that perform functions that reflect the binary nature of Chomskyan tree branches. When these throw up inferences that compete successfully for salience with the current thought their content is then passed to the minute-keeping cells. It is likely that there is a lot of work to do here and that a large number of cells are allotted to this ‘grape-treading’ role. Maybe they involve heavy-duty statistically weighted quantitative computations in some cases, needing banks of cells and large numbers of inputs. Maybe they often, or always, make use of allocation of objects to argument roles in functions. Maybe they make use of simulated motor routines for calibration. There is a lot to consider here but much of it is, I think, tangential to the issue of grammar. The one process that I do think needs to be added in here, and also probably at other stages, is truth evaluation.
In order to make sense of thought as well as language I think we need to consider the existence of another very different cell type – a truth-evaluating cell. The suggestion I will make is that these are cells with a third, and perhaps even more different, mode of signal integration. Cells drawing inferences can theoretically use something close to a linear summative integration method. ‘Minute-keeping’ cells as defined above may fire in response to conditions that are not directly determined by the signals carrying the ‘content’ that is to be re-encoded on the basis of a feedback signal. Nevertheless, they are still likely to fire in response to the strength of a certain type of input. A truth-evaluating cell, on the other hand, might be expected to compare two input segments, with firing (or maybe not firing) reflecting ‘matching’ of the content in the two segments. In a sense this is a subtraction rather than a summation cell. Truth is the zero obtained by subtracting two equivalent elements, rather paradoxically signified in computers not by zero but by unity.
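The contrast between a summation cell and a subtraction cell can be made concrete with a minimal sketch. Everything in it is an assumption made for illustration: the function names, the use of numeric lists for input segments, and the choice of absolute difference as the measure of mismatch. It shows only that ‘matching’ corresponds to a zero result, so that an error signal is raised when the two segments differ.

```python
def inference_cell(inputs, weights, threshold):
    """Roughly linear summative integration: fire when the weighted sum reaches threshold."""
    total = sum(w * x for w, x in zip(weights, inputs))
    return total >= threshold


def truth_cell_error(segment_a, segment_b, tolerance=0.0):
    """Comparison by subtraction: return True when the two segments fail to match.

    Assumes both segments have the same length and encode the same features
    in the same order (an assumption of this sketch, not of the text).
    """
    mismatch = sum(abs(a - b) for a, b in zip(segment_a, segment_b))
    return mismatch > tolerance


# Matching content subtracts to zero, so no error signal is raised.
matching = truth_cell_error([1, 0, 1], [1, 0, 1])
# A single differing element leaves a non-zero remainder, so an error is raised.
mismatched = truth_cell_error([1, 0, 1], [1, 1, 1])
```

Here truth corresponds to the absence of a signal (zero), whereas a conventional Boolean would represent it as unity, which is the paradox noted above.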
I see truth evaluation in the Leibnizian mode where a predicate is implicit in the subject, making truth an internal congruence within a proposition (between the predicate and the subject that already entails that predication). A predication is essentially another way of entraining an idea already arrivable at through activation of the cellular machinery associated with the grasped concept that is the subject. In this sense truth might be considered a comparison between two ideas – that entailed by the predication in hand and that entailed by the existing concept of the subject. However, this reduces to the first analysis since the inference entailed by the new predication has already been subsumed into the grasped concept of the subject and that is why the ideas match.
The idea of predicate being implicit in subject would seem to fit easily enough into the previous discussion of gnostic cell connections for ‘JA has blonde hair’ - in the feedback connections that gnostic cells must have to cells representing features that constitute the main concept. Thus a JA cell will have a feedback loop to a cell whose concept is a certain sort of nose – whose firing together with a cell for a certain sort of hair etc. will lead to the conclusion that the dispositional pattern in hand is JA and activate the JA cell. For John loves Mary things may be more complex as indicated above but if we know John loves Mary the connections must be there in some form. All that is needed is that these can be co-retrieved via mordant loops in some suitable format.
Something that interests me here is that if the uttered sentence ‘John loves Mary’ is new information then the predicate is not implicit in the concept for the hearer, and accordingly the statement is not evaluable as true for the hearer at least until she has updated her concept of John by forming a new feedback association. Predications are only truth evaluable if not novel. In real language use we rather expect useful predications to be the novel ones, even if in idle chatter they are often not, as in ‘It’s a fine day today.’ I wonder if the truth evaluability of predications is overrated in philosophy where emphasis is often placed on statements that have been repeated rather more times than they deserve! Relevant to this is the suggestion that propositions are not really about how the world is so much as ways of communicating how the world is. They are increments of information. This might also help explain why most of the time what we feel we are aware of does not seem propositional. There is no sense of anything being ‘upgraded’, just the way things seem to be now.
On the other hand when we start thinking about what might happen in the future or draw on events in the past we begin to be aware of a sense of upgrading and flow of information. The interesting parts of human thought are to do with ‘what if’ scenarios dissociated from here and now. At this point our thoughts may begin to seem much more propositional and explicit propositional (or ideational) attitudes come to the fore. I will return to these issues when considering the special features of linguistic thought.
As for minute-keeping cells, there need not be a problem getting the whole story of John loves Mary into the input of a truth-evaluating cell. Cell inputs can cope with many degrees of freedom (many bits). However, whereas for minute-keeping cells the suggestion is that it is functionally essential for the entire content of a scenario or idea to occur in the input, for truth-evaluation this need not be so. It is unclear whether we should expect truth-evaluation to be a one step process or whether it may involve a series of steps passing through several cells in a chain. At any one stage only one aspect may be compared making the mathematics reasonably simple. On the other hand, I suspect that the real power of truth evaluation comes in the comparison of two asymmetrical inputs that have the same significance in the sense that 7=3+4 is asymmetrical and 7=7 is not. This would require the mathematics of integration in the dendrites of a truth-evaluating cell to be more complex and to reflect at least some detail of the content being assessed. Perhaps every detail of an idea arising from a linguistic predication is presented in the input. However, it may be more difficult to equate the input to these cells to the content of an idea than it is for minute-keeping cells because the input needs to be a dual one – to allow comparison.
Another issue to be factored in may be that we need cells that assess emotional, ethical and aesthetic value as well as truth. Retention in memory is closely related to emotional evaluation. Truth may be denied on emotional grounds as often as on strictly inferential grounds. Maybe we like to think of truth as when a quantitative analysis in space and time ‘from first principles’ is used to vet an inference that draws on qualitative resources like names and conventions. But there are many areas where a quantitative analysis may not be applicable. Moreover, the traditional concept of truth is a qualitative one. There are clearly further complexities here, but they are beyond the present scope.
Although truth-evaluating cells and minute-keeping cells are different postulates, they share a common feature – that the referent of their predicative output is not a précis of the content of input but something general. Perhaps the referent for the output of a truth-evaluating cell is what Frege proposed as the referent of a sentence – its truth value.
Truth evaluation is traditionally considered in terms of higher linguistic thought but in terms of comparison of content at different levels of inferential structure it ought to be relevant all the way up the inferential tree. Identification of mismatch at early levels of inference in perception may be crucial – maybe the basis for calibrating the inference machine in the first place in infancy. This also has relevance to multimodal sensory input and the relation between the senses. The ‘meaning’ of a visual percept might be considered the way it would be presented tactually, and vice versa. Meaning then becomes defined as what a matching input would be in a different language/modality of input (maybe reflecting Tarski?). Truth is then the matching. The issue of truth tends to be more transparent in a ‘what if’ scenario in which inference tends to be explicit, in contrast to the lower level inferences of perception, which tend to be unapparent. This may justify a concept of purely linguistic truth, although I think it is more useful to consider truth a more general relation.
Although truth evaluation may be integral to every step in the inferential process that seems to map across the cortex from occipital pole to frontal regions it seems to be so different from the process of inferring that which is to be evaluated that I do wonder whether it may be carried out somewhere else, such as in the thalamus. At least one might expect the cells to be a special shape.
Meaning-to as input
A cell-based analysis of meaning entails the unfamiliar idea that meaning-to will be meaning to one cell at a time. It is often assumed that meaning is to a person, but if we are trying to naturalise language at a neurobiological level and trace meaning in biophysical terms we are already at a scale below that of person, and also below that of brain.
Alternatives to the idea of meaning to a person in the literature tend to fit one of two positions. In one, meaning is to some unspecified functional unit that ‘makes use of sensory input’, assumed to be some sort of ‘network’. For reasons given above, I think this is the wrong level of grain. There will be masses of cells on the receiving side of meaningful inputs but each has to be considered on its merits, since there is no possible physical interpretation of meaning that relates to distributed events, other than in the sense of being duplicated.
In the other position, it is denied that meaning is to anything, on the grounds that this would be ‘Cartesian’ and entail a homuncular paradox (which I think is a conflation with an unrelated issue). This fails to explain what the point of deriving inferences about the dynamic patterns of the world is, if these are not ‘consumed’, in Millikan’s (1985) terminology. The objection to the idea of meaning being to anything is often associated with a concern about phenomenology. However, the arguments I want to put forward here are, I think, just the same for a philosophical zombie. In simple terms we want reports of ‘what these words, or appearances, mean to me’ to refer, in some sense, to internal neurological events. There are further layers of analysis, beyond the present scope, but they do not, I think, obviate the need for there to be a sense in which inputs to cells mean something to those cells.
If the content of a thought entrained by a complex grammatical structure like ‘John gave the red book to Mary on Thursday.’ is to mean something to each of a number of cells receiving corresponding synaptic inputs then intuition suggests that the content would need to be reflected in considerable complexity in the architecture of dendrites and also in the internal dynamics of post-synaptic integration. Agent, action, patient, adjectives and adverbs might need separate compartments that contributed differently to the threshold of spike propagation if a cell needed to draw a logical inference that depended in different ways on each of the components. Equally, the intuition of neurobiologists is that this is highly implausible given what we know of the rules of integration. Until recently, the chance of the dynamics of integration in a single dendritic tree having any degree of complexity seemed rather remote. Data on post-synaptic integration suggested that if it was not a linear summative integrate and fire situation it was not so very different. The recent paper from Hausser’s group may change that but it is not clear to what level of complexity. It does at least suggest that the order of occurrence of individual synaptic potentials may be important to the way a spike propagates. The options would seem to be to find a way of avoiding this requirement for ‘semantic complexity’ in dendritic subdomains or a way of making it reasonably plausible.
There appears to be a way to avoid the requirement, but one with rather odd implications for self-knowledge. For minute-keeping cells as defined above it seems that the way inputs are arranged in a dendritic tree is of no importance – the cell either will or will not induce re-firing of all feeder cells. If these cells ‘host ideas’ it would seem they could do so without the arrangement of the elements giving the meaning being of any importance.
This might seem to be at odds with an intuition that signals coming in to a dendritic tree would have to be arranged in the ‘right way’ to mean redness and roundness for meaning a tomato. It is often suggested that some sort of analogue relation to the outside world would be needed. However, if the signals are operating in high-level propositional space a bit like language I think this suggestion may be simplistic. It is very difficult to rid oneself of a ‘pictorial’ idea of meaning. More importantly, whatever way signals might mean something to a cell ought to be, effectively, a Wittgensteinian private language for that cell that can be completely arbitrary because the cell responds by contributing in the language of a new propositional space. We seem to be in the ‘any symbol can mean anything’ situation with the added insult that to ask why it means what it does mean may be circular. I think Wittgenstein is right to say that to ask if my red is the same as your red is meaningless. So the constraint may be much less than it would at first appear.
I think this is an area absolutely on the limit of what it is legitimate to talk about scientifically but I disagree with Wittgenstein to the extent that he thought it was off limits. If nothing else, the pattern of signals involved in meaning-to must presumably have enough degrees of freedom to cover the range of possibilities to be encoded. I am fairly sure that one could go further but I cannot raise a good argument as yet. I also suspect that spatial relations of inputs at least for truth evaluating cells might need to obey some rules that would somehow complement the mathematics of how the concepts involved were originally inferred so that a comparison had some grounding. However, I have suggested that these cells would have a specialised architecture and probably not be those that ‘host our experiences’.
That notwithstanding, maybe the way grammatical relations get laid out in truth-evaluating cell dendrites provides the constraint that ensures meaning-to is useful and maybe the patterns involved are of some relevance to inputs to other cells too. Suppose that each of two dendritic branches in a truth-evaluating cell integrates post-synaptic potentials in a cumulative way as indicated by:
X = a + f(b + f’(c + f’’(d + f’’’(e ………))))
The integration in the other branch might take the same overall form but with a compressed number of function levels, or a route through a different set of functions. It would then seem to be likely that the usefulness of comparing two such integrations would depend on some complementarity between the mathematics involved and the mathematics of the differentiations involved in the inferential steps that gave rise to the concepts in the first place. If so, then the inputs into a truth-evaluating cell would probably need to be laid out in a way that reflected logical roles in some detail.
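The nested form of integration, and the idea of a second branch reaching the same value through a compressed number of function levels, can be sketched as follows. This is a toy under stated assumptions: the function nested_integrate and the particular choices of f and f’ (doubling and adding one) are invented for illustration and are not proposed as the real mathematics of dendritic integration.

```python
def nested_integrate(inputs, funcs):
    """Evaluate a + f0(b + f1(c + f2(d + ...))), working from the innermost term outward.

    inputs: [a, b, c, ...] post-synaptic contributions, outermost first.
    funcs:  [f0, f1, ...] one function per nesting level, so len(inputs) - 1 of them.
    """
    if len(funcs) != len(inputs) - 1:
        raise ValueError("need exactly one function per nesting level")
    acc = inputs[-1]  # innermost term
    for x, f in zip(reversed(inputs[:-1]), reversed(funcs)):
        acc = x + f(acc)
    return acc


def double(v):
    return 2 * v


def add_one(v):
    return v + 1


# Full branch: 1 + double(2 + add_one(3)) = 1 + double(6) = 13
full_branch = nested_integrate([1, 2, 3], [double, add_one])
# Compressed branch with one fewer level: 1 + double(6) = 13
compressed_branch = nested_integrate([1, 6], [double])
```

The two branches have different internal structure but the same value, in the way that 3+4 and 7 do; whether a comparison of this kind is informative depends on how the functions relate to the steps by which the concepts were first inferred.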
An apparent weakness of using a comparison with this sort of complex structure for assessing truth would be that, given that the mathematics of integration are likely to be reasonably simple, a wide range of unrelated propositions might be seen to match because any departure from matching at one function level could be compensated for by another mismatch at another level. Thus approval of ‘my daughter is my brother-in-law’s niece’ might entail for the same cell approval of ‘my daughter is a Dutchman’s uncle’.
The way to get around this, presumably, would be to have a million truth-evaluating cells working in parallel, each checking matching by a slightly different formula. This would be in keeping with the reasonable assumption that we are not suggesting that a single cell is involved at any stage of the process; there are likely to be at least thousands of cells to which a set of signals are sent via axonal branches at any stage. If this is the case there may be a reason for thinking that truth really is encoded as zero, and perhaps signified as an absence of firing, since the simplest way to signify truth would seem to be that none of the evaluative procedures generated a positive (error) signal. This might be in keeping with the sense in which truth can appear ‘redundant’ or ‘silent’, as where ‘It is true that P’ can collapse to ‘P’. Only falsity is salient.
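A toy sketch may help to show why many parallel cells with slightly different formulae avoid the problem of compensating mismatches. All details are assumptions made for illustration: each cell is modelled as a weighted difference of its two branches, the weights are drawn at random, and the names make_cell, make_population and judged_true are hypothetical. The max_errors parameter stands in for some tolerance on the number of error signals.

```python
import random


def make_cell(weights):
    """Build one truth-evaluating cell with its own weighting formula."""
    def error_signal(branch_a, branch_b, tolerance=1e-9):
        # Weighted subtraction of the two branches; a non-zero remainder is an error.
        diff = sum(w * (a - b) for w, a, b in zip(weights, branch_a, branch_b))
        return abs(diff) > tolerance
    return error_signal


def make_population(n_cells, n_inputs, seed=0):
    """Many cells, each checking matching by a slightly different formula."""
    rng = random.Random(seed)
    return [
        make_cell([rng.uniform(0.5, 1.5) for _ in range(n_inputs)])
        for _ in range(n_cells)
    ]


def judged_true(cells, branch_a, branch_b, max_errors=0):
    """Truth as silence: accepted when no more than max_errors cells signal an error."""
    errors = sum(cell(branch_a, branch_b) for cell in cells)
    return errors <= max_errors


# One cell with equal weights is fooled: mismatches of -1 and +1 cancel out.
single_cell = make_cell([1.0, 1.0])
fooled = not single_cell([1, 2], [2, 1])

# A population with varied weights is not fooled by the same pair of inputs.
population = make_population(n_cells=1000, n_inputs=2)
```

Genuinely matching inputs subtract to zero under every formula and so remain silent across the whole population, while a compensated mismatch would have to cancel under every weighting at once.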
A model of this sort implies that there is no single locus of evaluation that might be considered incorrigible. In fact we might expect the final judgment to be based on a statistical rule of fewer than a certain number of signals from many computations satisfying a threshold indicating a negligible chance of error. Moreover, such a judgment would be based on inputs to a system of truth-evaluating cells rather than proposed minute-keeping cells ‘hosting ideas’. This has further implications for undermining theories of incorrigibility of reports of contents of thought, both at a technical and a deeper epistemological level. To know that a report of our thoughts is reliable we must have machinery for checking the truth of reports. Yet this machinery may not be the machinery that hosts the contents reported as thoughts. It might turn out that we have to define it as such, but this is far from clear. Current discussion of self-knowledge seems to pay little attention to how it actually works in biophysical terms.
While these issues may seem troublesome for philosophical dogma they are perhaps not surprising in empirical terms. The uncertainty about evaluating the truth of what is in experience may actually fit quite well with the fact that we are very unsure just how much detail coexists in our experienced ideas.
In addition to minute-keeping and truth-evaluating cells I need to consider the possible contribution to ‘hosting of experience’ of what I have described above as cells involved in ‘thought-crunching’ – of generation of new inferences from the content of current thought and, presumably, other information available from other sources. As for truth-evaluating cells it would seem that such cells would have the content of current thought as only one part of their input, either to be summated with, or perhaps as for truth-evaluating cells, subtracted from other material. The same sort of arguments for truth-evaluating cells given above would seem to apply.
This discussion may seem to leave things very much up in the air in terms of how the detail of content of ideas is reflected in detailed rules for spatial relations between individual synapses encoding different sorts of content. There is certainly a lot more to know. However, I would not discount the possibility that in some way the burden of reflecting relations within content in rule-based relations between synapses may be spread amongst the many cells that receive the content. If the mechanism for evaluating the reliability of our reports of content is as indirect as suggested then we may not be entitled to think that we need any one cell to have the entire content of an idea laid out in a way that reflects computational rules. On the other hand we may be entitled to think that at least some of the cells receiving the content serve this function for each relation within the content.
In summary, facing up to the need to find a biophysical basis for meaning-to may raise more questions than it answers. Even for there to be such a thing as reference or backward-looking meaning arising in an inferential system I think we have to accept that this entails the generation of something which signifies, and therefore signifies to something. It is not the third party scientist who is being signified to. At a neurobiological level we are below the level of meaning to persons and have to be dealing with meaning to cells. However, it is far from clear which cells we should be thinking of. There is a suggestion that the cells whose input content is most likely to reflect what we describe as our meaningful ideas are not to be equated with ‘agents’ of behavioural response. So the Cartesian concept of a central controlling soul with free will is not what we are looking for. On the other hand, this content seems to be what matters to us.
I have said little about the phenomenal side of meaning-to. Many people are puzzled by the idea that a group of electrical signals could ‘mean to me’ that I think I would rather have the soup than the salad or that my dining partner’s words should ‘mean to me’ that they feel the same way, whether we are dealing with one cell or a network. Descartes pointed out that this is not a problem solved by inventing ‘maps’ or ‘screens’ in the brain. He was not guilty of the homunculus regress often blamed on him. It is just a brute fact that our world is phenomenal and it should not come as any surprise that phenomena should be determined proximally by one sort of sign rather than another. That electrical potentials are not like soup is not the point, because soup itself is not ‘like soup’ in a relevant sense. Whatever meaning is to in a brain it must be something that can interpret signs as ‘states of the world’. That interpretation should not be confused with inference, since the inference has been handled by prior computations. What I think may confuse further in a linguistic context is that this meaning may not strictly be propositional because it is not an increment of information that might or might not be true. Rather it is presented as a current scenario. It is perhaps not surprising that so many people have opted to eliminate meaning-to from their theoretical frameworks. But I think it is to be recognized and explained, in all its perversity.
What more is involved in language use?
Why do other great apes fail to learn language, despite our attempts to teach them? There appears to be something missing from the non-human primate brain that makes it impossible for it to use language in the way we do.
Explanations for the evolution of language tend to fall into two types. The first sees language in terms of a solution to some putative need arising within the survival strategy of a pre-linguistic species. Such theories draw on the social nature of great apes, grooming and even the need to ‘put the baby down’. In my view such approaches are weak. We do not have to find some special ‘need’ for language in primates. I suspect that if the machinery that allows language had arisen in any of a wide variety of vertebrate species it would have conferred enough survival advantage to become established. Feathers did not evolve because of a particular need for a particular species to fly. What may, however, be relevant is that having language ability may come at a cost, such as slow behavioral development, that might be too great in many contexts.
The second approach to the evolution of language is to accept that it arose from a random change that might be expected to confer advantages, not offset by more potent drawbacks, and to focus on what that change was, in terms of how the brain handled incoming information. This is, I think, the more useful approach. On the basis of the model presented so far it seems likely that we are looking for populations of cells in the brain that are programmed by the way their connections develop in early life to subject signals to a new form of differentiation in space, metric time or sequence that allows a new type of inference to be drawn.
Applying Ockham’s razor it would be good if we could identify a change in differentiation that would help explain a range of other mental capacities that seem to be more or less unique to humans (with the caveat that it would be implausible, and over-parsimonious, to try to explain everything with just one mutation). Suggestions for these include arithmetic, music, a sense of ordered, longer term past and potential future, a sense of self-identity and theory of mind. Language itself involves producing and interpreting chains of events. It is also linked to the act of pointing and draws on mimicry.
My proposal is that most or all of these features depend, directly or indirectly, on one particular form of differentiation, a differentiation on the basis of pure sequence, in line with the view of Tulving (2002).
The suggestion is that up until the point of evolution of the human brain all differentiations in one way or another remain tied to metric space and time. Sensory signals may be differentiated into stable patterns of disposition at a particular place, as in objects, or indeed, places. They may also be differentiated into actions – individual instances of operation of dispositions. There may be distinctions between types and tokens of dispositional pattern. What may not occur is differentiation into instances of operation of clusters of dispositions, classified only as being simultaneous, with the referent concept being an ‘event in time’. The distinction may not be black and white. Pre-linguistic animals may have some limited capacity to derive such differentiation but the suggestion is that human language became possible with a new cellular connectivity pattern that made differentiation by sequence much more powerful. It may take over as the default format for ‘ideas’ so that spatial scenarios become event scenarios.
Arithmetic falls neatly into this proposal, since the basis of number is now understood to be sequence and the difference between children learning numbers and apes is said to be that only children understand what ‘one more’ or ‘the next one’ means. Music is all about sequence. A sense of longer-term past and future is a matter of sequence.
A sense of self-identity and theory of mind may not immediately appear to relate to sequence but there is a fundamental link that may explain why differentiation in sequence is such a major step in neurological function. The distinction between internal ‘mental’ events and external ‘physical’ events has generated a lot of heat in philosophy. Mental events are often seen as being known through phenomenal character, yet external events are known through phenomenal features too. The true distinction, as I understand it, is that events become unequivocally mental at the point where the brain no longer has any means for cross-referencing in terms of spatial location. We can tell that our view of the world is partly dependent on ‘physical’ events in the eye because we can make the view shift by pressing on the eye. But beyond this the events within the brain can only be categorized in sequence. They cannot even be categorized in metric time because judgments of metric time depend on spatial changes that occur at reliable rates. Thus, mental events are known purely sequentially, even if they give rise to ideas about spatial aspects of the world.
There is a further connection between mental events and sequence that may be even more significant. Individual relations of sequence, in terms of cause-effect pairs, can be identified from the immediate environment. Longer chains of sequence, however, can only be inferred from differentiation of internal events, because, unlike space, you cannot retrace your steps in sequential time in the outside world. Such a differentiation requires two things. It requires an ‘action-replay’ facility, which can be provided by a mordant loop-type system. It also requires allocating an instance of that replay to a point in a chain of sequential internal events that is itself treated, like sensory data, in a way that allows inferences about relations to be drawn.
The novelty of this process might appear to be in the chaining. However, chaining of routines occurs extensively in motor functions like running, and grid cells suggest that chaining is also used in monitoring a sense of position. Since so much of perception is linked to differentials with expectations from motor activity, it seems likely that chaining of sequences plays an important part in a good deal of perception. However, this is probably mostly at a low level of inference and perhaps always related in some way to positions in space and the objects that might take those positions. The novelty would be tracking chains of sequence of higher-level inferences regardless of their association with spatial location. An interesting implication of this is that, since the pre-linguistic brain is likely to have had massive capacity for tracking chains for spatial purposes, a small shift in connection structure that allowed tracking of pure sequence might be able to call upon formidable resources. This might explain why musicians readily memorise sequences of thousands of notes and epic poets might have recited thousands of words of verse, for no obvious survival advantage, when a chimp has trouble with two. Perhaps the pre-linguistic state entails a highly sophisticated mathematical facility, but no mechanism for applying this facility in abstraction from the spatial context of movement.
It might seem that any action conceived in the past would need an ability to track pure temporal sequence. However, there may be an argument for there being no ability in pre-linguistic animals other than that of associating ‘the past: unspecified’ with some pattern of disposition, usually in the form of an object or animal that has been ‘updated’ accordingly. I doubt that there is a faculty to differentiate in a sequence this updating from other updatings that might flip back and forth between two states. Although pre-linguistic animals can grasp single instances of cause and effect relation, or sequence within a scenario, it seems likely that they cannot attribute such scenarios to a chain that they can subsequently retrieve.
The link between internal mental events, or thoughts, and sequence might seem still to be tenuous. However, if the differentiation is being made on material at the highest level of inference, involving the linking of word-like concepts, then the sequence will be one of what is in the mind rather than in the world. It is unlikely that there would be any advantage in logging in sequence all low-level sensory information. Since higher-level material is influenced by memories and focus of attention it will be, by nature, a differentiation that picks out the story of the mind rather than that of the world. This need not be explicit, and in infancy it seems likely that no distinction is made between sequence in the world and sequence in the mind. However, it will become explicit once regular interaction with others has made it apparent that the sequence is not the same for everyone. At this point ‘theory of mind’ comes in terms of inferring that other people’s sequences are logged differently from one’s own.
In this context, given that many of our basic sensory pathways make use of ‘predictive coding’ that interprets sensory input in terms of expectations in relation to motor routines like shifts in head and body position, it may be that mimicry is not so different from learning to calibrate the world by repeating one’s own actions under different conditions. Mimicry is clearly a major feature of language use, but although mimicry may be enhanced in Homo sapiens beyond that in other primates, and there may be a genetic shift underpinning this, I am not sure that a step change in brain function needs to be invoked.
I will not speculate in detail on the specific machinery that might allow inference based on pure sequence but I think the suggestions made by Buzsaki and Moser (2013) about a shift in hippocampal and entorhinal cortex function are plausible. What seems to be needed is something like the entorhinal grid cell but calibrated in terms of pure sequence rather than space. It may be of note that primates differ from rats in terms of theta precession – possibly suggesting that the unique machinery in the human may build on an earlier facilitating shift in primates. The development of saccadic eye movements in primates might even be relevant here as something that favoured later differentiation of sensory material into a sequence of events. Arguably our view of the world is very much built up out of temporal ‘pearls’ in a time frame not too different from that of saccades and, indeed, that of words in sentences. Buzsaki makes the point that the pre-existing spatial navigational machinery may, by iterative use of a hierarchical time relation, provide a way not only to log sequence but to ‘chunk’ it into progressively larger units. So it may not be surprising that for distance we have inches, feet and miles, for time we have minutes, days and years, and even numbers are chunked into units, tens and hundreds.
An interesting aspect of an ability to differentiate input in terms of pure sequence is that in some ways it does the opposite of what sensory inference mechanisms tend to do. Most of these mechanisms appear to extract invariant patterns in world dynamics over time – reliable patterns of disposition perhaps. Logging events is more a matter of noting inconsistencies. In mathematical terms it may be simpler, since extracting invariant features of, for instance, faces, so that they can be recognized from all angles, would be expected to require sophisticated computation. Logging in sequence does not require quantitative co-transformation in three dimensions, indeed it is essentially a qualitative task. What is different about logging in pure sequence is that it has to be based on a reliable internal dispositional property that is some sort of chaining of signals that can be re-run to retrieve the sequence relation later.
So the suggestion is that whereas other animals have concept cells that handle things, actions and places we have concept cells that also handle events, perhaps with events taking precedence over places. The implication is then that the function of at least some minute-keeping cells with their ‘whiteboard loops’ may shift to look more like that of concept or gnostic cells with mordant loops. This might involve a change in plasticity that allowed a shift from a temporary to a long-term function or a new sub-population of gnostic cells with a different connection pattern. Events become concept referents. What is then logged is not just an updating of spatially defined concepts but the event of updating itself. (This would seem to be relevant to dreaming. It is said that dogs dream but maybe they do not dream sequences of events.)
This shift in the referents of concepts will presumably make language useful in a way that for other animals it would not be. There is relatively little point in having signs for assigning roles to things or places not currently in view if there is no sense of structured past and future. Alarm calls for hidden predators would be relevant. Stereotyped calls relating to other places like the calls of flocks going to roost would be relevant. But not much more. Moreover, to indicate a scenario requires the signing to have significant structural complexity and, if not just a spatial scene, dynamic direction. Pictures might work but strings of sounds or gestures clearly have major advantages.
Within this framework the human act of pointing looks to have a specific role but it is not quite clear how it fits in. To point might be to indicate ‘that is my scenario’ but the irony is that language seems to be the very mechanism to make pointing redundant. I suspect that pointing is a means of ‘re-spatialising’ in the context of an utterance that would otherwise not necessarily be taken to refer to the here and now. Perhaps it is significant that pointing is a prominent feature of infants only used later for specialized purposes. Perhaps it mitigates what may be a demanding task of calibrating a brain newly programmed to run on reference to event scenarios in terms of spatial scenarios – a way of harmonising pre-human machinery with a human event-based format. It might be argued that pointing would be of more use to non-linguistic animals than to us but the catch may be that without the concept of episode and thus no concept of internal events and no theory of mind, at least in a conspecific individual interpreting the pointing, it never got selected for as useful.
Language production and parsing
Much has been made of the recursive structure of language. It might seem obvious that this would be linked to an ability to differentiate sensory data in terms of sequence and it seems reasonable to suggest that the differentiation mechanism, however it works, would be ideal, and perhaps necessary, for decoding linguistic strings. However, I suspect this is a relatively coincidental factor. Language might have worked in pictorial form. I doubt that the theoretically infinite recursion claimed for language is of great interest since the key features of language seem well represented in strings of two or three words and strings longer than thirty words are barely usable.
I also suspect that we have a language production system that allows for quite a bit of role shuffling during sentence production. Thus I am aware of producing sentences like ‘Frequently, I find myself, half way through a sentence, actively hunting for words … that will both come out in grammatical sequence … and complete the desired meaning of the sentence.’ I am pretty sure that roles of concepts have had time to be switched around during this, both for me and for a parser.
What I think is an interesting possibility is that language plugs more or less directly into the triage and label re-allocation system I considered in the context of a non-linguistic scenario. If in constructing a scenario, whether spatial or event-like, the brain needs to go through a routine of re-sorting signals triaged both according to salience and dynamic role at some point in their history, language use may by default adopt the rules of this routine. This would make sense in terms of the universal rules for spoken languages suggested by Kempson and colleagues in the theory of Dynamic Syntax in which words are allocated to roles as they are heard but in such a way that these are only partially defined until enough has been heard to indicate how all elements fit together with pragmatic context. In this view ‘Universal Grammar’ would be something not unique to humans but merely uniquely accessible for language use. On the other hand it may be that an ability to differentiate events in terms of pure sequence brings with it specific changes in the rules of role allocation.
My current view is that the practicalities of parsing and uttering of sentences do not pose major mysteries. They may be seen in much the same way as reaching for an apple or tying a shoelace. Complex motor routines will be embedded in memory in association with activation of certain gnostic cells for the generation of words. Close interlinking between parsing and production mechanism in terms of packaging complex motor routines and linking them to concept cells or lexical item cells would seem to be in keeping with what is known about the link between visual perception and efferent copies of actions involved in tracking objects. Learning how to speak might not be so very different from learning how to eat with a knife and fork. What is more remarkable is having thoughts related to episodes other than now of a sort we might want to convey to other people.
Nevertheless, once a group of our ancestors cottoned on to the usefulness of stringing words together, even if the evolutionary time available is short, it seems reasonable to suggest that there would be extreme selective pressure in favour of individuals who were particularly slick at talking. Even without new mutations one would expect any existing variations in genetic encoding of cellular connections in the brain to allow a shift in functional efficiency for language to occur under selection pressure. This might be sexual selection or a more general ‘social selection’ whereby skilled wordsmiths might earn a fast-track meal ticket in exchange for services to the community. Ability to discriminate subtle intonational differences or timing might be useful if speech was initially based largely on pitch and rhythm. Even repertoire of mouth and tongue movement might be under selection pressure. So it is reasonable to suggest that a number of further evolutionary details in terms of neural control of production and parsing may underlie the current faculty of language.
An obvious spin off from this would be musical talent. I think it is hard to explain the ability of Alfred Brendel to play Beethoven on the basis of natural selection for some social bonding function that could arguably be achieved with some chimpanzee hoots. I think it more likely that musical ability of that sort reflects selective pressures relating to skills in an early language that was closer to music than it is now. I also suspect that the extraordinary complexity of the skill may simply reflect a new machinery for differentiation in sequence tapping in to the chaining routines used by the motor system where demands for quantitative accuracy and information capacity were always much greater.
The central theme here is that the neurobiological grain that Poeppel and Embick needed to match up to meaning and language is the neuronal grain. It is the only grain that can provide the biophysical substrate needed. Processes can be described at a coarser grain as well, but only if grounded in a role for meaning, both as output and as input, at the cellular level. There need be no ontological mismatch between language and neurology if sufficiently sophisticated causal relational accounts for backward-looking meaning and meaning-to are considered.
Barlow HB. (1972) Single units and sensation: a neuron doctrine for perceptual psychology? Perception 1: 371–394.
Barlow HB. (2009) Grandmother cells. In The Cognitive Neurosciences, 4th Edn, ed Michael Gazzaniga. Cambridge, MA: MIT Press. p309–320.
Buzsaki G and Moser EI. (2013) Memory, navigation and theta rhythm in the hippocampal-entorhinal system. Nature Neuroscience 16, 2, 130-138.
Dretske F. (1986) Misrepresentation. In R Bogdan, ed. Belief: Form, Content and Function Oxford University Press.
Fodor JA and Pylyshyn Z. (1988) Connectionism and cognitive architecture: A critical analysis. Cognition 28: 3–71.
Ladyman J and Ross D. (2007) Everything Must Go Oxford University Press.
Millikan R. (1984) Language, Thought and Other Biological Categories. Cambridge MA: MIT Press.
Poeppel D and Embick D. (2005) Defining the relation between linguistics and neuroscience. In A. Cutler ed. Twenty-first century psycholinguistics: Four cornerstones, Lawrence Erlbaum.
Prinz J. (2012) The Conscious Brain. Oxford University Press.
Purves D and Lotto B. (2003) Why We See What We Do. Sinauer Associates.
Shadlen MN and Movshon JA. (1999) Synchrony unbound: a critical evaluation of the temporal binding hypothesis. Neuron 24: 67–77.
Shoemaker S. (1980) ‘Causality and Properties’, in P. van Inwagen (ed.), Time and Cause: Essays Presented to Richard Taylor, Dordrecht: Reidel, 109–135.
Smith SL, Smith IT, Branco T and Häusser M. (2013) Dendritic spikes enhance stimulus selectivity in cortical neurons in vivo. Nature, ePub October 2013.
Tiesinga PH and Sejnowski TJ. (2010) Mechanisms for phase shifting in cortical networks and their role in communication through coherence. Frontiers in Neuroscience, 4, Article 196.
Trehub A. (1991) The Cognitive Brain MIT Press.
Tulving E. (2002) Episodic memory: from mind to brain. Ann Rev Psych 53, 1–25.
von der Malsburg C. (1981/1994). The correlation theory of brain function. In Models of Neural Networks II Domany, van Hemmen, Schulten, eds. Springer.
 ‘Ideas’ may be considered ontologically suspect but in my view that is because they are inputs. Inputs tend not to figure in scientific terminology because their role in a causal chain is contingent on what they are input to and so they are not describable in the generalisable dispositional terms science needs to make predictions. They are a real part of causal dynamics but the causation they mediate has to be described in alternative terms that make them redundant as explanatory tools except in the context of meaning.
 I see the concept of ‘referent’ as an idealization, like a ‘God’s eye view’, that has no ultimate validity. To misquote Fodor ‘whatever reference is, it is really something else’. What matters is what subsets of dynamic possibilities an internal signal correlates with reliably enough to be useful.
 All physics is about relating instances of operation of dynamic laws to experiences (observations). All physics is local in the sense that the closest ascertainable relation of dynamics to experience occurs in one particular domain in space and time, not several. If we think that the relation of neural events to experienced meaning is physical, and at least potentially ascertainable (subject to scientific study), then it must be local rather than distributed over separate domains. The suggestion that experience might relate to a ‘point in a state space’ (e.g. Churchland), based on a collection of separate events, is not physics, which is somewhat ironic since such a view is often espoused by ‘hard-line physicalists’. Physics does not relate experience to statically conceived ‘states’ but only to dynamics, which is why a naturalistic analysis of language needs to be dynamic.