Deep Predictive Learning

See the GitHub Discussion for more info on how this works computationally and on the examples/deep_* models. The implementation is in deep_{net.go, layers.go, paths.go}.

The Deep variant of Axon performs predictive learning by attempting to predict the activation states over the Pulvinar nucleus of the thalamus (in posterior sensory cortex), which are strongly driven phasically every 100 msec by deep layer 5 intrinsic bursting (5IB) neurons that have strong focal (essentially 1-to-1) connections onto the Pulvinar Thalamic Relay Cell (TRC) neurons. The predictions are generated by layer 6 corticothalamic (CT) neurons, which provide numerous weaker pathways to these same TRC neurons. See OReilly et al., 2021 for the model, and Sherman & Guillery (2006) for details on circuitry.

Computationally, it is important for the CT neurons to reflect the prior burst activation within their home cortical microcolumn, instead of the current superficial layer activation, so that the system is forced to make a genuine prediction instead of just copying the current state. This is achieved using a CTCtxt pathway, which operates much like a simple recurrent network (SRN) context layer (e.g., Elman, 1990).
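As a minimal sketch of this SRN-style behavior (using illustrative names and a rate-code abstraction, not the actual axon implementation), the CT-side context is simply a copy of the previous trial's Burst:

```go
// Minimal sketch of the SRN-style context update (illustrative, not the
// actual axon implementation): CT input reflects the previous trial's Burst,
// not the current superficial state, forcing a genuine prediction.
package main

import "fmt"

func main() {
	bursts := []float32{0.0, 0.9, 0.1} // Burst activity over successive trials
	ctxt := float32(0)                 // CT context value for one microcolumn

	for trial, burst := range bursts {
		// The CT layer predicts using the context copied at the end of the
		// previous trial, like an SRN context layer.
		fmt.Printf("trial %d: CT context = %.1f\n", trial, ctxt)
		// At the end of the trial, the CTCtxt pathway copies the current
		// Burst into the context used on the next trial.
		ctxt = burst
	}
}
```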

This same corticothalamic circuitry is also important for broader inhibitory competition among cortical areas that cannot practically interact directly in cortex, given the long physical distances. The thalamic reticular nucleus (TRN) integrates excitatory inputs from the CT and TRC neurons, and projects pooled inhibition across multiple spatial scales back to the TRC neurons. These TRC neurons then project back primarily into layer 4 (and more weakly into other layers) and thus convey the attentionally modulated predictive activation back into cortex.

Computationally, it makes sense that attention and prediction are linked: you only predict the subset of information that you're attending to (otherwise it is too overwhelming to predict everything), and prediction helps to focus the attentional spotlight in anticipation of what will happen next, and according to the regularities that predictive learning has discovered (e.g., coherently moving objects).

However, the attentional demands and prediction demands are in conflict, and various attempts to integrate the two functions have been suboptimal. Furthermore, there are two complete maps of the ventral visual pathway in the Pulvinar (VP1, VP2; Shipp, 2003), and also a distinction between Matrix and Core pathways, so there is plenty of biological basis for multiple different connectivity patterns -- these separate pathways are implemented in this version.

For prediction, many layers of CT need to collaborate to generate more accurate predictions over pulvinar TRC layers, e.g., for V1. Furthermore, the more detailed, high variance activity of the lower layers is important for driving sufficiently variable error signals across all layers. In addition, prediction often requires broader connectivity to anticipate larger movements, etc. By contrast, the attentional functions require more focal topographic connectivity and each layer needs its own distinct pulvinar layers, with closed loops. It is not clear that a driver input makes sense in this context, whereas it is essential for the predictive component. Overall these distinctions fit well with the Matrix (attentional) vs. Core (predictive) features.

Predictive Circuit

The predictive pulvinar TRC is created and associated with the driver layer, and it has a one-to-one geometry with that layer. Many other CT layers can project to this TRC to try to predict what the driver layer's activity will be.

 V1Super -> V2 --Ctxt--> CT
   |        ^             |\
 Burst      |   (pool     | v
   |        |    loop)    | TRN
   v        |             | /
 Pulv <-------------o (inhib)

This package has three primary specialized Layer types (a simplified rate-code sketch follows the list):

  • SuperLayer: implements the superficial layer 2-3 neurons, which function just like standard axon.Layer neurons, and always represent the current state of things. They learn continuously from predictive learning error signals, are widely interconnected with other cortical areas, and form the basis for the learned representations in other layers. As a computational simplification, they can also directly compute the Burst activation signal that reflects the deep layer 5IB bursting activation, via thresholding of the superficial layer activations (Bursting is thought to have a higher threshold). Activity is represented by the CaSpkP value -- Act is used only for display purposes!

  • CTLayer: implements the layer 6 regular spiking CT corticothalamic neurons that project into the thalamus. They receive the Burst activation via a CTCtxtPath pathway type, and integrate that in the CtxtGe value, which is added to other excitatory conductance inputs to drive the overall activation of these neurons. Due to the bursting nature of the Burst inputs, this causes these CT layer neurons to reflect what the superficial layers encoded on the previous timestep -- thus they represent a temporally delayed context state.

CTLayer can also send its Ctxt signal via self pathways, reflecting the extensive deep-to-deep lateral connectivity that provides additional temporal context information.

Furthermore, the active recurrent connections within the CT layer support active maintenance via NMDA conductances. This active maintenance can sustain activity over relatively long time windows, beyond the confines of a single alpha or theta cycle.

  • PulvLayer: implements the Pulvinar TRC neurons, upon which the prediction generated by CTLayer pathways is projected in the minus phase. This is computed via standard current-time pathways that integrate into standard Ge excitatory input in TRC neurons. The 5IB Burst-driven plus-phase "outcome" activation state is driven by direct access to the corresponding driver SuperLayer (not via standard pathway mechanisms).
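The following sketch condenses the three layer computations above into a simplified rate-code form. The function names and the scalar abstraction are illustrative assumptions; the actual axon code operates on spiking variables (e.g., CaSpkP) and conductances.

```go
// Simplified rate-code sketch of the three primary layer types (illustrative
// names; not the actual axon code, which operates on spiking variables).
package main

import "fmt"

// superBurst approximates 5IB bursting by thresholding superficial activity.
func superBurst(act, thr float32) float32 {
	if act < thr {
		return 0
	}
	return act
}

// ctGe adds the temporally delayed context conductance (CtxtGe) to the other
// excitatory inputs driving a CT neuron.
func ctGe(geOther, ctxtGe float32) float32 {
	return geOther + ctxtGe
}

// pulvDrive returns the TRC drive: the CT prediction in the minus phase, and
// the driver SuperLayer's Burst in the plus phase.
func pulvDrive(plusPhase bool, ctPred, driverBurst float32) float32 {
	if plusPhase {
		return driverBurst
	}
	return ctPred
}

func main() {
	burst := superBurst(0.8, 0.5) // SuperLayer Burst from the prior trial
	fmt.Println("Burst:      ", burst)
	fmt.Println("CT Ge:      ", ctGe(0.2, burst))
	fmt.Println("Pulv minus: ", pulvDrive(false, 0.3, burst))
	fmt.Println("Pulv plus:  ", pulvDrive(true, 0.3, burst))
}
```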

In addition, there are optional layer types that account for the deep layer 5IB neurons, also known as pyramidal tract (PT) neurons (a simplified gating sketch follows the list):

  • PTMaintLayer: implements a subset of PT neurons that exhibit robust active maintenance, typically gated by basal ganglia (BG) disinhibition of a corresponding thalamic layer (e.g., via the pcore framework). This gating can be accomplished by configuring a ModulatoryG pathway from the thalamus layer, which contributes extra excitation according to the Act.Dend.ModGain scaling parameter.

  • PTPredLayer: implements a subset of PT neurons that is like CTLayer in contributing to predictive learning over the thalamus, but receives input from the PTMaintLayer and is thus only active during periods of active maintenance. This layer provides the primary input to VSPatch US-timing prediction layers in the Rubicon framework, and other layers that require predictive dynamic inputs.
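As a simplified sketch of the PTMaintLayer gating mechanism (illustrative names; the actual scaling parameter is Act.Dend.ModGain, but the functional form here is an assumption), the modulatory thalamic input adds extra excitation that switches the layer into sustained maintenance:

```go
// Illustrative sketch (assumed names): a ModulatoryG-style pathway from the
// BG-gated thalamic layer adds extra excitation to PTMaintLayer neurons,
// scaled by a gain analogous to Act.Dend.ModGain.
package main

import "fmt"

// ptMaintGe adds modulatory excitation from the gating thalamic layer to a
// PT neuron's ordinary excitatory conductance.
func ptMaintGe(ge, thalAct, modGain float32) float32 {
	return ge + modGain*thalAct
}

func main() {
	// Before BG disinhibition the thalamic layer is silent: weak drive only.
	fmt.Println("no gating:", ptMaintGe(0.3, 0.0, 1.5))
	// After BG disinhibition, thalamic activity provides the extra excitation
	// that supports robust active maintenance in the PT neurons.
	fmt.Println("gated:    ", ptMaintGe(0.3, 0.8, 1.5))
}
```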

Two different parameter regimes

There are two primary modes of behavior for the CT layers: single-step copy and multi-step temporal integration, each of which requires a different parameterization (a sketch of the two settings follows the list):

  • Single-step copy requires NMDA and GABAB Gbar = .15 and Tau = 100 (i.e., the standard defaults), with CT.Decay = 0, a one-to-one pathway from Super, and no CT self connections. See examples/deep_move for a working example.

  • Temporal integration requires NMDA and GABAB Gbar = .3 and Tau = 300, with CT.Decay = 50 and CT self connections of both the CTCtxtPath and standard pathway types, which support NMDA-based active maintenance. See examples/deep_fsa and examples/deep_move for working examples.
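The sketch below expresses the two settings as parameter-path maps in the style of axon params files. The exact parameter paths are assumptions and may differ across versions (here the Tau values are read as applying to NMDA); the authoritative values are in the examples/deep_* parameter files.

```go
// Hedged sketch of the two CT parameter regimes as parameter-path maps.
// The exact paths are assumptions; check the examples/deep_* params files.
package main

import "fmt"

func main() {
	singleStepCopy := map[string]string{
		"Layer.Acts.NMDA.Gbar":  "0.15", // standard defaults
		"Layer.Acts.NMDA.Tau":   "100",
		"Layer.Acts.GabaB.Gbar": "0.15",
		"Layer.CT.DecayTau":     "0", // no decay: a pure one-step copy
		// plus: one-to-one CTCtxtPath from Super, no CT self connections
	}
	temporalIntegration := map[string]string{
		"Layer.Acts.NMDA.Gbar":  "0.3", // stronger, slower NMDA for maintenance
		"Layer.Acts.NMDA.Tau":   "300",
		"Layer.Acts.GabaB.Gbar": "0.3",
		"Layer.CT.DecayTau":     "50",
		// plus: CT self connections (CTCtxtPath and standard) for NMDA maintenance
	}
	fmt.Println("single-step copy:    ", singleStepCopy)
	fmt.Println("temporal integration:", temporalIntegration)
}
```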

Timing

The Burst value is computed in SuperLayer during the plus phase, and it is continuously accessed by the PulvLayer TRC neurons to drive plus-phase outcome states.

At the end of the plus phase, CTCtxt pathways convey the Burst signal from Super to CTLayer neurons, where it is integrated into the Ctxt value representing the temporally delayed context information.
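The following sketch puts this ordering together in illustrative rate-code form (assumed names; the real code operates on spiking dynamics within the theta-cycle phases):

```go
// Illustrative sketch of the within-trial ordering: minus-phase prediction
// from the prior context, plus-phase Burst driving the Pulvinar outcome, and
// the end-of-plus-phase context update via CTCtxt.
package main

import "fmt"

type microcolumn struct {
	burst float32 // thresholded Burst, computed during the plus phase
	ctxt  float32 // CT context, updated at the end of the plus phase
}

func (m *microcolumn) trial(input float32) {
	// Minus phase: CT drives the Pulvinar prediction from the previous
	// trial's context.
	fmt.Printf("minus: predict from ctxt=%.2f\n", m.ctxt)

	// Plus phase: the superficial layer settles on the actual input; Burst
	// is computed and continuously read by the Pulvinar as the outcome.
	if input >= 0.5 {
		m.burst = input
	} else {
		m.burst = 0
	}
	fmt.Printf("plus:  burst=%.2f drives Pulvinar outcome\n", m.burst)

	// End of plus phase: CTCtxt conveys Burst into the CT context used on
	// the next trial.
	m.ctxt = m.burst
}

func main() {
	mc := &microcolumn{}
	for _, in := range []float32{0.9, 0.2, 0.7} {
		mc.trial(in)
	}
}
```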

TRN Attention and Learning

NOTE: this aspect has not been recently updated and is out of date

The basic anatomical facts of the TRN strongly constrain its role in attentional modulation. With the exception of inhibitory pathways from the GPi / SNr (BG output nuclei), the TRN exclusively receives excitatory inputs: from CT pathways, and from a weaker excitatory feedback pathway from the same TRC neurons to which it in turn sends GABA inhibition. Thus, its main function appears to be providing pooled feedback inhibition to the TRC, with various levels of pooling on the input side and diffusion on the output side. Computationally, this pooling seems ideally situated to enable inhibitory competition to operate across multiple different scales.

Given the pool-level organization of the CT -> TRC -> Cortex loops, the pool should be the finest grain of this competition. Thus, a contribution of the TRN is supporting layer-level inhibition across pools -- but this is already implemented with the layer level inhibition in standard Axon. Critically, if we assume that inhibition is generally hierarchically organized, then the broader level of inhibition would be at the between-layer level. Thus, the TRN implementation just supports this broadest level of inhibition, providing a visual representation of the layers and their respective inhibition levels.

In addition, the Pulvinar layer itself supports a Gaussian topographic level of inhibition among pools, representing the finer-grained inhibition that would be provided by the TRN.

Perhaps the most important contribution that the TRC / TRN can provide is a learning modulation at the pool level, as a function of inhibition.

Compounding: Getting the Good without too much Lock-In

It is relatively easy to make something that locks in a given attentional pattern, but a problem arises when you then need to change things in response to new inputs -- often the network suffers from too much attentional lock-in...

Reynolds & Heeger (2009)

The basic phenomena behind this model are well captured by the FFFB inhibitory dynamics, but FFFB in general retains proportional activity as a function of excitatory drive. However, the key distinction between "contrast gain" and "response gain" is not captured by FFFB. In particular, when the attentional spotlight is wide, an additional amount of inhibition is generated relative to a narrow attentional spotlight.

A different way of thinking about this is in terms of nonlinear inhibition of the same type that is implicated in popout effects and is well documented empirically (Murphy & Miller, 2009; etc). When there is a lot of excitatory drive (for the same features) within a proximal region, above a threshold level, then an additional amount of inhibition is added. The Inhib.Topo settings compute this topographic inhibition, and the examples/attn_trn example shows how it drives the RH09 effects.
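A sketch of this threshold-nonlinear, topographically pooled inhibition follows; the kernel, threshold form, and names are illustrative assumptions rather than the actual Inhib.Topo implementation. A wide spotlight recruits extra inhibition that a narrow one does not, giving the contrast-gain-like behavior described above.

```go
// Illustrative sketch of threshold-nonlinear topographic inhibition (assumed
// form and names; not the actual Inhib.Topo code): extra inhibition is added
// only when the Gaussian-pooled excitatory drive exceeds a threshold.
package main

import "fmt"

// topoInhib returns the extra inhibition for a pool given the Gaussian-
// weighted drive of its neighborhood: zero below threshold, scaled above it.
func topoInhib(neighborGe, weights []float32, thr, gain float32) float32 {
	var drive float32
	for i, ge := range neighborGe {
		drive += weights[i] * ge
	}
	if drive <= thr {
		return 0
	}
	return gain * (drive - thr)
}

func main() {
	weights := []float32{0.25, 0.5, 1, 0.5, 0.25} // Gaussian-like kernel over pools
	narrow := []float32{0, 0, 0.8, 0, 0}          // narrow attentional spotlight
	wide := []float32{0.8, 0.8, 0.8, 0.8, 0.8}    // wide spotlight: more pooled drive
	fmt.Println("narrow spotlight extra inhibition:", topoInhib(narrow, weights, 1.0, 0.5))
	fmt.Println("wide spotlight extra inhibition:  ", topoInhib(wide, weights, 1.0, 0.5))
}
```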

Folded Feedback (Grossberg, 1999)

Grossberg (1999) emphasized that it can be beneficial for attention to modulate the inputs to a given area, so it gets "folded" into the input stream. Another way of thinking about this is that it is more effective to block a river further upstream, before further "compounding" effects might set in, rather than waiting until everything has piled in and you have to push against a torrent. This is achieved by modulating the layer 4 inputs to an area, which happens by modulating forward pathways.

New impl with separate attn vs. prediction

  • Start with rate code multiplicative factor computation

  • Keep it simple in terms of additional layers and complexity

  • TRCa = attentional TRCs, one unit per pool, which integrate a Gaussian over neighboring pools as their net input; TRN = one unit per layer, which integrates the total over pools for the layer and sends that back to drive the normalized activation of the TRCa units, which is then multiplicative on the Ge input into layers 2/3 (sketched below).
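A rate-code sketch of this proposed computation (all names and the normalization form are illustrative assumptions):

```go
// Sketch of the proposed attentional factor (illustrative names): one TRCa
// unit per pool integrates a Gaussian over neighboring pools, one TRN unit
// per layer sums over pools, and the TRN total normalizes the TRCa values,
// which then multiply the Ge input into layers 2/3.
package main

import "fmt"

// attnFactors computes one multiplicative attention factor per pool.
func attnFactors(poolCT, kernel []float32) []float32 {
	n := len(poolCT)
	half := len(kernel) / 2
	trca := make([]float32, n)
	var trn float32 // single TRN unit per layer: total over pools
	for i := range poolCT {
		for k, w := range kernel {
			j := i + k - half
			if j >= 0 && j < n {
				trca[i] += w * poolCT[j] // Gaussian net input over neighbors
			}
		}
		trn += trca[i]
	}
	if trn > 0 {
		for i := range trca {
			trca[i] /= trn // TRN feedback normalizes the TRCa activations
		}
	}
	return trca // each factor multiplies Ge into layer 2/3 of its pool
}

func main() {
	ct := []float32{0.1, 0.9, 0.8, 0.1, 0.05}    // CT drive per pool
	kernel := []float32{0.25, 0.5, 1, 0.5, 0.25} // Gaussian-like kernel
	fmt.Println("attention factors:", attnFactors(ct, kernel))
}
```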

References