US9146546B2 - Systems and apparatus for implementing task-specific learning using spiking neurons - Google Patents
- Publication number
- US9146546B2 (application US13/487,533)
- Authority
- US
- United States
- Prior art keywords
- learning
- state
- signal
- reinforcement
- parameter
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Expired - Fee Related, expires
Classifications
-
- G—PHYSICS
- G05—CONTROLLING; REGULATING
- G05B—CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
- G05B13/00—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion
- G05B13/02—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric
- G05B13/0265—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion
- G05B13/027—Adaptive control systems, i.e. systems automatically adjusting themselves to have a performance which is optimum according to some preassigned criterion electric the criterion being a learning criterion using neural networks only
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/049—Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs
Definitions
- the present disclosure relates to implementing generalized learning rules in stochastic spiking neuron systems.
- One typical configuration of a prior-art adaptive system is shown in FIG. 1.
- the system 100 may be capable of changing or “learning” its internal parameters based on the input signal 102, the output signal 104, and/or an external influence 106.
- the system 100 may be commonly described using a function 110 that depends (including probabilistic dependence) on the history of inputs and outputs of the system and/or on some external signal r that is related to the inputs and outputs.
- the function F(x,y,r) may be referred to as a “performance function”.
- the purpose of adaptation (or learning) may be to optimize the input-output transformation according to some criteria, where learning is described as minimization of an average value of the performance function F.
- Supervised learning may be the machine learning task of inferring a function from supervised (labeled) training data.
- Reinforcement learning may refer to an area of machine learning concerned with how an agent ought to take actions in an environment so as to maximize some notion of reward (e.g., immediate or cumulative).
- Unsupervised learning may refer to the problem of trying to find hidden structure in unlabeled data. Because the examples given to the learner are unlabeled, there is no external signal to evaluate a potential solution.
- the learning rules may need to be modified to suit the new task.
- the boldface variables and symbols with arrow superscripts denote vector quantities, unless specified otherwise.
- Complex control applications, such as autonomous robot navigation, robotic object manipulation, and/or other applications, may require simultaneous implementation of a broad range of learning tasks.
- Such tasks may include visual recognition of surroundings, motion control, object (face) recognition, object manipulation, and/or other tasks.
- existing implementations may rely on a partitioning approach, where individual tasks are implemented using separate controllers, each implementing its own learning rule (e.g., supervised, unsupervised, reinforcement).
- the apparatus 120 comprises several blocks 120, 124, 130, each implementing a set of learning rules tailored to a particular task (e.g., motor control, visual recognition, object classification and manipulation, respectively). Some of the blocks (e.g., the signal processing block 130 in FIG. 1A) may further comprise sub-blocks (e.g., the blocks 132, 134) targeted at different learning tasks. Implementation of the apparatus 120 may have several shortcomings stemming from each block having a task-specific implementation of learning rules. By way of example, a recognition task may be implemented using supervised learning, while object manipulator tasks may comprise reinforcement learning.
- a single task may require the use of more than one rule (e.g., the signal processing task for block 130 in FIG. 1A), thereby necessitating two separate sub-blocks (e.g., blocks 132, 134), each implementing a different learning rule (e.g., unsupervised learning and supervised learning, respectively).
- An artificial neural network (ANN) may include a mathematical and/or computational model inspired by the structure and/or functional aspects of biological neural networks.
- a neural network comprises a group of artificial neurons (units) that are interconnected by synaptic connections.
- an ANN is an adaptive system that is configured to change its structure (e.g., the connection configuration and/or neuronal states) based on external or internal information that flows through the network during the learning phase.
- a spiking neuronal network (SNN) may be a special class of ANN, in which neurons communicate by sequences of spikes.
- SNNs may offer improved performance over conventional technologies in areas that include machine vision, pattern detection and pattern recognition, signal filtering, data segmentation, data compression, data mining, system identification and control, optimization and scheduling, and/or complex mapping.
- The spike generation mechanism may be a discontinuous process (e.g., as illustrated by the input spikes s_x(t) 220, 222, 224, 226, 228 and the output spikes s_y(t) 230, 232, 234 in FIG. 2), so a classical derivative of a function F(s(t)) with respect to the spike trains s_x(t), s_y(t) is not defined.
- individual tasks may be performed by a separate network partition that implements a task-specific set of learning rules (e.g., adaptive control, classification, recognition, prediction rules, and/or other rules).
- Unused portions of individual partitions (e.g., motor control when the robotic device is stationary) may consume processing resources (e.g., when the stationary robot is performing face recognition tasks).
- partitioning may prevent dynamic retargeting (e.g., of the motor control task to visual recognition task) of the network partitions.
- Consider, for example, a mobile robot controlled by a neural network, where the task of the robot is to move in an unknown environment and collect certain resources by way of trial and error.
- This can be formulated as a reinforcement learning task, where the network is expected to maximize the reward signal (e.g., the amount of the collected resource). While the environment is in general unknown, there may be situations in which a human operator can show the network the desired control signal (e.g., for avoiding obstacles) during the ongoing reinforcement learning.
- This may be formulated as a supervised learning task.
- Some existing learning rules for supervised learning may rely on the gradient of the performance function.
- the gradient for the reinforcement learning part may be implemented through the use of an adaptive critic; the gradient for supervised learning may be implemented by taking the difference between the supervisor signal and the actual output of the controller. Introduction of the critic may be unnecessary for solving reinforcement learning tasks, because direct gradient-based reinforcement learning may be used instead. Additional analytic derivation of the learning rules may be needed whenever the loss function between the supervisor and actual output signals is redefined.
- analytic determination of the derivative of a performance function F may require additional operations (often performed manually) for each newly formulated task, which makes this approach unsuitable for the dynamic switching and reconfiguration of tasks described above.
- Some existing approaches to taking the derivative of a performance function without analytic calculations include a “brute force” finite-difference estimator of the gradient.
- these estimators may be impractical for use with large spiking networks comprising many (typically in excess of hundreds) parameters.
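To make that cost concrete, a minimal forward-difference gradient estimator is sketched below in Python. The function names, the step size `eps`, and the quadratic toy performance function are illustrative assumptions, not part of the disclosure; the point is that each parameter needs its own extra evaluation of F, so N parameters cost N + 1 evaluations.

```python
from typing import Callable, List


def finite_difference_gradient(f: Callable[[List[float]], float],
                               w: List[float],
                               eps: float = 1e-5) -> List[float]:
    """Forward-difference ("brute force") estimate of the gradient of f at w.

    One baseline evaluation plus one perturbed evaluation per parameter,
    i.e. len(w) + 1 calls to f in total.
    """
    f0 = f(w)
    grad = []
    for i in range(len(w)):
        w_shifted = list(w)          # copy so the caller's vector is untouched
        w_shifted[i] += eps
        grad.append((f(w_shifted) - f0) / eps)
    return grad


evaluations = 0


def quadratic_cost(w: List[float]) -> float:
    """Toy performance function F(w) = sum((w_i - 1)^2); counts its own calls."""
    global evaluations
    evaluations += 1
    return sum((wi - 1.0) ** 2 for wi in w)


grad = finite_difference_gradient(quadratic_cost, [0.0, 2.0, 1.0])
# grad is approximately [-2.0, 2.0, 0.0]; 3 parameters required 4 evaluations
```

For a network with thousands of synaptic weights, and a stochastic F that must itself be averaged over many trials before each difference is meaningful, this per-parameter cost quickly becomes prohibitive.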
- Derivative-free methods include, specifically, the Score Function (SF) method, also known as the Likelihood Ratio (LR) method.
- these methods may sample the value of F(x,y) in different points of parameter space according to some probability distribution.
- the SF and LR methods utilize a derivative of the sampling probability distribution. This process can be considered an exploration of the parameter space.
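As a hedged illustration of the SF/LR idea, the sketch below estimates the derivative of an expected performance E[F(θ)] with respect to the mean μ of a Gaussian sampling distribution. The Gaussian choice, the toy F(θ) = θ², and all names are assumptions made for the example. F is only ever evaluated, never differentiated; the derivative is taken of the log sampling density, ∂/∂μ ln N(θ; μ, σ²) = (θ − μ)/σ².

```python
import random
from typing import Callable


def score_function_gradient(F: Callable[[float], float], mu: float, sigma: float,
                            n_samples: int, rng: random.Random) -> float:
    """Likelihood-ratio (score-function) estimate of d/dmu E[F(theta)],
    with theta ~ Normal(mu, sigma**2)."""
    total = 0.0
    for _ in range(n_samples):
        theta = rng.gauss(mu, sigma)             # explore the parameter space
        score = (theta - mu) / sigma ** 2        # derivative of the log density
        total += F(theta) * score                # only F values are needed
    return total / n_samples


rng = random.Random(0)
estimate = score_function_gradient(lambda t: t * t, mu=1.0, sigma=0.5,
                                   n_samples=200_000, rng=rng)
# Analytic value for comparison: d/dmu (mu**2 + sigma**2) = 2 * mu = 2.0
```

The estimate is unbiased but noisy; its variance, rather than the number of parameters, governs how many samples are needed, which is what makes the approach attractive for large spiking networks.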
- stochastic adaptive apparatuses may be incapable of learning to perform unsupervised tasks while being influenced by additive reinforcement (and vice versa).
- Many presently available adaptive implementations may be task-specific and implement one particular learning rule (e.g., classifier unsupervised learning), and such devices invariably require retargeting (e.g., reprogramming) in order to implement different learning rules.
- presently available methodologies may not be capable of implementing generalized learning, where a combination of different learning rules (e.g., reinforcement, supervised, and unsupervised) is used simultaneously for the same application (e.g., platform motion stabilization), thereby enabling, for example, faster learning convergence, better response to sudden changes, and/or improved overall stability, particularly in the presence of noise.
- the operation of spiking neuron networks may typically be expressed in terms of the original spike trains instead of their secondary features (e.g., the rate or the latency from the last spike).
- the result is that a spiking neuron operates on the spike train space, transforming a vector of spike trains (input spike trains) into a single element of that space (output train).
- Dealing with spike trains directly may be a challenging task. Not every spike train can be transformed to another spike train in a continuous manner.
- One common approach is to describe the task in terms of optimization of some function and then use gradient approaches in the parameter space of the spiking neuron.
- gradient methods on discontinuous spaces such as spike trains space are not well developed.
- One approach may involve smoothing the spike trains first.
- spike trains are smoothed by introducing a probabilistic measure on the spike train space. Describing the spike pattern from a probabilistic point of view may lead to fruitful connections with a wide range of topics in information theory, machine learning, Bayesian inference, statistical data analysis, etc. This approach makes spiking neurons good candidates for SF/LR learning methods.
- One technique frequently used when constructing learning rules in a spiking network comprises applying a random exploration process to the spike generation mechanism of a spiking neuron. This is often implemented by introducing a noisy threshold: the probability of spike generation may depend on the difference between the neuron's membrane voltage and a threshold value.
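A noisy-threshold spike generator can be sketched as follows. The exponential escape function λ(u) = λ₀ exp((u − θ)/Δu), its parameter values, and the function names are assumptions chosen for illustration (one common choice among several); the per-bin spike probability 1 − exp(−λΔt) follows from treating λ as constant within a bin.

```python
import math
import random


def escape_hazard(u: float, theta: float, rate0: float = 0.1,
                  delta_u: float = 1.0) -> float:
    """Instantaneous firing rate (per ms) as a smooth function of u - theta."""
    return rate0 * math.exp((u - theta) / delta_u)


def spike_probability(u: float, theta: float, dt: float,
                      rate0: float = 0.1, delta_u: float = 1.0) -> float:
    """Probability of at least one spike in a bin of width dt (ms)."""
    return 1.0 - math.exp(-escape_hazard(u, theta, rate0, delta_u) * dt)


def sample_spike(u: float, theta: float, dt: float, rng: random.Random,
                 rate0: float = 0.1, delta_u: float = 1.0) -> bool:
    """Draw a spike / no-spike outcome for one time bin."""
    return rng.random() < spike_probability(u, theta, dt, rate0, delta_u)


rng = random.Random(1)
p_below = spike_probability(-2.0, 0.0, dt=1.0)   # well below threshold
p_at = spike_probability(0.0, 0.0, dt=1.0)       # at threshold: 1 - exp(-0.1)
p_above = spike_probability(2.0, 0.0, dt=1.0)    # above threshold
rate_at = sum(sample_spike(0.0, 0.0, 1.0, rng) for _ in range(20_000)) / 20_000
```

Unlike a hard threshold, the spike probability here varies smoothly with the membrane voltage, which is what makes the log-likelihood of the output spike train differentiable with respect to the parameters that shape u.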
- the use of probabilistic spiking neuron models to obtain the gradient of the log-likelihood of a spike train with respect to the neuron's weights may comprise an extension of the Hebbian learning framework to spiking neurons.
- the use of the log-likelihood gradient of a spike train may be extended to supervised learning.
- information theory framework may be applied to spiking neurons, as for example, when deriving optimal learning rules for unsupervised learning tasks via informational entropy minimization.
- the probability of an output spike train y having spikes at times t_f, with no spikes at other times on a time interval [0, T], given the input spikes x, may be given by the conditional probability density function $p(y|x) = \prod_{t_f} \lambda(t_f)\, \exp\left(-\int_0^T \lambda(s)\, ds\right)$.
- λ(t) represents an instantaneous probability density (“hazard”) of firing.
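The log of this density, ln p(y|x) = Σ_f ln λ(t_f) − ∫₀ᵀ λ(s) ds, is the quantity whose parameter gradient the SF/LR approach needs. A minimal numerical sketch follows, assuming the hazard is available as a function of time (in a neuron model it would depend on the input x through the neuron state) and approximating the integral with a midpoint rule; the function name and step size are illustrative.

```python
import math
from typing import Callable, Sequence


def spike_train_log_likelihood(hazard: Callable[[float], float],
                               spike_times: Sequence[float],
                               T: float, dt: float = 0.1) -> float:
    """ln p(y|x) = sum_f ln lambda(t_f) - integral_0^T lambda(s) ds."""
    log_term = sum(math.log(hazard(t)) for t in spike_times)
    n_steps = int(round(T / dt))
    # Midpoint-rule approximation of the survival (no-spike) integral.
    integral = dt * sum(hazard((k + 0.5) * dt) for k in range(n_steps))
    return log_term - integral


# Constant hazard of 0.02 spikes/ms over T = 1000 ms with three output spikes:
ll = spike_train_log_likelihood(lambda t: 0.02, [120.0, 480.0, 910.0], T=1000.0)
# analytic value for comparison: 3 * ln(0.02) - 0.02 * 1000
```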
- membrane voltage u(t) is the only state variable (q(t) ≡ u(t)) that is “responsible” for spike generation through a deterministic threshold mechanism.
- a simple spiking model may comprise two state variables, only one of which (e.g., an equivalent of the “membrane voltage” of a biological neuron) is compared with a threshold value.
- Such models are often extended to describe stochastic neurons by replacing deterministic threshold with a stochastic threshold.
- $\frac{d\vec{q}}{dt} = V(\vec{q}) + \sum_{t^{out}} R(\vec{q})\,\delta(t - t^{out}) + G(\vec{q})\, I^{ext}$ (Eqn. 6)
- $I^{ext}$ is the external input to the neuron
- V is the function that defines the evolution of the state variables
- G describes the interaction between the input current and the state variables (for example, to model synaptic depletion); and R describes resetting the state variables after the output spikes at $t^{out}$.
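A forward-Euler sketch of Eqn. 6 for a one-dimensional leaky integrate-and-fire neuron is given below. The specific choices V(q) = −q/τ, G(q) = 1, R(q) = −q (reset to zero), the deterministic threshold, and the parameter values are illustrative assumptions; Eqn. 6 itself admits multi-dimensional state vectors and stochastic thresholds.

```python
from typing import Callable, List, Sequence, Tuple


def simulate_eqn6(V: Callable[[float], float],
                  R: Callable[[float], float],
                  G: Callable[[float], float],
                  I_ext: Sequence[float],
                  q0: float, dt: float,
                  threshold: float) -> Tuple[List[float], List[float]]:
    """Forward-Euler integration of dq/dt = V(q) + sum R(q) delta(t - t_out) + G(q) I_ext."""
    q = q0
    trace: List[float] = []
    spikes: List[float] = []
    for k, I in enumerate(I_ext):
        q += dt * (V(q) + G(q) * I)       # continuous (transition) part
        if q >= threshold:                # deterministic threshold defines t_out
            spikes.append(k * dt)
            q += R(q)                     # delta term: instantaneous jump by R(q)
        trace.append(q)
    return trace, spikes


tau = 10.0  # membrane time constant, ms
trace, spikes = simulate_eqn6(V=lambda q: -q / tau,
                              R=lambda q: -q,        # reset to zero
                              G=lambda q: 1.0,
                              I_ext=[0.15] * 1000,   # 100 ms of constant drive
                              q0=0.0, dt=0.1, threshold=1.0)
```

With this drive the steady-state voltage (I·τ = 1.5) exceeds the threshold, so the neuron fires regularly, roughly every 11 ms.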
- Eqn. 6 may be expressed as:
- the present disclosure satisfies the foregoing needs by providing, inter alia, apparatus and methods for implementing generalized probabilistic learning configured to handle simultaneously various learning rule combinations.
- the system may comprise a controller apparatus configured to generate output control signal y based at least in part on input signal x, the controller apparatus characterized by a controller state parameter S, and a control parameter w; and a learning apparatus configured to: generate an adjustment signal dw based at least in part on the input signal x, the controller state parameter S, and the output signal y; and provide the adjustment signal dw to the controller apparatus, thereby effecting the learning where the control parameter may be configured in accordance with the task; and the adjustment signal dw may be configured to modify the control parameter based at least in part on the input signal x and the output signal y.
- the output control signal y may comprise a spike train configured based at least in part on the adjustment signal dw; the learning apparatus may comprise a task-specific block, configured independently of the controller state parameter and configured to implement the task-specific learning, and a controller-specific block, configured independently of the task-specific learning; and the task-specific learning may be characterized by a performance function configured to effect at least an unsupervised learning rule.
- the system may further comprise a teaching interface operably coupled to the learning apparatus and configured to provide a teaching signal; the teaching signal may comprise a desired controller output signal, and the performance function may be further configured to effect a supervised learning rule based at least in part on the desired controller output signal; the teaching signal may further comprise a reinforcement spike train associated with the current performance of the controller apparatus relative to the desired performance, and the performance function may be further configured to effect a reinforcement learning rule based at least in part on the reinforcement spike train.
- the current performance may be based at least in part on adjustment of the control parameter from a prior state w_0 to a current state w_c; the reinforcement may be positive when the current performance is closer to the desired performance of the controller, and negative when the current performance is farther from the desired performance; and the task-specific learning may comprise a hybrid learning rule comprising a combination of the reinforcement, supervised, and unsupervised learning rules applied simultaneously.
- the adjustment signal dw may be determined as a product of the controller performance function F and the gradient of a per-stimulus entropy parameter h, the gradient being determined with respect to the control parameter w; the per-stimulus entropy parameter h may be configured to characterize the dependence of the output signal y on (i) the input signal x and (ii) the control parameter w, and may be determined based on the natural logarithm of p(y|x, w).
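The sketch below illustrates this product rule, dw = η·F·∂h/∂w with h = −ln p(y|x, w), under stated assumptions: time is discretized into bins, the neuron emits at most one spike per bin with probability σ(w·x_t) (a logistic Bernoulli model chosen for the example), F is treated as a cost to be minimized, and η is a hypothetical learning rate. Under these assumptions the score-function identity ∇⟨F⟩ = ⟨F ∇ln p⟩ = −⟨F ∇h⟩ makes η·F·∇h a stochastic descent step on the average performance.

```python
import math
from typing import List, Sequence


def sigmoid(a: float) -> float:
    return 1.0 / (1.0 + math.exp(-a))


def per_stimulus_entropy(w: Sequence[float], xs: Sequence[Sequence[float]],
                         ys: Sequence[int]) -> float:
    """h = -ln p(y|x, w) for the assumed Bernoulli-per-bin spiking model."""
    h = 0.0
    for x, y in zip(xs, ys):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        h -= math.log(p) if y else math.log(1.0 - p)
    return h


def entropy_gradient(w: Sequence[float], xs: Sequence[Sequence[float]],
                     ys: Sequence[int]) -> List[float]:
    """Analytic dh/dw = -sum_t (y_t - p_t) x_t for the same model."""
    grad = [0.0] * len(w)
    for x, y in zip(xs, ys):
        p = sigmoid(sum(wi * xi for wi, xi in zip(w, x)))
        for i, xi in enumerate(x):
            grad[i] -= (y - p) * xi
    return grad


def adjustment(F: float, w: Sequence[float], xs: Sequence[Sequence[float]],
               ys: Sequence[int], eta: float = 0.01) -> List[float]:
    """dw = eta * F * dh/dw (F is a cost: lower values are better)."""
    return [eta * F * g for g in entropy_gradient(w, xs, ys)]


w = [0.5, -0.3]
xs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.0, 0.0]]   # input per time bin
ys = [1, 0, 1, 0]                                       # output spike per bin
dw = adjustment(F=2.0, w=w, xs=xs, ys=ys)
```

The performance function enters only as a scalar multiplier, so swapping in a different F (supervised, reinforcement, or unsupervised) leaves the gradient machinery untouched, which is the decoupling the text describes.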
- a computer readable apparatus may comprise a storage medium comprising a plurality of instructions to adjust a learning parameter w associated with a computerized spiking neuron configured to produce an output spike signal y consistent with (i) an input spike signal x and (ii) a learning task; the instructions are configured to, when executed: construct a time derivative representation of a trace S of a neuron state, based at least in part on the input spike signal x and a state parameter q; obtain a realization of the trace S, based at least in part on integrating the time derivative representation; and determine an adjustment dw of the parameter w, based at least in part on the trace S; the adjustment dw may be configured to transition the neuron state towards a target state, the target state being associated with the neuron generating the output spike signal y.
- the integration of the time derivative representation may be effected via a symbolic integration operation
- the state parameter q may be configured to characterize time evolution of the neuron state
- the realization of the trace S may comprise an analytic solution of the time derivative representation
- the construction of the time derivative representation enables the integration to be attained via a symbolic integration operation.
- the state parameter q may be configured to characterize time evolution of the neuron state in accordance with a state evolution process characterized by a response mode and a transition mode, the response mode being associated with generating a neuronal response P; a state transition term V describing changes of the neuronal state in the transition mode; a state transition term R describing changes of the state in the response mode; and a state transition term G describing changes of the state due to the input x; the state parameter q may be configured to characterize the neuron membrane voltage; and the input may comprise an analog signal, with the state transition term G configured to describe changes of the voltage due to the analog signal.
- the state parameter q may comprise neuron excitability, and the time derivative representation may comprise a sum of the V, R, and G terms, each multiplicatively combined with the trace S: the state transition term V may comprise the trace S multiplicatively combined with a Jacobian matrix J_v configured in accordance with the transition mode of the evolution process; the state transition term R may comprise the trace S multiplicatively combined with a Jacobian matrix J_r configured in accordance with the response mode of the evolution process; and the state transition term G may comprise the trace S multiplicatively combined with a Jacobian matrix J_g configured in accordance with the input x.
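For a concrete, hedged instance of the trace S = ∂q/∂w, consider a leaky integrate-and-fire neuron whose single state variable is the membrane voltage u, driven through one synapse of weight w as du/dt = −u/τ + w·x(t), and reset to zero at given output spike times. Here J_v = −1/τ, J_r = ∂R/∂u = −1 for the reset R(u) = −u, and J_g = 0 because G does not depend on u; because the input current w·x(t) depends on w directly, an additional source term x(t) appears in the trace equation. The model, parameter values, and function name are assumptions for illustration. Integrating u and S with the same Euler scheme makes S the exact derivative of the discretized voltage, which the finite-difference comparison at the end checks.

```python
from typing import Iterable, List, Sequence, Tuple


def simulate_with_trace(w: float, x: Sequence[float], out_spike_steps: Iterable[int],
                        dt: float = 0.1, tau: float = 10.0) -> Tuple[List[float], List[float]]:
    """Integrate the voltage u and its trace S = du/dw side by side."""
    out = set(out_spike_steps)
    u, S = 0.0, 0.0
    us: List[float] = []
    Ss: List[float] = []
    for k, xk in enumerate(x):
        # Transition mode: V(u) = -u/tau (J_v = -1/tau) plus the input w * x_k.
        u += dt * (-u / tau + w * xk)
        S += dt * (-S / tau + xk)        # J_v * S + d(w * x_k)/dw
        if k in out:
            # Response mode: R(u) = -u resets u; J_r = -1 resets the trace as well.
            u += -u
            S += -1.0 * S
        us.append(u)
        Ss.append(S)
    return us, Ss


x = [10.0 if k % 50 == 0 else 0.0 for k in range(500)]   # input spike every 5 ms
u_w, S_w = simulate_with_trace(0.4, x, out_spike_steps={120, 300})
u_eps, _ = simulate_with_trace(0.4 + 1e-6, x, out_spike_steps={120, 300})
```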
- the input may comprise feed-forward input via an interface, and the learning parameter may comprise an efficacy associated with the interface; the interface may comprise a synaptic connection and the learning parameter may comprise a connection weight; the state parameter q may be configured to describe time evolution of the neuron state in accordance with a state evolution process characterized by an instantaneous probability density (IPD) of generating a neuronal response P.
- the instructions are further configured to, when executed, determine the derivative dλ/dw of the IPD with respect to the learning parameter w, based at least in part on the trace S, and obtain an instantaneous score function value g based at least in part on the derivative dλ/dw; the determination of the adjustment dw may be based at least in part on the instantaneous score function value g, where the determination of the realization of the trace S, the determination of the derivative dλ/dw, and the obtaining of the instantaneous score function value g cooperate to produce the adjustment dw such that a next instance of the neuron state, associated with an adjusted value w_2 configured based on the current value w_1 and the adjustment dw, may be closer to the target state.
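The chain from trace to adjustment can be sketched under the following assumptions: time is discretized with bin width dt, the hazard is an exponential escape function λ_k = λ₀ exp(β(u_k − θ)), and the trace S_k = ∂u_k/∂w is obtained by integrating the trace equation described above. The chain rule then gives dλ/dw = βλS, and the per-bin instantaneous score g_k = (dλ_k/dw)/λ_k · y_k − (dλ_k/dw)·dt is the derivative of the discretized log-likelihood term y_k ln λ_k − λ_k dt. F is treated as a cost, η is a hypothetical learning rate, and the trace values below are synthetic; all names are illustrative.

```python
import math
from typing import List, Sequence


def hazard(u: float, rate0: float = 0.1, beta: float = 1.0, theta: float = 1.0) -> float:
    """Assumed exponential escape hazard lambda(u), in spikes per ms."""
    return rate0 * math.exp(beta * (u - theta))


def instantaneous_scores(u: Sequence[float], S: Sequence[float], y: Sequence[int],
                         dt: float, beta: float = 1.0) -> List[float]:
    """Per-bin score g_k = (dlambda/dw) / lambda * y_k - (dlambda/dw) * dt."""
    g = []
    for uk, Sk, yk in zip(u, S, y):
        lam = hazard(uk, beta=beta)
        dlam_dw = beta * lam * Sk        # chain rule through the trace S = du/dw
        g.append(dlam_dw / lam * yk - dlam_dw * dt)
    return g


def discretized_log_likelihood(u: Sequence[float], y: Sequence[int],
                               dt: float, beta: float = 1.0) -> float:
    """sum_k [y_k ln lambda_k - lambda_k dt] (constant ln dt terms dropped)."""
    return sum(yk * math.log(hazard(uk, beta=beta)) - hazard(uk, beta=beta) * dt
               for uk, yk in zip(u, y))


def weight_adjustment(F: float, g: Sequence[float], eta: float = 0.01) -> float:
    """dw = -eta * F * sum_k g_k, i.e. eta * F * dh/dw with h = -ln p."""
    return -eta * F * sum(g)


# Synthetic trace; u = w * S holds for a reset LIF neuron with one input synapse.
dt, w = 0.1, 0.8
S = [0.05 * (k % 40) for k in range(200)]
y = [1 if k in (30, 75, 150) else 0 for k in range(200)]
u = [w * s for s in S]
g = instantaneous_scores(u, S, y, dt)
dw = weight_adjustment(F=1.5, g=g)
```

Because g is computed bin by bin from quantities available at that time (u_k, S_k, y_k), the adjustment can be accumulated online rather than after the whole spike train has been observed.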
- a computerized apparatus may be configured to process an input spike train x using a hybrid learning rule, the apparatus comprising a stochastic learning block configured to produce a learning signal based at least in part on the input spike train x and a training signal r; the hybrid learning rule may be configured to simultaneously effect a reinforcement learning rule and an unsupervised learning rule.
- the stochastic learning block may be operable according to a stochastic process characterized by a current state and a desired state, the process being described by at least a state variable configured to transition the learning block from current state to the desired state
- the training signal r may comprise a reinforcement spiking indicator associated with current performance relative to desired performance of the apparatus, the current performance corresponding to the current state and the desired performance corresponding to the desired state, the current performance may be effected, at least partly, by transition from a prior state to the current state
- the reinforcement learning may be configured based at least in part on the reinforcement spiking indicator so that it provides: positive reinforcement when a distance measure between the current state and the desired state is smaller than the distance measure between the prior state and the desired state; and negative reinforcement when the distance measure between the current state and the desired state is greater than the distance measure between the prior state and the desired state.
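The sign rule above can be written as a small function. Euclidean distance (via `math.dist`, Python 3.8+), the return convention (+1 for positive, −1 for negative reinforcement), and returning 0 when the two distances are equal, a case the text does not specify, are assumptions. In a spiking implementation, +1 and −1 would plausibly be delivered as spikes on separate positive and negative reinforcement channels, like the positive and negative reinforcement spike patterns in the figures.

```python
import math
from typing import Callable, Sequence

Distance = Callable[[Sequence[float], Sequence[float]], float]


def reinforcement_signal(prior_state: Sequence[float],
                         current_state: Sequence[float],
                         desired_state: Sequence[float],
                         distance: Distance = math.dist) -> int:
    """+1 if the transition moved closer to the desired state, -1 if farther, 0 otherwise."""
    d_prior = distance(prior_state, desired_state)
    d_current = distance(current_state, desired_state)
    if d_current < d_prior:
        return 1       # positive reinforcement
    if d_current > d_prior:
        return -1      # negative reinforcement
    return 0           # unchanged distance: no reinforcement (assumption)
```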
- the training signal r may further comprise a desired output spike train y_d, and the current performance may be effected, at least partly, by transition from the prior state to the current state; the reinforcement learning may be configured based at least in part on the reinforcement spiking indicator so that the reinforcement is positive when the current performance is closer to the desired performance, and negative when the current performance is farther from the desired performance.
- the hybrid learning rule may be characterized by a hybrid performance function F comprising a simultaneous combination of a reinforcement learning performance function F_re and a supervised learning performance function F_su; the simultaneous combination may be effectuated based at least in part on a value of the hybrid performance function F determined at a time step t, the value comprising a reinforcement performance function F_re value and a supervised learning performance function F_su value.
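A minimal sketch of a per-time-step hybrid performance value follows. The weighted-sum form F(t) = a_re·F_re(t) + a_su·F_su(t) and the component definitions (F_re(t) = −(r⁺(t) − r⁻(t)) from positive and negative reinforcement spikes, F_su(t) = (y_d(t) − y(t))² from desired and actual output spikes in the bin) are illustrative assumptions, not the disclosure's definitions. With F treated as a cost, both components are lower when performance is better. Setting a_su = 0 or a_re = 0 recovers pure reinforcement or pure supervised learning, and intermediate weights mix the two signals within the same time step.

```python
from typing import List, Sequence


def hybrid_performance(y: Sequence[int], y_desired: Sequence[int],
                       r_pos: Sequence[int], r_neg: Sequence[int],
                       a_re: float = 1.0, a_su: float = 1.0) -> List[float]:
    """Per-time-step hybrid cost F(t) = a_re * F_re(t) + a_su * F_su(t)."""
    F = []
    for yt, ydt, rp, rn in zip(y, y_desired, r_pos, r_neg):
        F_re = -(rp - rn)              # reward lowers the cost, punishment raises it
        F_su = (ydt - yt) ** 2         # mismatch between desired and actual spike
        F.append(a_re * F_re + a_su * F_su)
    return F


y = [0, 1, 1, 0]            # actual output spikes per bin
y_desired = [0, 1, 0, 1]    # supervisor (desired) spikes per bin
r_pos = [0, 1, 0, 0]        # positive reinforcement spikes
r_neg = [0, 0, 1, 0]        # negative reinforcement spikes
F_mixed = hybrid_performance(y, y_desired, r_pos, r_neg)          # equal weights
F_rl = hybrid_performance(y, y_desired, r_pos, r_neg, a_su=0.0)   # reinforcement only
```

Each F(t) value can then multiply the same per-step score used for the weight update, so switching between learning paradigms changes only this scalar and not the network machinery.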
- FIG. 1 is a block diagram illustrating a typical architecture of an adaptive system according to prior art.
- FIG. 1A is a block diagram illustrating multi-task learning controller apparatus according to prior art.
- FIG. 2 is a graphical illustration of typical input and output spike trains according to prior art.
- FIG. 3 is a block diagram illustrating generalized learning apparatus, in accordance with one or more implementations.
- FIG. 4 is a block diagram illustrating learning block apparatus of FIG. 3 , in accordance with one or more implementations.
- FIG. 4A is a block diagram illustrating exemplary implementations of performance determination block of the learning block apparatus of FIG. 4 , in accordance with the disclosure.
- FIG. 5 is a block diagram illustrating generalized learning apparatus, in accordance with one or more implementations.
- FIG. 5A is a block diagram illustrating generalized learning block configured for implementing different learning rules, in accordance with one or more implementations.
- FIG. 5B is a block diagram illustrating generalized learning block configured for implementing different learning rules, in accordance with one or more implementations.
- FIG. 5C is a block diagram illustrating a generalized learning block configured for implementing different learning rules, in accordance with one or more implementations.
- FIG. 6A is a block diagram illustrating a spiking neural network, comprising three dynamically configured partitions, configured to effectuate generalized learning block of FIG. 4 , in accordance with one or more implementations.
- FIG. 6B is a block diagram illustrating a spiking neural network, comprising two dynamically configured partitions, adapted to effectuate generalized learning, in accordance with one or more implementations.
- FIG. 7 is a block diagram illustrating spiking neural network configured to effectuate multiple learning rules, in accordance with one or more implementations.
- FIG. 8A is a logical flow diagram illustrating generalized learning method for use with the apparatus of FIG. 5A , in accordance with one or more implementations.
- FIG. 8B is a logical flow diagram illustrating dynamic reconfiguration method for use with the apparatus of FIG. 5A , in accordance with one or more implementations.
- FIG. 9A is a plot presenting simulations data illustrating operation of the neural network of FIG. 7 prior to learning, in accordance with one or more implementations, where data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network before learning; (iii) supervisor spike pattern; (iv) positive reinforcement spike pattern; and (v) negative reinforcement spike pattern.
- FIG. 9B is a plot presenting simulations data illustrating supervised learning operation of the neural network of FIG. 7 , in accordance with one or more implementations, where data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network before learning; (iii) supervisor spike pattern; (iv) positive reinforcement spike pattern; and (v) negative reinforcement spike pattern.
- FIG. 9C is a plot presenting simulations data illustrating reinforcement learning operation of the neural network of FIG. 7 , in accordance with one or more implementations, where data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning; (iii) supervisor spike pattern; (iv) positive reinforcement spike pattern; and (v) negative reinforcement spike pattern.
- FIG. 9D is a plot presenting simulations data illustrating operation of the neural network of FIG. 7 , comprising reinforcement learning aided with small portion of supervisor spikes, in accordance with one or more implementations, where data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning; (iii) supervisor spike pattern; (iv) positive reinforcement spike pattern; and (v) negative reinforcement spike pattern.
- FIG. 9E is a plot presenting simulations data illustrating operation of the neural network of FIG. 7 , comprising an equal mix of reinforcement and supervised learning signals, in accordance with one or more implementations, where data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning; (iii) supervisor spike pattern; (iv) positive reinforcement spike pattern; and (v) negative reinforcement spike pattern.
- FIG. 9F is a plot presenting simulations data illustrating operation of the neural network of FIG. 7 , comprising supervised learning augmented with a 50% fraction of reinforcement spikes, in accordance with one or more implementations, where data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning; (iii) supervisor spike pattern; (iv) positive reinforcement spike pattern; and (v) negative reinforcement spike pattern.
- FIG. 10A is a plot presenting simulations data illustrating supervised learning operation of the neural network of FIG. 7 , in accordance with one or more implementations, where data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network before learning; (iii) supervisor spike pattern.
- FIG. 10B is a plot presenting simulations data illustrating operation of the neural network of FIG. 7 , comprising supervised learning augmented by a small amount of unsupervised learning, modeled as 15% fraction of randomly distributed (Poisson) spikes, in accordance with one or more implementations, where data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning, (iii) supervisor spike pattern.
- FIG. 10C is a plot presenting simulations data illustrating operation of the neural network of FIG. 7 , comprising supervised learning augmented by a substantial amount of unsupervised learning, modeled as 80% fraction of Poisson spikes, in accordance with one or more implementations, where data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning, (iii) supervisor spike pattern.
- FIG. 11 is a plot presenting simulations data illustrating operation of the neural network of FIG. 7 , comprising supervised learning and reinforcement learning, augmented by a small amount of unsupervised learning, modeled as 15% fraction of Poisson spikes, in accordance with one or more implementations, where data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning, (iii) supervisor spike pattern; (iv) positive reinforcement spike pattern; and (v) negative reinforcement spike pattern.
- FIG. 12 is a plot presenting simulations data illustrating supervised learning operation of the spiking neural network of FIG. 7 .
- Data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning; (iii) supervisor spike pattern.
- FIG. 13 is a plot presenting simulations data illustrating predictive supervised learning operation of the spiking neural network of FIG. 7 .
- Data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning; (iii) supervisor spike pattern.
- FIG. 14 is a plot presenting simulation data illustrating reciprocal supervised learning operation of the spiking neural network of FIG. 7.
- Data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning; (iii) supervisor spike pattern.
- FIG. 15A is a plot presenting simulation data illustrating operation of the neural network of FIG. 7, comprising unsupervised learning.
- Data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network after learning; (iii) evolution of weights during learning.
- FIG. 15B is a plot presenting simulation data illustrating operation of the neural network of FIG. 7, comprising unsupervised learning via Kullback-Leibler divergence minimization. Data in the top panel represent the average performance, while data in the bottom panel show the evolution of weights during learning.
- FIG. 15C is a plot presenting simulation data illustrating operation of the neural network of FIG. 7, comprising unsupervised learning via Kullback-Leibler divergence minimization. Data in the top panel represent the input spike pattern, while data in the bottom panel show the network output after learning.
- FIG. 16 is a plot presenting simulation data illustrating operation of the neural network of FIG. 7, comprising reinforcement learning.
- Data in the top panel illustrate the mean distance $\langle d \rangle$ between the actual position of an AUV, $y(t)$, and the desired position of the AUV, $y_d(t)$, while data in the bottom panel present the variance of the distance $d$.
- FIG. 17A is a plot presenting simulation data illustrating operation of the neural network of FIG. 7, comprising reinforcement learning.
- Data in the panels from top to bottom comprise: (i) input spike pattern; (ii) output activity of the network at the time interval corresponding to epoch 250 of FIG. 17B; (iii) reward spike pattern.
- FIG. 17B is a plot presenting simulation data illustrating averaged performance (top) and evolution of weights (bottom) of the spiking neural network of FIG. 7, comprising reinforcement learning configured in accordance with one or more implementations.
- FIG. 18 is a logical flow diagram illustrating automatic computation of eligibility traces in a spiking neural network, in accordance with one or more implementations.
- FIGS. 19A-19E are program listings illustrating a textual description of spiking neuron dynamics and stochastic properties configured for processing by the Matlab® symbolic computation engine in order to automatically generate the score function, in accordance with one or more implementations.
- As used herein, the term “bus” is meant generally to denote all types of interconnection or communication architecture that are used to access the synaptic and neuron memory.
- the “bus” may be optical, wireless, infrared, and/or another type of communication medium.
- the exact topology of the bus could be, for example, a standard “bus”, hierarchical bus, network-on-chip, address-event-representation (AER) connection, and/or other type of communication topology used for accessing, e.g., different memories in a pulse-based system.
- the terms “computer”, “computing device”, and “computerized device” may include one or more of personal computers (PCs) and/or minicomputers (e.g., desktop, laptop, and/or other PCs), mainframe computers, workstations, servers, personal digital assistants (PDAs), handheld computers, embedded computers, programmable logic devices, personal communicators, tablet computers, portable navigation aids, J2ME equipped devices, cellular telephones, smart phones, personal integrated communication and/or entertainment devices, and/or any other device capable of executing a set of instructions and processing an incoming data signal.
- As used herein, the terms “computer program” and “software” may include any sequence of human and/or machine cognizable steps which perform a function.
- Such program may be rendered in a programming language and/or environment including one or more of C/C++, C#, Fortran, COBOL, MATLAB™, PASCAL, Python, assembly language, markup languages (e.g., HTML, SGML, XML, VoXML), object-oriented environments (e.g., Common Object Request Broker Architecture (CORBA)), Java™ (e.g., J2ME, Java Beans), Binary Runtime Environment (e.g., BREW), and/or other programming languages and/or environments.
- As used herein, the term “connection” may include a causal link between any two or more entities (whether physical or logical/virtual), which may enable information exchange between the entities.
- As used herein, the term “memory” may include an integrated circuit and/or other storage device adapted for storing digital data.
- memory may include one or more of ROM, PROM, EEPROM, DRAM, Mobile DRAM, SDRAM, DDR/2 SDRAM, EDO/FPMS, RLDRAM, SRAM, “flash” memory (e.g., NAND/NOR), memristor memory, PSRAM, and/or other types of memory.
- As used herein, the terms “integrated circuit”, “chip”, and “IC” are meant to refer to an electronic circuit manufactured by the patterned diffusion of trace elements into the surface of a thin substrate of semiconductor material.
- integrated circuits may include field programmable gate arrays (e.g., FPGAs), a programmable logic device (PLD), reconfigurable computer fabrics (RCFs), application-specific integrated circuits (ASICs), and/or other types of integrated circuits.
- As used herein, the terms “microprocessor” and “digital processor” are meant generally to include digital processing devices.
- digital processing devices may include one or more of digital signal processors (DSPs), reduced instruction set computers (RISC), general-purpose (CISC) processors, microprocessors, gate arrays (e.g., field programmable gate arrays (FPGAs)), PLDs, reconfigurable computer fabrics (RCFs), array processors, secure microprocessors, application-specific integrated circuits (ASICs), and/or other digital processing devices.
- As used herein, the term “network interface” refers to any signal, data, and/or software interface with a component, network, and/or process.
- a network interface may include one or more of FireWire (e.g., FW400, FW800, etc.), USB (e.g., USB2), Ethernet (e.g., 10/100, 10/100/1000 (Gigabit Ethernet), 10-Gig-E, etc.), MoCA, Coaxsys (e.g., TVnet™), radio frequency tuner (e.g., in-band or OOB, cable modem, etc.), Wi-Fi (802.11), WiMAX (802.16), PAN (e.g., 802.15), cellular (e.g., 3G, LTE/LTE-A/TD-LTE, GSM, etc.), IrDA families, and/or other network interfaces.
- As used herein, the terms “node”, “neuron”, and “neuronal node” are meant to refer, without limitation, to a network unit (e.g., a spiking neuron and a set of synapses configured to provide input signals to the neuron) having parameters that are subject to adaptation in accordance with a model.
- As used herein, the terms “state” and “node state” are meant generally to denote a full (or partial) set of dynamic variables used to describe the node state.
- As used herein, the terms “synaptic channel”, “connection”, “link”, “transmission channel”, “delay line”, and “communications channel” include a link between any two or more entities (whether physical (wired or wireless), or logical/virtual) which enables information exchange between the entities, and may be characterized by one or more variables affecting the information exchange.
- As used herein, the term “Wi-Fi” includes one or more of IEEE-Std. 802.11, variants of IEEE-Std. 802.11, standards related to IEEE-Std. 802.11 (e.g., 802.11a/b/g/n/s/v), and/or other wireless standards.
- As used herein, the term “wireless” means any wireless signal, data, communication, and/or other wireless interface.
- a wireless interface may include one or more of Wi-Fi, Bluetooth, 3G (3GPP/3GPP2), HSDPA/HSUPA, TDMA, CDMA (e.g., IS-95A, WCDMA, etc.), FHSS, DSSS, GSM, PAN/802.15, WiMAX (802.16), 802.20, narrowband/FDMA, OFDM, PCS/DCS, LTE/LTE-A/TD-LTE, analog cellular, CDPD, satellite systems, millimeter wave or microwave systems, acoustic, infrared (i.e., IrDA), and/or other wireless interfaces.
- an adaptive spiking neuron network signal processing system may flexibly combine different learning rules (e.g., supervised, unsupervised, reinforcement learning, and/or other learning rules) with different methods (e.g., online, batch, and/or other learning methods).
- the generalized learning apparatus of the disclosure may employ, in some implementations, modular architecture where learning tasks are separated from control tasks, so that changes in one of the blocks do not necessitate changes within the other block. By separating implementation of learning tasks from the control tasks, the framework may further allow dynamic reconfiguration of the learning block in response to a task change or learning method change in real time.
- the generalized learning apparatus may be capable of implementing several learning rules concurrently based on the desired control task and without requiring users to explicitly identify the required learning rule composition for that application.
- the generalized learning framework described herein advantageously provides for learning implementations that do not affect regular operation of the signal system (e.g., processing of data). Hence, a need for a separate learning stage may be obviated so that learning may be turned off and on again when appropriate.
- One or more generalized learning methodologies described herein may enable different parts of the same network to implement different adaptive tasks.
- the end user of the adaptive device may be enabled to partition the network into different parts, connect these parts appropriately, and assign cost functions to each task (e.g., selecting them from a predefined set of rules or implementing a custom rule).
- a user may not be required to understand the detailed implementation of the adaptive system (e.g., plasticity rules, neuronal dynamics, etc.), nor to derive the performance function and determine its gradient for each learning task. Instead, a user may be able to operate the generalized learning apparatus of the disclosure by assigning task functions and a connectivity map to each partition.
- Implementations of the disclosure may be, for example, deployed in a hardware and/or software implementation of a neuromorphic computer system.
- a robotic system may include a processor embodied in an application specific integrated circuit, which can be adapted or configured for use in an embedded application (e.g., a prosthetic device).
- FIG. 3 illustrates one exemplary learning apparatus useful to the disclosure.
- the apparatus 300 shown in FIG. 3 comprises the control block 310 , which may include a spiking neural network configured to control a robotic arm and may be parameterized by the weights of connections between artificial neurons, and learning block 320 , which may implement learning and/or calculating the changes in the connection weights.
- the control block 310 may receive an input signal x, and may generate an output signal y.
- the output signal y may include motor control commands configured to move a robotic arm along a desired trajectory.
- the control block 310 may be characterized by a system model comprising system internal state variables S.
- An internal state variable S may include a membrane voltage of the neuron, conductance of the membrane, and/or other variables.
- the control block 310 may be characterized by learning parameters w, which may include synaptic weights of the connections, firing threshold, resting potential of the neuron, and/or other parameters.
- the parameters w may comprise probabilities of signal transmission between the units (e.g., neurons) of the network.
- the input signal x(t) may comprise data used for solving a particular control task.
- the signal x(t) may comprise a stream of raw sensor data (e.g., proximity, inertial, terrain imaging, and/or other raw sensor data) and/or preprocessed data (e.g., velocity extracted from accelerometers, distance to obstacle, positions, and/or other preprocessed data).
- the signal x(t) may comprise an array of pixel values (e.g., RGB, CMYK, HSV, HSL, grayscale, and/or other pixel values) in the input image, or preprocessed data (e.g., levels of activations of Gabor filters for face recognition, contours, and/or other preprocessed data).
- the input signal x(t) may comprise a desired motion trajectory, for example, in order to predict the future state of the robot on the basis of the current state and the desired motion.
- the parameter w may denote various system parameters including connection efficacy, firing threshold, resting potential of the neuron, and/or other parameters.
- the analytical relationship of Eqn. 1 may be selected such that the gradient of ln [p(y
- the framework shown in FIG. 3 may be configured to estimate rules for changing the system parameters (e.g., learning rules) so that the performance function F(x,y,r) is minimized for the current set of inputs and outputs and system dynamics S.
- the control performance function may be configured to reflect the properties of inputs and outputs (x,y).
- the values F(x,y,r) may be calculated directly by the learning block 320 without relying on external signal r when providing solution of unsupervised learning tasks.
- the value of the function F may be calculated based on a difference between the output y of the control block 310 and a reference signal y_d characterizing the desired control block output. This configuration may provide solutions for supervised learning tasks, as described in detail below.
- the value of the performance function F may be determined based on the external signal r. This configuration may provide solutions for reinforcement learning tasks, where r represents reward and punishment signals from the environment.
- the learning block 320 may implement a learning framework according to the implementation of FIG. 3 that enables generalized learning methods without relying on calculations of the derivative of the performance function F in order to solve unsupervised, supervised, reinforcement, and/or other learning tasks.
- the block 320 may receive the input x and output y signals (denoted by the arrows 302_1, 308_1, respectively, in FIG. 3), as well as the state information 305.
- external teaching signal r may be provided to the block 320 as indicated by the arrow 304 in FIG. 3 .
- the teaching signal may comprise, in some implementations, the desired motion trajectory, and/or reward and punishment signals from the external environment.
- the learning block 320 may optimize performance of the control system (e.g., the system 300 of FIG. 3 ) that is characterized by minimization of the average value of the performance function F(x,y,r) as described in detail in co-owned and co-pending U.S. patent application Ser. No. 13/487,499 entitled “STOCHASTIC APPARATUS AND METHODS FOR IMPLEMENTING GENERALIZED LEARNING RULES”, incorporated supra.
- the above-referenced application describes, in one or more implementations, minimizing the average performance $\langle F \rangle_{x,y,r}$ using, for example, gradient descent algorithms based on the per-stimulus entropy of the system response (or ‘surprisal’), $h(y|x) = -\ln p(y|x)$ (Eqn. 13).
- the distribution of the external signal, $p(r|x,y)$, may be characteristic of the external environment and may not change due to adaptation. That property may allow omission of averaging over external signals r in subsequent consideration of learning rules.
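The gradient descent mentioned above can be illustrated with the likelihood-ratio (score-function) identity $\partial \langle F \rangle / \partial w_i = \langle F\, \partial \ln p(y|x) / \partial w_i \rangle = -\langle F g_i \rangle$, where $g_i = \partial h(y|x) / \partial w_i$; the step $\Delta w_i = \gamma \langle F g_i \rangle$ therefore descends $\langle F \rangle$ without differentiating F itself. The Python sketch below is illustrative only: it assumes a hypothetical single Bernoulli unit with $p(y=1|x) = \sigma(wx)$, a 0/1 task cost, and a learning rate γ, none of which are taken from the disclosure, and compares the sampled gradient with the exact one.

```python
import math
import random


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def surprisal(y: int, x: float, w: float) -> float:
    """h(y|x) = -ln p(y|x) for a hypothetical Bernoulli unit, p(y=1|x) = sigmoid(w*x)."""
    p1 = sigmoid(w * x)
    return -math.log(p1 if y == 1 else 1.0 - p1)


def score(y: int, x: float, w: float) -> float:
    """Score g = dh/dw, derived analytically: (sigmoid(w*x) - y) * x."""
    return (sigmoid(w * x) - y) * x


def task_cost(y: int, y_target: int) -> float:
    """Hypothetical performance function F: 0 for the desired output, 1 otherwise."""
    return 0.0 if y == y_target else 1.0


def estimate_gradient(x: float, w: float, y_target: int,
                      n_samples: int = 100_000, seed: int = 0) -> float:
    """Monte Carlo estimate of d<F>/dw = -<F g>; F itself is never differentiated."""
    rng = random.Random(seed)
    p1 = sigmoid(w * x)
    total = 0.0
    for _ in range(n_samples):
        y = 1 if rng.random() < p1 else 0
        total += task_cost(y, y_target) * score(y, x, w)
    return -total / n_samples


def exact_gradient(x: float, w: float) -> float:
    """For y_target = 1, <F> = 1 - sigmoid(w*x), so d<F>/dw = -s(1 - s)x."""
    s = sigmoid(w * x)
    return -s * (1.0 - s) * x


gamma = 0.5  # learning rate (assumed value)
w = 0.5
est = estimate_gradient(x=1.0, w=w, y_target=1)
w_new = w - gamma * est  # same as w + gamma * <F g>
print(est, exact_gradient(1.0, w))
```

The update moves w toward the desired output using only samples of y and the scalar cost, which is the property that lets the learning rule stay independent of how F is computed.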
- the learning block may have access to the system's inputs and outputs, and/or system internal state S.
- the learning block may be provided with additional inputs 304 (e.g., reinforcement signals, desired output, and/or current costs of control movements, etc.) that are related to the current task of the control block.
- the learning block may estimate changes of the system parameters w that minimize the performance function F, and may provide the parameter adjustment information Δw to the control block 310, as indicated by the arrow 306 in FIG. 3.
- the learning block may be configured to modify the learning parameters w of the controller block.
- the learning block may be configured to communicate parameters w (as depicted by the arrow 306 in FIG. 3 ) for further use by the controller block 310 , or to another entity (not shown).
- the architecture shown in FIG. 3 may provide flexibility of applying different (or modifying) learning algorithms without requiring modifications in the control block model.
- the methodology illustrated in FIG. 3 may enable implementation of the learning process in such a way that regular functionality of the control aspects of the system 300 is not affected. For example, learning may be turned off and on again as required with the control block functionality being unaffected.
- the detailed structure of the learning block 420 is shown and described with respect to FIG. 4 .
- the learning block 420 may comprise one or more of gradient determination (GD) block 422 , performance determination (PD) block 424 and parameter adaptation block (PA) 426 , and/or other components.
- the implementation shown in FIG. 4 may decompose the learning process of the block 420 into two parts.
- the first part may be a task-dependent/system-independent part (i.e., the PD block 424 of the learning block 420).
- Implementation of the PD block 424 may not depend on the particulars of the control block (e.g., block 310 in FIG. 3).
- the second part of the learning block 420 may implement task-independent/system dependent aspects of the learning block operation.
- the implementation of the GD block 422 and PA block 426 may be the same for individual learning rules (e.g., supervised and/or unsupervised).
- the GD block implementation may further comprise particulars of gradient determination and parameter adaptation that are specific to the controller system 310 architecture (e.g., neural network composition, neuron operating dynamics, and/or plasticity rules).
- the architecture shown in FIG. 4 may allow users to modify task-specific and/or system-specific portions independently from one another, thereby enabling flexible control of the system performance.
- An advantage of the framework may be that the learning can be implemented in a way that does not affect the normal functioning of the system (except for changing the parameters w). For example, there may be no need for a separate learning stage, and learning may be turned off and on again when appropriate.
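A minimal Python sketch of the FIG. 4 decomposition, under stated assumptions: the GD block is a function returning the score vector g for a hypothetical Bernoulli unit, the PD blocks are interchangeable task functions returning F, and the PA block applies the single-sample step $\Delta w_i = \gamma F g_i$. All names (`bernoulli_score`, `supervised_pd`, `reinforcement_pd`, `LearningBlock`) and both cost forms are illustrative, not taken from the disclosure.

```python
import math
from dataclasses import dataclass
from typing import Callable, List

Vector = List[float]


def bernoulli_score(x: Vector, y: int, w: Vector) -> Vector:
    """GD block (system dependent): g_i = dh/dw_i for a hypothetical unit with
    p(y=1|x) = sigmoid(sum_i w_i x_i) and h = -ln p(y|x)."""
    s = 1.0 / (1.0 + math.exp(-sum(w_i * x_i for w_i, x_i in zip(w, x))))
    return [(s - y) * x_i for x_i in x]


def supervised_pd(x: Vector, y: int, r: float) -> float:
    """PD block (task dependent): r is the desired output y_d; F = (y - y_d)^2."""
    return (y - r) ** 2


def reinforcement_pd(x: Vector, y: int, r: float) -> float:
    """PD block (task dependent): r is reward (+) or punishment (-); F = -r."""
    return -r


@dataclass
class LearningBlock:
    gd: Callable[[Vector, int, Vector], Vector]
    pd: Callable[[Vector, int, float], float]
    rate: float = 0.1      # learning rate gamma (assumed value)
    enabled: bool = True   # switching learning off leaves the control path untouched

    def delta_w(self, x: Vector, y: int, r: float, w: Vector) -> Vector:
        """PA block: dw_i = gamma * F * g_i, a single-sample estimate of
        -gamma * d<F>/dw_i (since d<F>/dw_i = -<F g_i>)."""
        if not self.enabled:
            return [0.0] * len(w)
        f = self.pd(x, y, r)
        return [self.rate * f * g_i for g_i in self.gd(x, y, w)]


supervised = LearningBlock(gd=bernoulli_score, pd=supervised_pd)
reinforced = LearningBlock(gd=bernoulli_score, pd=reinforcement_pd)  # same GD, new task
print(supervised.delta_w([1.0, 0.0], 0, 1.0, [0.0, 0.0]))
```

Swapping the PD function changes the learning task without touching the GD code, and clearing `enabled` stops adaptation while the control block keeps running with its current weights.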
- the GD block may be configured to determine the score function g by, inter alia, computing derivatives of the logarithm of the conditional probability with respect to the parameters that are subjected to change during learning based on the current inputs x, outputs y, and state variables S, denoted by the arrows 402 , 408 , 410 , respectively, in FIG. 4 .
- the GD block may produce an estimate of the score function g, denoted by the arrow 418 in FIG. 4, that is independent of the particular learning task (e.g., reinforcement, unsupervised, and/or supervised learning).
- the score function g may be represented as a vector g, comprising scores $g_i$ associated with individual parameter components $w_i$.
- Implementation of this block may be non-trivial for complex adaptive systems, such as spiking neural networks. However, using the framework described herein, this implementation may need to be changed only once and may then be used without changes for different learning tasks, as described in detail below.
- the score function $g_i = \partial h(y|x) / \partial w_i$ may be calculated for the individual spiking neuron parameters to be changed. If spiking patterns on a finite interval of length T are viewed as the input x and output y of the neuron, then the score function may take the following form:
- an instantaneous value of the score function may be calculated that is a time derivative of the interval score function:
- the score function for spiking pattern on interval T may be calculated as:
- $\partial \lambda(t) / \partial w_i$ may be calculated, which is the derivative of the instantaneous probability density with respect to a learning parameter $w_i$ of the i-th neuron.
- input weights learning (synaptic plasticity)
- stochastic threshold tuning (intrinsic plasticity)
- the neuron may receive n input spiking channels.
- the external current to the neuron, $I_{ext}$, in the neuron's dynamic equation (Eqn. 6) may be modeled as a sum of filtered and weighted input spikes from all input channels:
- $I_{ext} = \sum_{i}^{n} \sum_{t_j^i \in x_i} w_i\, \epsilon(t - t_j^i)$ (Eqn. 18)
- $i$ is the index of the input channel
- $x_i$ is the stream of input spikes on the i-th channel
- $t_j^i$ are the times of input spikes in the i-th channel
- $w_i$ is the weight of the i-th channel
- $\epsilon(t)$ is a generic function that models post-synaptic currents from input spikes.
- the post-synaptic current function may be configured as $\epsilon(t) = \delta(t)$ or $\epsilon(t) = e^{-t/\tau_s} H(t)$, where $\delta(t)$ is the Dirac delta function, $H(t)$ is the Heaviside step function, and $\tau_s$ is a synaptic time constant.
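Eqn. 18 can be evaluated directly for the exponential kernel. This Python sketch assumes that spike times and $\tau_s$ share one time unit (e.g., ms), takes $H(0) = 1$, and uses hypothetical function names and example values.

```python
import math
from typing import Sequence


def epsilon_exp(t: float, tau_s: float) -> float:
    """Exponential post-synaptic current kernel e^{-t/tau_s} H(t), with H(0) = 1."""
    return math.exp(-t / tau_s) if t >= 0.0 else 0.0


def external_current(t: float, spike_times: Sequence[Sequence[float]],
                     weights: Sequence[float], tau_s: float = 5.0) -> float:
    """Eqn. 18: I_ext(t) = sum_i sum_{t_j^i in x_i} w_i * eps(t - t_j^i).

    spike_times[i] lists the spike times t_j^i of input channel i."""
    return sum(
        w_i * epsilon_exp(t - t_ij, tau_s)
        for w_i, channel in zip(weights, spike_times)
        for t_ij in channel
    )


# Two hypothetical channels: an excitatory one with spikes at t = 0 and t = 4,
# and an inhibitory one whose spike at t = 10 has not yet arrived at t = 5.
spikes = [[0.0, 4.0], [10.0]]
weights = [2.0, -1.0]
print(external_current(5.0, spikes, weights))
```

Spikes that lie in the future contribute nothing because of the Heaviside factor, so the same function can be called at every simulation step with the full spike lists.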
- the derivative of the instantaneous probability density with respect to the i-th channel's weight may be taken using the chain rule:
- $\frac{\partial \lambda}{\partial w_i} = \sum_j \left( \frac{\partial \lambda}{\partial q_j} \, \frac{\partial q_j}{\partial w_i} \right)$
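- the derivative with respect to the learning weight $w_i$ may be determined as:
- $\frac{\partial}{\partial w_i}\left(\frac{d\vec{q}}{dt}\right) = \frac{\partial}{\partial w_i}\left(V(\vec{q})\right) + \frac{\partial}{\partial w_i}\left(\sum_{t_{out}} R(\vec{q})\, \delta(t - t_{out})\right) + \frac{\partial}{\partial w_i}\left(G(\vec{q})\, I_{ext}\right)$ (Eqn. 21)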
- $u_{w_i}$ denotes the derivative of the state variable (e.g., voltage) with respect to the i-th weight.
- a solution of Eqn. 24 may represent the post-synaptic potential (PSP) for the i-th unit and may be determined as a sum of all input spikes received at the unit (e.g., a neuron), where the unit is reset to zero after each output spike.
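For the delta kernel $\epsilon(t) = \delta(t)$, the state-variable derivative described above reduces to a per-channel count of input spikes since the last output spike. The Python sketch below is a discrete-time illustration; it assumes one sample per time step and that a reset takes effect after the step in which the neuron fires (both are assumptions, as is the function name).

```python
from typing import List, Sequence


def psp_traces(inputs: Sequence[Sequence[int]], outputs: Sequence[int]) -> List[List[float]]:
    """Per-channel PSP trace u_i[k] with reset.

    inputs[k][i] is 1 if input channel i spiked at step k; outputs[k] is 1 if the
    neuron fired at step k. Each trace sums the input spikes received on its channel
    and is reset to zero after every output spike."""
    n_channels = len(inputs[0]) if inputs else 0
    u = [0.0] * n_channels
    history = []
    for x_k, y_k in zip(inputs, outputs):
        u = [u_i + x_ki for u_i, x_ki in zip(u, x_k)]  # accumulate this step's input spikes
        history.append(list(u))
        if y_k:                                       # output spike: reset all traces
            u = [0.0] * n_channels
    return history


inputs = [[1, 0], [1, 1], [0, 0], [1, 0]]
outputs = [0, 1, 0, 0]
print(psp_traces(inputs, outputs))
```

Keeping one trace per input channel mirrors the per-weight derivatives $u_{w_i}$, so each entry can be reused directly when forming the score for the corresponding weight.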
- the IZ neuronal model may further be characterized using two first-order nonlinear differential equations describing time evolution of synaptic weights associated with each input interface (e.g., pre-synaptic connection) of a neuron, in the following form:
- $g_i = \frac{\partial h_{\Delta t}(y(t)|x)}{\partial w_i} \approx \kappa\, \lambda(t) \sum_{t_j^i \in x_i} \epsilon(t - t_j^i) \left(1 - \sum_{t_{out} \in y} \frac{\delta(t - t_{out})}{\lambda(t)}\right) \Delta t$ (Eqn. 32)
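Eqn. 32 can be checked numerically. The Python sketch below assumes, as hypotheses not stated in this section, an exponential stochastic threshold $\lambda(t) = \lambda_0 e^{\kappa(u(t) - \theta)}$ with $u(t) = \sum_i w_i \sum_j \epsilon(t - t_j^i)$ (no reset), and a discretized surprisal $h = \sum_k \lambda_k \Delta t - \sum_k y_k \ln \lambda_k$ over the interval. Replacing $\delta(t - t_{out})$ by $y_k / \Delta t$ in Eqn. 32 and summing over bins gives $g_i = \sum_k \kappa \lambda_k E_{ik} (\Delta t - y_k / \lambda_k)$, which should match a finite-difference derivative of h. The parameter values ($\lambda_0$, κ, θ) and inputs are illustrative.

```python
import math
from typing import List, Sequence


def rate(e_k: Sequence[float], w: Sequence[float],
         lam0: float, kappa: float, theta: float) -> float:
    """Assumed exponential stochastic threshold: lam = lam0 * exp(kappa * (u - theta))."""
    u = sum(w_i * e_ik for w_i, e_ik in zip(w, e_k))
    return lam0 * math.exp(kappa * (u - theta))


def surprisal(eps_trace, y, w, dt, lam0=0.01, kappa=1.0, theta=1.0) -> float:
    """Discretized h = sum_k lam_k*dt - sum_k y_k*ln(lam_k), constant terms dropped."""
    h = 0.0
    for e_k, y_k in zip(eps_trace, y):
        lam = rate(e_k, w, lam0, kappa, theta)
        h += lam * dt - y_k * math.log(lam)
    return h


def score_eqn32(eps_trace, y, w, dt, lam0=0.01, kappa=1.0, theta=1.0) -> List[float]:
    """Eqn. 32 summed over bins: g_i = sum_k kappa*lam_k*E_ik*(dt - y_k/lam_k).

    eps_trace[k][i] = E_ik = sum_j eps(t_k - t_j^i); y[k] = 1 if the neuron fired in bin k."""
    g = [0.0] * len(w)
    for e_k, y_k in zip(eps_trace, y):
        lam = rate(e_k, w, lam0, kappa, theta)
        for i, e_ik in enumerate(e_k):
            g[i] += kappa * lam * e_ik * (dt - y_k / lam)
    return g


eps_trace = [[0.5, 0.0], [1.0, 0.2], [0.3, 0.9]]  # hypothetical filtered inputs E_ik
y = [0, 1, 0]                                     # one output spike, in bin 1
w = [0.8, -0.4]
g = score_eqn32(eps_trace, y, w, dt=1.0)
print(g)
```

Only the bin containing an output spike contributes the large $-\kappa E_{ik}$ term, so weights of inputs that were active just before an output spike receive the strongest score.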
- the gradient determination block may be configured to determine the score function g based on particular inputs into the neuron(s), neuron outputs, and internal neuron state, according, for example, to Eqn. 15.
- the methodology described herein, which provides a textual description of neuron dynamics and stochastic properties (as shown and described in detail with respect to FIGS. 19A-19E below), advantageously allows the use of analytical mathematics computer aided design (CAD) tools in order to automatically obtain a score function such as, for example, Eqn. 32.
- the PD block may be configured to determine the performance function F based on the current inputs x, outputs y, and/or training signal r, denoted by the arrow 404 in FIG. 4 .
- the external signal r may comprise the reinforcement signal in the reinforcement learning task.
- the external signal r may comprise reference signal in the supervised learning task.
- the external signal r comprises the desired output, current costs of control movements, and/or other information related to the current task of the control block (e.g., block 310 in FIG. 3 ).
- the learning apparatus configuration depicted in FIG. 4 may decouple the PD block from the controller state model so that the output of the PD block depends on the learning task and is independent of the current internal state of the control block.
- a mobile robot controlled by a spiking neural network may be configured to collect resources (e.g., clean up trash) while avoiding obstacles (e.g., furniture, walls).
- the signal r may comprise a positive indication (e.g., representing a reward) at the moment when the robot acquires the resource (e.g., picks up a piece of rubbish) and a negative indication (e.g., representing a punishment) when the robot collides with an obstacle (e.g., wall).
- the spiking neural network of the robot controller may change its parameters (e.g., neuron connection weights) in order to maximize the function F (e.g., maximize the reward and minimize the punishment).
- a human expert may present to the network an exemplary sensory pattern x and the desired output y_d that describes the class of the input pattern x.
- the network may change (e.g., adapt) its parameters w to achieve the desired response on the presented pairs of input x and desired response y_d.
- the network may classify new input stimuli based on one or more past experiences.
- the distance measure may utilize the mutual information between the output signal and the reference signal.
- a feature function may be configured to extract the characteristic (or characteristics) of interest from the output signal y.
- the characteristic may correspond to a firing rate of spikes, in which case the feature function may determine the mean firing rate from the output.
- the desired value of the characteristic may be calculated internally by the PD block.
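As an illustration of a rate-based distance, the Python sketch below uses the mean firing rate as the feature function and, as an assumption, a squared difference between the output rate and the rate computed from the reference y_d. The disclosure's own distance measures (Eqn. 34-Eqn. 37) are not reproduced in this section, and the function names are hypothetical.

```python
from typing import Sequence


def mean_firing_rate(spike_times: Sequence[float], duration: float) -> float:
    """Feature function: mean firing rate (spikes per unit time) on the interval."""
    return len(spike_times) / duration


def rate_distance(y_spikes: Sequence[float], yd_spikes: Sequence[float],
                  duration: float) -> float:
    """Hypothetical supervised cost: squared difference between the firing rate
    of the output y and the desired rate computed from the reference y_d."""
    diff = mean_firing_rate(y_spikes, duration) - mean_firing_rate(yd_spikes, duration)
    return diff ** 2


# Output fired 3 times and the reference once over a 2-unit interval.
print(rate_distance([0.1, 0.9, 1.5], [0.7], 2.0))
```

Because only the rate is compared, any spike timing with the right count gives zero cost; a timing-sensitive task would need a different feature function.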
- the PD block may determine the performance function by calculating the instantaneous Kullback-Leibler divergence $d_{KL}$ between the output probability distribution p(y
- the performance function of Eqn. 41 may be applied in unsupervised learning tasks in order to restrict the possible output of the system. For example, if the target distribution is a Poisson distribution of spikes with some firing rate R, then minimization of this performance function may force the neuron to have the same firing rate R.
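The Poisson example can be sketched by discretizing time into bins of width Δt and treating each bin as a Bernoulli event, with spike probability $\lambda_k \Delta t$ for the neuron and $R \Delta t$ for the target; summing the per-bin divergence gives a cost that is zero only when the neuron fires at rate R in every bin. This discretization, the direction of the divergence (output relative to target), and the function names are assumptions, not the exact form of Eqn. 41.

```python
import math
from typing import Sequence


def bernoulli_kl(p: float, q: float) -> float:
    """KL(Bern(p) || Bern(q)) in nats; assumes 0 <= p <= 1 and 0 < q < 1."""
    total = 0.0
    if p > 0.0:
        total += p * math.log(p / q)
    if p < 1.0:
        total += (1.0 - p) * math.log((1.0 - p) / (1.0 - q))
    return total


def kl_rate_cost(rates: Sequence[float], target_rate: float, dt: float) -> float:
    """Sum over bins of d_KL between the neuron's spike probability lam_k*dt
    and the Poisson target R*dt (small-bin discretization, assumed)."""
    q = target_rate * dt
    return sum(bernoulli_kl(lam * dt, q) for lam in rates)


# Hypothetical rates in spikes/ms, 1 ms bins, target R = 0.02 spikes/ms (20 Hz).
print(kl_rate_cost([0.02] * 10, target_rate=0.02, dt=1.0))  # matched rate
print(kl_rate_cost([0.05] * 10, target_rate=0.02, dt=1.0))  # neuron fires too fast
```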
- the PD block may determine the performance function for the sparse coding.
- a learning framework of the present innovation may enable generation of learning rules for a system, which may be configured to solve several completely different task types simultaneously. For example, the system may learn to control an actuator while trying to extract independent components from movement trajectories of this actuator.
- the linear cost function combination described by Eqn. 44 illustrates one particular implementation of the disclosure, and other implementations (e.g., a nonlinear combination) may be used as well.
- the PD block may be configured to calculate the baseline of the performance function values (e.g., as a running average) and subtract it from the instantaneous value of the performance function in order to increase the speed of learning.
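A linear combination of task costs can be sketched as a weighted sum over named components; setting a coefficient to zero drops that task, as in the RS blocks described below where the unsupervised coefficient is zero. The component labels and the dictionary interface in this Python sketch are hypothetical.

```python
from typing import Dict


def combined_performance(costs: Dict[str, float], coeffs: Dict[str, float]) -> float:
    """F = sum_k c_k * F_k over task components such as 'S', 'R', 'U'.

    A component whose coefficient is missing or zero does not contribute."""
    return sum(coeffs.get(name, 0.0) * value for name, value in costs.items())


costs = {"S": 2.0, "R": -1.0, "U": 4.0}  # hypothetical per-task costs at one instant
f_rsu = combined_performance(costs, {"S": 1.0, "R": 1.0, "U": 1.0})
f_rs = combined_performance(costs, {"S": 1.0, "R": 1.0, "U": 0.0})  # unsupervised term off
print(f_rsu, f_rs)
```

Because the score function g does not depend on F, the same GD output can be multiplied by any such combination without changing the gradient code.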
- the time average of the performance function may comprise an interval average, where learning occurs over a predetermined interval.
- a current value of the cost function may be determined at individual steps within the interval and may be averaged over all steps.
- the time average of the performance function may comprise a running average, where the current value of the cost function may be low-pass filtered.
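Both averaging schemes, and the baseline subtraction mentioned above, can be sketched as follows. The exponential update $\bar{F} \leftarrow \bar{F} + \alpha (F - \bar{F})$ is one common form of first-order low-pass filter and is an assumption here, as are the smoothing factor α and initializing the baseline with the first sample.

```python
from typing import Iterable, List, Optional


def interval_average(values: Iterable[float]) -> float:
    """Average of the cost values computed at each step of a learning interval."""
    vals = list(values)
    return sum(vals) / len(vals)


def baseline_subtracted(values: Iterable[float], alpha: float = 0.1) -> List[float]:
    """Return F_k - Fbar_k, where Fbar is a running (low-pass filtered) average
    updated as Fbar <- Fbar + alpha * (F - Fbar) after each sample."""
    baseline: Optional[float] = None
    out = []
    for f in values:
        if baseline is None:
            baseline = f  # initialize with the first sample (assumption)
        out.append(f - baseline)
        baseline += alpha * (f - baseline)
    return out


print(interval_average([1.0, 2.0, 3.0]))
print(baseline_subtracted([1.0, 3.0, 3.0, 3.0], alpha=0.5))
```

A constant cost produces zero after baseline subtraction, so only deviations from recent performance drive the weight changes, which is the intended speed-up.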
- the PD block implementation denoted 434 may be configured to simultaneously implement reinforcement, supervised and unsupervised (RSU) learning rules; and/or receive the input signal x(t) 412 , the output signal y(t) 418 , and/or the learning signal 436 .
- the learning signal 436 may comprise the reinforcement component r(t) and the desired output (teaching) component y_d(t).
- the output performance function F_RSU 438 of the RSU PD block may be determined in accordance with Eqn. 69 described below.
- the PD blocks 444 , 445 may implement the reinforcement (R) learning rule.
- the output 448 of the block 444 may be determined based on the output signal y(t) 418 and the reinforcement signal r(t) 446 .
- the output 448 of the block 444 may be determined in accordance with Eqn. 38.
- the performance function output 449 of the block 445 may be determined based on the input signal x(t), the output signal y(t), and/or the reinforcement signal r(t).
- the PD block implementation denoted 454 may be configured to implement supervised (S) learning rules to generate the performance function F_S 458 that is dependent on the output signal y(t) value 418 and the teaching signal y_d(t) 456.
- the output 458 of the PD 454 block may be determined in accordance with Eqn. 34-Eqn. 37.
- the output performance function 468 of the PD block 464 implementing unsupervised learning may be a function of the input x(t) 412 and the output y(t) 418 .
- the output 468 may be determined in accordance with Eqn. 39-Eqn. 42.
- the PD block implementation denoted 474 may be configured to simultaneously implement reinforcement and supervised (RS) learning rules.
- the PD block 474 may not require the input signal x(t), and may receive the output signal y(t) 418 and the teaching signals r(t), y_d(t) 476.
- the output performance function F_RS 478 of the PD block 474 may be determined in accordance with Eqn. 43, where the combination coefficient for the unsupervised learning is set to zero.
- the reinforcement learning task may be the acquisition of resources by the mobile robot, where the reinforcement component r(t) provides information about acquired resources (reward signal) from the external environment, while at the same time a human expert shows the robot what the desired output signal y_d(t) should be in order to optimally avoid obstacles.
- the robot may be trained to try to acquire the resources if doing so does not contradict the human expert's signal for avoiding obstacles.
- the PD block implementation denoted 475 may be configured to simultaneously implement reinforcement and supervised (RS) learning rules.
- the PD block 475 output may be determined based on the output signal 418, the learning signals 476 (comprising the reinforcement component r(t) and the desired output (teaching) component y_d(t)), and the input signal 412, which determines the context for switching between supervised and reinforcement task functions.
- the reinforcement learning task may be used to acquire resources by the mobile robot, where the reinforcement component r(t) provides information about acquired resources (reward signal) from the external environment, while at the same time a human expert shows the robot what the desired output signal y_d(t) should be in order to optimally avoid obstacles.
- the performance signal may be switched between supervised and reinforcement. That may allow the robot to be trained to try to acquire the resources if doing so does not contradict the human expert's signal for avoiding obstacles.
- the output performance function 479 of the PD 475 block may be determined in accordance with Eqn. 43, where the combination coefficient for the unsupervised learning is set to zero.
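The context-dependent switching described for the PD block 475 can be sketched as follows. This is a minimal illustration under stated assumptions, not a transcription of Eqn. 43: the supervised term is assumed to be a negative squared mismatch between actual and desired spikes, the reinforcement term is taken to be r(t) directly, and the boolean `obstacle_near` is a hypothetical stand-in for the context that the input signal 412 provides. The unsupervised coefficient `c_u` defaults to zero, mirroring the text.

```python
def supervised_term(y, y_d):
    """Assumed supervised term: negative squared mismatch of actual vs. desired spikes."""
    return -float(sum((a - b) ** 2 for a, b in zip(y, y_d)))


def rs_performance(y, y_d, r, obstacle_near, c_s=1.0, c_r=1.0, c_u=0.0, f_u=0.0):
    """Hypothetical switching RS performance signal for a block like PD 475.

    y, y_d        -- 0/1 sequences of actual and desired output spikes
    r             -- scalar reinforcement (reward > 0, punishment < 0)
    obstacle_near -- context flag derived from the input x(t) (assumption)
    c_u, f_u      -- unsupervised coefficient and term; c_u = 0 disables it
    """
    if obstacle_near:
        # Expert context: the teaching signal y_d(t) drives learning.
        return c_s * supervised_term(y, y_d) + c_u * f_u
    # Otherwise the reward r(t) drives learning (resource acquisition).
    return c_r * float(r) + c_u * f_u


# Usage: near an obstacle the expert's y_d wins; elsewhere the reward does.
print(rs_performance([1, 0, 1], [1, 1, 1], r=2.0, obstacle_near=True))   # -1.0
print(rs_performance([1, 0, 1], [1, 1, 1], r=2.0, obstacle_near=False))  # 2.0
```

Switching (rather than always summing both terms) keeps the two teaching sources from pulling the parameters in opposite directions in the same context.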
- the PD block implementation denoted 484 may be configured to simultaneously implement reinforcement and unsupervised (RU) learning rules.
- the output 488 of the block 484 may be determined based on the input and output signals 412 , 418 , in one or more implementations, in accordance with Eqn. 43.
- the task of the adaptive system on the robot may be not only to extract sparse hidden components from the input signal, but also to pay more attention to the components that are behaviorally important for the robot (i.e., those that provide more reinforcement when they are used).
- the PD block implementation denoted 494, which may be configured to simultaneously implement supervised and unsupervised (SU) learning rules, may receive the input signal x(t) 412, the output signal y(t) 418, and/or the teaching signal y d (t) 436.
- the output performance function F_SU 438 of the SU PD block may be determined in accordance with Eqn. 68 described below.
- the stochastic learning system (associated with the PD block implementation 494) may be configured to learn to implement unsupervised data categorization (e.g., using a sparse coding performance function), while simultaneously receiving an external signal that is related to the correct category of particular input signals.
- reward signal may be provided by a human expert.
- the PD block may generate the performance signal based on analog and/or spiking reward signal r (e.g., the signal 404 of FIG. 4 ).
- in order to reduce the computational load on the PA block related to the application of weight changes, the PD block may transform the analog reward r(t) into spike form.
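One way to carry out such a conversion is a rate code. The sketch below is an assumption, not the patent's method: it splits r(t) into positive and negative Bernoulli spike trains whose per-step firing probability is proportional to |r(t)| (clipped to 1 and scaled by a hypothetical `max_rate`). A downstream PA block would then only need to apply weight changes at the time steps that carry a reward spike.

```python
import random


def reward_to_spikes(r_values, dt=0.001, max_rate=100.0, seed=0):
    """Convert samples of an analog reward r(t) into two 0/1 spike trains.

    Positive rewards drive the 'positive' train and negative rewards the
    'negative' train; the spike probability per step is |r| * max_rate * dt,
    with |r| clipped to 1 and the probability clipped to 1.
    """
    rng = random.Random(seed)             # seeded for reproducibility
    pos, neg = [], []
    for r in r_values:
        p = min(min(abs(r), 1.0) * max_rate * dt, 1.0)
        spike = 1 if rng.random() < p else 0
        pos.append(spike if r > 0 else 0)
        neg.append(spike if r < 0 else 0)
    return pos, neg


# Usage: a reward that alternates in sign produces spikes on both channels.
pos, neg = reward_to_spikes([0.0, 0.5, -0.5, 1.0] * 250)
print(sum(pos), sum(neg))
```

Separate positive and negative channels match the way positive and negative reinforcement are delivered as distinct spike inputs elsewhere in this description.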
- the current performance F may be determined based on the output of the neuron and the external reference signal (e.g., the desired output y d (t)).
- a distance measure may be calculated using a low-pass filtered version of the desired y d (t) and actual y(t) outputs.
- a running distance between the filtered spike trains may be determined according to:
- (Eqn. 47), with y(t) and y d (t) being the actual and desired output spike trains, $y(t) = \sum_i \delta(t - t_i^{out})$ and $y^d(t) = \sum_j \delta(t - t_j^d)$; δ(t) is the Dirac delta function; t i out, t j d are the output and desired spike times, respectively; and a(t), b(t) are positive finite-response kernels.
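Because the kernel form of Eqn. 47 is not reproduced here, the following is only a sketch under assumed choices: a(t) and b(t) are taken to be exponentials truncated to a finite length (so they remain positive and finite-response), spike trains are binned at step `dt`, and the running distance is the squared difference of the two filtered traces at each step. A performance signal could then be formed as the negative of this distance.

```python
import math


def exp_kernel(tau, dt, length):
    """Positive, finite-length kernel: exp(-t/tau) sampled at t = 0, dt, ..., truncated."""
    return [math.exp(-k * dt / tau) for k in range(length)]


def filter_train(spikes, kernel):
    """Causal convolution of a 0/1 spike train with a finite kernel."""
    out = []
    for t in range(len(spikes)):
        out.append(sum(kernel[k] * spikes[t - k] for k in range(min(len(kernel), t + 1))))
    return out


def running_distance(y, y_d, a, b):
    """Squared difference between the filtered actual (kernel a) and desired (kernel b) trains."""
    fy = filter_train(y, a)
    fd = filter_train(y_d, b)
    return [(u - v) ** 2 for u, v in zip(fy, fd)]


a = b = exp_kernel(tau=0.01, dt=0.001, length=50)   # 50 ms support
d = running_distance([1, 0, 0, 0], [0, 1, 0, 0], a, b)
print(d)
```

With a 1 ms offset between the actual and desired spikes, the distance is large only at the first step and small afterward, because the two filtered traces nearly overlap; this tolerance to small timing errors is the reason for filtering before comparing.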
- learning based on the Kullback-Leibler divergence D KL may enable stabilization of the neuronal firing rate.
- part of the performance optimization may comprise maximization of the mutual information between the actual output y(t) and a reference signal r(t).
- the parameter changing PA block (the block 426 in FIG. 4) may determine changes Δw i of the control block parameters according to a predetermined learning algorithm, based on the performance function F and the gradient g, which it receives from the PD block 424 and the GD block 422, respectively, as indicated by the arrows marked 428, 430 in FIG. 4.
- Particular implementation of the learning algorithm within the block 426 may depend on the type of the learning task (e.g., online or batch learning) used by the learning block 320 of FIG. 3 .
- the learning method implementation according to (Eqn. 51) may be advantageous in applications where the performance function F(t) depends on the current values of the inputs x, outputs y, and/or signal r.
- control parameter adjustment ⁇ w may be determined using an accumulation of the score function gradient and the performance function values, and applying the changes at a predetermined time instance (corresponding to, e.g., the end of the learning epoch):
- T is a finite interval over which the summation occurs
- N is the number of steps
- Δt is the time step, determined as Δt = T/N.
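The epoch-based scheme can be sketched as follows, assuming (since Eqn. 52 is not reproduced here) that the update is the time average of the products F(t_k)·g_i(t_k) over the N samples taken every Δt = T/N, scaled by a hypothetical learning rate `gamma`, and applied once at the end of the epoch.

```python
def epoch_update(F_values, g_values, gamma=0.01):
    """Accumulate F(t_k) * g_i(t_k) over an epoch of N steps and return Delta w.

    F_values -- N performance samples F(t_k), taken every dt = T / N
    g_values -- N score-function gradient vectors g(t_k), one entry per parameter
    The returned change is applied only once, at the end of the epoch.
    """
    N = len(F_values)
    acc = [0.0] * len(g_values[0])
    for F, g in zip(F_values, g_values):
        for i, g_i in enumerate(g):
            acc[i] += F * g_i
    return [gamma * a / N for a in acc]


w = [0.2, -0.1]
dw = epoch_update([1.0, 2.0], [[1.0, 0.0], [0.5, 1.0]], gamma=1.0)
w = [w_i + dw_i for w_i, dw_i in zip(w, dw)]   # applied at the end of the epoch
print(dw)  # [1.0, 1.0]
```

The parameters stay fixed during the epoch, which is what makes the method non-local in time: every update waits for all N samples.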
- the summation interval T in Eqn. 52 may be configured based on the specific requirements of the control application.
- in a robotic arm reaching task, the interval may correspond to the time from the start position of the arm to the reaching point and, in some implementations, may be about 1 s to 50 s.
- in a speech recognition task, the time interval T may match the time required to pronounce the word being recognized (typically less than 1-2 s).
- the method of Eqn. 52 may be computationally expensive and may not provide timely updates. It may be referred to as non-local in time due to the summation over the interval T. However, it may lead to an unbiased estimate of the gradient of the performance function.
- control parameter adjustment Δw i may be determined by calculating the traces of the score function e i (t) for individual parameters w i.
- the traces may be determined using differential equations:
- the method of Eqn. 53-Eqn. 55 may be appropriate when a performance function depends on current and past values of the inputs and outputs and may be referred to as the OLPOMDP algorithm.
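For reference, the standard discrete-time OLPOMDP update (Baxter and Bartlett) keeps a decaying trace of the score function and scales it by the current performance signal. The sketch below follows that standard form; the decay `beta` (which would correspond to exp(-Δt/τ) for a trace time constant τ) and the learning rate `gamma` are assumed parameters, and the code is not a transcription of Eqn. 53-Eqn. 55.

```python
def olpomdp_step(w, e, g, F, beta=0.9, gamma=0.01):
    """One online step: e_i <- beta * e_i + g_i, then w_i <- w_i + gamma * F * e_i."""
    e = [beta * e_i + g_i for e_i, g_i in zip(e, g)]
    w = [w_i + gamma * F * e_i for w_i, e_i in zip(w, e)]
    return w, e


# A score spike at step 0 keeps influencing later updates through the trace.
w, e = [0.0], [0.0]
w, e = olpomdp_step(w, e, g=[1.0], F=1.0, beta=0.5, gamma=0.1)
w, e = olpomdp_step(w, e, g=[0.0], F=1.0, beta=0.5, gamma=0.1)
print(w, e)  # w is about 0.15, e == [0.5]
```

The trace is what lets a performance signal that arrives later (e.g., a delayed reward) still credit parameters that were active earlier.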
- Eqn. 53-Eqn. 55 may be used, in some implementations, in a rescue robotic device configured to locate resources (e.g., survivors or unexploded ordnance) in a building.
- the input x may correspond to the robot's current position in the building.
- the reward r may correspond to successful location events.
- the agent's behavior may depend on the history of inputs and on the history of actions taken by the agent (e.g., left/right turns, up/down movement, etc.).
- control parameter adjustment Δw determined using methodologies of the Eqns. 16, 17, 19 may be further modified using, in one variant, gradient with momentum according to: $\Delta w(t) \leftarrow \mu\, \Delta w(t - \Delta t) + \Delta w(t)$, (Eqn. 56) where μ is the momentum coefficient.
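A minimal sketch of the momentum modification of Eqn. 56, with Δw treated as a plain list of per-parameter changes and μ passed as `mu` (the function names are illustrative):

```python
def momentum_update(dw_new, dw_prev, mu=0.9):
    """Eqn. 56 style smoothing: Delta w(t) <- mu * Delta w(t - dt) + Delta w(t)."""
    return [mu * p + n for p, n in zip(dw_prev, dw_new)]


def smooth_sequence(raw_changes, mu=0.9):
    """Apply momentum to a sequence of raw per-step changes; returns smoothed changes."""
    prev = [0.0] * len(raw_changes[0])
    out = []
    for dw in raw_changes:
        prev = momentum_update(dw, prev, mu)
        out.append(prev)
    return out


print(smooth_sequence([[1.0], [1.0], [1.0]], mu=0.5))  # [[1.0], [1.5], [1.75]]
```

With a constant raw change, the smoothed change approaches Δw/(1 − μ), so μ must stay below 1 for the smoothed change to remain bounded.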
- the sign of gradient may be used to perform learning adjustments as follows:
- gradient descent methodology may be used for learning coefficient adaptation.
- the gradient signal g determined by the GD block 422 of FIG. 4 may subsequently be modified according to another gradient algorithm, as described in detail below.
- these modifications may comprise determining natural gradient, as follows:
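The natural-gradient equation itself is not reproduced here. As a hedged illustration of the general technique (Amari's natural gradient), the sketch below preconditions a gradient with the inverse of an empirical Fisher information matrix estimated from score-function samples; the `damping` term is an added assumption that keeps the matrix invertible.

```python
import numpy as np


def natural_gradient(score_samples, grad, damping=1e-3):
    """Return F^-1 grad, with F = mean(g g^T) estimated from score samples g."""
    G = np.asarray(score_samples, dtype=float)      # shape (K samples, n parameters)
    fisher = G.T @ G / G.shape[0]                   # empirical Fisher information
    fisher += damping * np.eye(fisher.shape[0])     # regularize before solving
    return np.linalg.solve(fisher, np.asarray(grad, dtype=float))


samples = [[1.0, 0.0], [-1.0, 0.0], [0.0, 2.0], [0.0, -2.0]]
print(natural_gradient(samples, [1.0, 1.0], damping=0.0))  # approximately [2.0, 0.5]
```

Directions in which the score varies strongly (large Fisher entries) receive smaller steps, which makes the update less sensitive to how the parameters w happen to be scaled.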
- the generalized learning framework described supra may enable implementing signal processing blocks with tunable parameters w.
- using a learning block framework that provides an analytical description of each type of signal processing block may enable the framework to calculate the appropriate score function automatically.
- a generalized implementation of the learning block may enable automatic changes of learning parameters w by individual blocks based on high level information about the subtask for each block.
- a signal processing system comprising one or more of such generalized learning blocks may be capable of solving different learning tasks useful in a variety of applications without substantial intervention of the user.
- such generalized learning blocks may be configured to implement the generalized learning framework described above with respect to FIGS. 3-4A and may be delivered to users.
- the user may connect different blocks, and/or specify a performance function and/or a learning algorithm for individual blocks.
- FIG. 5 illustrates one exemplary implementation of a robotic apparatus 500 comprising the adaptive controller apparatus 520.
- the adaptive controller 520 may be configured similar to the apparatus 300 of FIG. 3 and may comprise a generalized learning block (e.g., the block 420) configured, for example, according to the framework described above with respect to FIG. 4.
- the robotic apparatus 500 may comprise the plant 514 , corresponding, for example, to a sensor block and a motor block (not shown).
- the plant 514 may provide sensory input 502, which may include a stream of raw sensor data (e.g., proximity, inertial, terrain imaging, and/or other raw sensor data) and/or preprocessed data (e.g., velocity extracted from accelerometers, distance to obstacle, positions, and/or other preprocessed data) to the controller apparatus 520.
- the learning block of the controller 520 may be configured to implement reinforcement learning (according, in some implementations, to Eqn. 38) based on the sensor input 502 and the reinforcement signal 504 (e.g., an obstacle collision signal from robot bumpers, or the distance from the robotic arm endpoint to the desired position), and may provide motor commands 506 to the plant.
- the reinforcement signal r(t) may inform the adaptive controller that the previous behavior led to "desired" or "undesired" results, corresponding to positive and negative reinforcements, respectively. While the plant 514 must be controllable (e.g., via the motor commands in FIG. 5) and the control system may be required to have access to appropriate sensory information (e.g., the data 502 in FIG. 5), the controller apparatus 520 may not require detailed knowledge of the motor actuator dynamics or of the structure and significance of the sensory signals.
- the adaptive controller 520 of FIG. 5 may be configured for: (i) unsupervised learning for performing target recognition, as illustrated by the adaptive controller 520 _ 3 of FIG. 5A, receiving sensory input and output signals (x,y) 522 _ 3; (ii) supervised learning for performing data regression, as illustrated by the adaptive controller 520 _ 1 of FIG. 5A, receiving output signal 522 _ 1 and teaching signal 504 _ 1; and/or (iii) simultaneous supervised and unsupervised learning for performing platform stabilization, as illustrated by the adaptive controller 520 _ 2 of FIG. 5A, receiving input 522 _ 2 and learning 504 _ 2 signals.
- FIGS. 5B-5C illustrate dynamic tasking by a user of the adaptive controller apparatus (e.g., the apparatus 320 of FIG. 3A or 520 of FIG. 5 , described supra) in accordance with one or more implementations.
- a user of the adaptive controller 520 _ 4 of FIG. 5B may utilize a user interface (textual, graphics, touch screen, etc.) in order to configure the task composition of the adaptive controller 520 _ 4 , as illustrated by the example of FIG. 5B .
- the adaptive controller 520 _ 4 of FIG. 5B may be configured to perform the following tasks: (i) task 550 _ 1, comprising sensory compression via unsupervised learning; (ii) task 550 _ 2, comprising reward signal prediction by a critic block via supervised learning; and (iii) task 550 _ 3, comprising implementation of optimal action by an actor block via reinforcement learning.
- the user may specify that task 550 _ 1 may receive external input {X} 542, comprising, for example, a raw audio or video stream; that output 546 of the task 550 _ 1 may be routed to each of tasks 550 _ 2 and 550 _ 3; that output 547 of the task 550 _ 2 may be routed to the task 550 _ 3; and that the external signal {r} 544 may be provided to each of tasks 550 _ 2, 550 _ 3 via pathways 544 _ 1, 544 _ 2, respectively, as illustrated in FIG. 5B.
- performance function F u of the task 550 _ 1 may be determined based on (i) ‘sparse coding’; and/or (ii) maximization of information.
- the performance function F s of the task 550 _ 2 may be determined based on minimizing the distance d(r, pr) between the actual output 547 (the prediction pr) and the external reward signal r 544 _ 1.
- the end user may select performance functions from a predefined set and/or the user may implement a custom task.
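To make the idea of tasking the system in terms of task functions and connectivity concrete, the FIG. 5B composition can be written as a small declarative description. The dictionary keys, the rule and performance labels, and the helper `execution_order` are hypothetical; the sketch only orders the tasks so that each runs after the tasks that feed it, treating {X} and {r} as external signals.

```python
# Hypothetical task-composition description for the controller 520_4 of FIG. 5B.
TASKS = {
    "550_1": {"rule": "unsupervised", "performance": "sparse_coding", "inputs": ["X"]},
    "550_2": {"rule": "supervised", "performance": "distance(r, pr)", "inputs": ["550_1", "r"]},
    "550_3": {"rule": "reinforcement", "performance": "reward r", "inputs": ["550_1", "550_2", "r"]},
}


def execution_order(tasks):
    """Topologically sort tasks so each runs after the tasks that feed it."""
    order, done = [], set()

    def visit(name, stack=()):
        if name in done or name not in tasks:   # finished, or external signal such as "X"
            return
        if name in stack:
            raise ValueError(f"cyclic task routing at {name}")
        for src in tasks[name]["inputs"]:
            visit(src, stack + (name,))
        done.add(name)
        order.append(name)

    for name in tasks:
        visit(name)
    return order


print(execution_order(TASKS))  # ['550_1', '550_2', '550_3']
```

Keeping the composition declarative means the user edits only routing and rule labels, while the per-task performance functions and gradients stay inside the learning blocks.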
- the controller 520 _ 4 may be configured to perform a different set of tasks: (i) the task 550 _ 1, described above with respect to FIG. 5B; and (ii) task 550 _ 4, comprising pattern classification via supervised learning. As shown in FIG. 5C, the output of task 550 _ 1 may be provided as the input 566 to the task 550 _ 4.
- the controller 520 _ 4 of FIG. 5C may automatically configure the respective performance functions, without further user intervention.
- the performance function corresponding to the task 550 _ 4 may be configured to minimize the distance between the actual task output 568 (e.g., a class {Y} to which a sensory pattern belongs) and the human expert supervised signal 564 (the correct class y d).
- Generalized learning methodology described herein may enable the learning apparatus 520 _ 4 to implement different adaptive tasks, by, for example, executing different instances of the generalized learning method, individual ones configured in accordance with the particular task (e.g., tasks 550 _ 1 , 550 _ 2 , 550 _ 3 , in FIG. 5B , and 550 _ 4 , 550 _ 5 in FIG. 5C ).
- the user of the apparatus may not be required to know implementation details of the adaptive controller (e.g., specific performance function selection, and/or gradient determination). Instead, the user may ‘task’ the system in terms of task functions and connectivity.
- FIGS. 6A-6B illustrate exemplary implementations of reconfigurable partitioned neural network apparatus comprising the generalized learning framework described above.
- the network 600 of FIG. 6A may comprise several partitions 610, 620, 630, each comprising one or more nodes 602 that receive inputs 612 {X} via connections 604 and provide outputs via connections 608.
- the nodes 602 of the network 600 may comprise spiking neurons (e.g., the neurons 730 of FIG. 7, described below); the connections 604, 608 may be configured to carry spiking input into the neurons and spiking output from the neurons, respectively.
- the neurons 602 may be configured to generate responses (as described in, for example, co-owned and co-pending U.S. patent application Ser. No. 13/152,105 filed on Jun. 2, 2011, and entitled “APPARATUS AND METHODS FOR TEMPORALLY PROXIMATE OBJECT RECOGNITION”, incorporated by reference herein in its entirety) which may be propagated via feed-forward connections 608 .
- the network 600 may comprise artificial neurons, such as for example, spiking neurons described by co-owned and co-pending U.S. patent application Ser. No. 13/152,105 filed on Jun. 2, 2011, and entitled “APPARATUS AND METHODS FOR TEMPORALLY PROXIMATE OBJECT RECOGNITION”, incorporated supra, artificial neurons with sigmoidal activation function, binary neurons (perceptron), radial basis function units, and/or fuzzy logic networks.
- partitions of the network 600 may be configured, in some implementations, to perform specialized functionality.
- the partition 610 may adapt raw sensory input of a robotic apparatus to internal format of the network (e.g., convert analog signal representation to spiking) using for example, methodology described in U.S. patent application Ser. No. 13/314,066, filed Dec. 7, 2011, entitled “NEURAL NETWORK APPARATUS AND METHODS FOR SIGNAL CONVERSION”, incorporated herein by reference in its entirety.
- the output {Y1} of the partition 610 may be forwarded to other partitions, for example, partitions 620, 630, as illustrated by the broken line arrows 618, 618 _ 1 in FIG. 6A.
- the partition 620 may implement visual object recognition learning that may require training input signal y d j (t) 616 , such as for example an object template and/or a class designation (friend/foe).
- the output {Y2} of the partition 620 may be forwarded to another partition (e.g., partition 630) as illustrated by the dashed line arrow 628 in FIG. 6A.
- the partition 630 may implement motor control commands required for the robotic arm to reach and grasp the identified object, or motor commands configured to move the robot or camera to a new location, which may require the reinforcement signal r(t) 614.
- the partition 630 may generate the output {Y} 638 of the network 600 implementing the adaptive controller apparatus (e.g., the apparatus 520 of FIG. 5).
- the homogeneous configuration of the network 600 illustrated in FIG. 6A may enable a single network comprising several generalized nodes of the same type to implement different learning tasks (e.g., reinforcement and supervised) simultaneously.
- the input 612 may comprise input from one or more sensor sources (e.g., optical input {Xopt} and audio input {Xaud}), with each modality's data being routed to the appropriate network partition, for example, to partitions 610, 630 of FIG. 6A, respectively.
- FIG. 6B illustrates one exemplary implementation of network reconfiguration in accordance with the disclosure.
- the network 640 may comprise partition 650 , which may be configured to perform unsupervised learning task, and partition 660 , which may be configured to implement supervised and reinforcement learning simultaneously.
- the network configuration of FIG. 6B may be used to perform signal separation tasks by the partition 650 and signal classification tasks by the partition 660 .
- the partition 650 may be operated according to an unsupervised learning rule and may generate output {Y3}, denoted by the arrow 658 in FIG. 6B.
- the partition 660 may be operated according to a combined reinforcement and supervised rule, may receive supervised and reinforcement input 656, and/or may generate the output {Y4} 668.
- the dynamic network learning reconfiguration illustrated in FIGS. 6A-6B may be used, for example, in an autonomous robotic apparatus performing exploration tasks (e.g., a pipeline inspection autonomous underwater vehicle (AUV), or space rover, explosive detection, and/or mine exploration).
- the available network resources (i.e., the nodes 602) may be reused across partitions as the tasks change.
- Such reuse of network resources may be traded for (i) smaller network processing apparatus, having lower cost, size and consuming less power, as compared to a fixed pre-determined configuration; and/or (ii) increased processing capability for the same network capacity.
- the reconfiguration methodology described supra may comprise a static reconfiguration, where particular node populations are designated in advance for specific partitions (tasks); a dynamic reconfiguration, where node partitions are determined adaptively based on the input information received by the network and network state; and/or a semi-static reconfiguration, where static partitions are assigned predetermined life-span.
- the network 700 may comprise at least one stochastic spiking neuron 730 , operable according to, for example, a Spike Response Model, and configured to receive n-dimensional input spiking stream X(t) 702 via n-input connections 714 .
- the n-dimensional spike stream may correspond to n-input synaptic connections into the neuron.
- individual input connections may be characterized by a connection parameter 712 w ij that is configured to be adjusted during learning.
- the connection parameter may comprise connection efficacy (e.g., weight).
- the parameter 712 may comprise synaptic delay.
- the parameter 712 may comprise probabilities of synaptic transmission.
- the following signal notation may be used in describing operation of the network 700 , below:
- $y(t) = \sum_i \delta(t - t_i)$ denotes the output spike pattern, corresponding to the output signal 708 produced by the control block 710 of FIG. 7, where t i denotes the times of the output spikes generated by the neuron;
- $y^d(t) = \sum_i \delta(t - t_i^d)$ denotes the teaching spike pattern, corresponding to the desired (or reference) signal that is part of external signal 404 of FIG. 4, where t i d denotes the times when the spikes of the reference signal are received by the neuron;
- the neuron 730 may be configured to receive training inputs, comprising the desired output (reference signal) y d (t) via the connection 704 . In some implementations, the neuron 730 may be configured to receive positive and negative reinforcement signals via the connection 704 .
- the neuron 730 may be configured to implement the control block 710 (which performs the functionality of the control block 310 of FIG. 3) and the learning block 720 (which performs the functionality of the learning block 320 of FIG. 3, described supra).
- the block 710 may be configured to receive input spike trains X(t), as indicated by solid arrows 716 in FIG. 7, and to generate the output spike train y(t) 708 according to a Spike Response Model neuron whose voltage v(t) is calculated as:
- a probabilistic part of a neuron may be introduced using the exponential probabilistic threshold.
- State variables q (the probability of firing λ(t) for this system) associated with the control model may be provided to the learning block 720 via the pathway 705.
- the learning block 720 of the neuron 730 may receive the output spike train y(t) via the pathway 708 _ 1 .
- the learning block 720 may receive the input spike train (not shown).
- the learning block 720 may receive the learning signal, indicated by dashed arrow 704 _ 1 in FIG. 7 .
- the learning block may determine adjustments of the learning parameters w, in accordance with any of the methodologies described herein, thereby enabling the neuron 730 to adjust, inter alia, parameters 712 of the connections 714.
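The control and learning paths of the neuron 730 can be sketched in a few lines. Since the voltage equation and the threshold equations are not reproduced here, the following choices are assumptions: postsynaptic potentials are exponential traces with time constant `tau`, the refractory/reset term of a full Spike Response Model is omitted, the exponential stochastic threshold is λ(t) = λ0·exp(κ(v(t) − θ)), and the score is the small-Δt log-likelihood derivative g_i(t) = κ·ε_i(t)·(y(t) − λ(t)Δt). The `learn` method applies a simple online rule Δw = γ·F·g; the class and its parameter names are hypothetical.

```python
import numpy as np


class StochasticSRMNeuron:
    """Minimal stochastic neuron: exponential PSP traces, exponential stochastic
    threshold, and the per-step score function g_i(t)."""

    def __init__(self, n_inputs, dt=1e-3, tau=0.02, lam0=10.0, kappa=5.0, theta=1.0, seed=0):
        self.w = np.full(n_inputs, 0.5)      # connection efficacies (parameters 712)
        self.eps = np.zeros(n_inputs)        # PSP trace per input connection
        self.dt, self.decay = dt, np.exp(-dt / tau)
        self.lam0, self.kappa, self.theta = lam0, kappa, theta
        self.rng = np.random.default_rng(seed)

    def step(self, x_t):
        """Advance one step with 0/1 input spikes x_t; return (y, lam, g)."""
        self.eps = self.eps * self.decay + np.asarray(x_t, dtype=float)  # causal kernel
        v = self.w @ self.eps                                  # membrane potential (no reset term)
        lam = self.lam0 * np.exp(self.kappa * (v - self.theta))  # firing intensity lambda(t)
        y = int(self.rng.random() < min(lam * self.dt, 1.0))  # stochastic spike
        g = self.kappa * self.eps * (y - lam * self.dt)        # d log P(y) / d w_i (small dt)
        return y, lam, g

    def learn(self, g, F, gamma=0.01):
        """Online parameter change Delta w = gamma * F * g."""
        self.w += gamma * F * g


neuron = StochasticSRMNeuron(3)
y, lam, g = neuron.step(np.array([1.0, 0.0, 1.0]))
neuron.learn(g, F=1.0)
print(y, lam, g)
```

Because the score only depends on locally available quantities (the PSP trace, the spike, and λ), the same learning block can be driven by any performance signal F that the PD block produces.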
- the method 800 of FIG. 8A may allow the learning apparatus to: (i) implement different learning rules (e.g., supervised, unsupervised, reinforcement, and/or other learning rules); and (ii) simultaneously support more than one rule (e.g., combination of supervised, unsupervised, reinforcement rules described, for example by Eqn. 43) using the same hardware/software configuration.
- the input information may be received.
- the input information may comprise the input signal x(t), which may comprise raw or processed sensory input, input from the user, and/or input from another part of the adaptive system.
- the input information received at step 802 may comprise learning task identifier configured to indicate the learning rule configuration (e.g., Eqn. 43) that should be implemented by the learning block.
- the indicator may comprise a software flag transmitted using a designated field in the control data packet.
- the indicator may comprise a switch (e.g., effectuated via a software command, a hardware pin combination, or a memory register).
- learning framework of the performance determination block may be configured in accordance with the task indicator.
- the learning structure may comprise, inter alia, performance function configured according to Eqn. 43.
- parameters of the control block (e.g., the number of neurons in the network) may be configured as well.
- the status of the learning indicator may be checked to determine whether additional learning input is required.
- the additional learning input may comprise reinforcement signal r(t).
- the additional learning input may comprise desired output (teaching signal) y d (t), described above with respect to FIG. 4 .
- the external learning input may be received by the learning block at step 808 .
- the value of the present performance may be computed using the performance function F(x,y,r) configured at the prior step. It will be appreciated by those skilled in the arts that, when the performance function is evaluated for the first time (according, for example, to Eqn. 35) and the controller output y(t) is not yet available, a pre-defined initial value of y(t) (e.g., zero) may be used instead.
- gradient g(t) of the score function may be determined by the GD block (e.g., the block 422 of FIG. 4).
- learning parameter w update may be determined by the Parameter Adjustment block (e.g., block 426 of FIG. 4 ) using the performance function F and the gradient g, determined at steps 812 , 814 , respectively.
- the learning parameter update may be implemented according to Eqns. 22-31.
- the learning parameter update may be subsequently provided to the control block (e.g., block 310 of FIG. 3 ).
- control output y(t) of the controller may be updated using the input signal x(t) (received via the pathway 820) and the updated learning parameter Δw.
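The FIG. 8A flow can be summarized as a loop. In this sketch every callable (`perf_fns`, `score_fn`, `control_fn`) and the `task` dictionary layout are hypothetical placeholders supplied by the caller; the loop only shows the order of operations described above: select the performance function from the task indicator, then, for each input sample, obtain any external learning input, evaluate F, evaluate the score gradient g, update w, and recompute the output y.

```python
def learning_loop(x_stream, task, perf_fns, score_fn, control_fn, w, gamma=0.01):
    """Hypothetical sketch of the FIG. 8A flow for a single parameter vector w."""
    perf = perf_fns[task["rule"]]               # performance function selected by the task indicator
    teaching = task.get("teaching", {})         # external learning input (r or y_d), keyed by step
    y = 0.0                                     # pre-defined initial output before the first evaluation
    for t, x in enumerate(x_stream):            # step 802: receive input information
        learn_in = teaching.get(t)              # step 808: external learning input, if any
        F = perf(x, y, learn_in)                # step 812: present performance value
        g = score_fn(x, y, w)                   # step 814: score-function gradient (GD block)
        w = [w_i + gamma * F * g_i for w_i, g_i in zip(w, g)]  # parameter update (PA block)
        y = control_fn(x, w)                    # updated control output y(t)
    return w, y


# Toy usage with placeholder functions.
w, y = learning_loop(
    x_stream=[1.0, 1.0],
    task={"rule": "reinforcement"},
    perf_fns={"reinforcement": lambda x, y, r: 1.0},
    score_fn=lambda x, y, w: [x],
    control_fn=lambda x, w: w[0] * x,
    w=[0.0],
    gamma=0.5,
)
print(w, y)  # [1.0] 1.0
```

Only the entry in `perf_fns` changes between supervised, unsupervised, and reinforcement tasks; the gradient and update steps are shared, which is the point of the generalized framework.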
- FIG. 8B illustrates a method of dynamic controller reconfiguration based on learning tasks, in accordance with one or more implementations.
- the input information may be received.
- the input information may comprise the input signal x(t) and/or a learning task identifier configured to indicate the learning rule configuration (e.g., Eqn. 43) that should be implemented by the learning block.
- the controller partitions may be configured in accordance with the learning rules (e.g., supervised, unsupervised, reinforcement, and/or other learning rules) corresponding to the task received at step 832 .
- individual partitions may be operated according to, for example, the method 800 described with respect to FIG. 8A .
- a check may be performed as to whether a new task (or task assortment) is received. If no new tasks are received, the method may proceed to step 834. If new tasks are received that require controller repartitioning (for example, when an exploration robotic device may need to perform visual recognition tasks while stationary), the method may proceed to step 838.
- current partition configuration (e.g., input parameter, state variables, neuronal composition, connection map, learning parameter values and/or rules, and/or other information associated with the current partition configuration) may be saved in a nonvolatile memory.
- at step 840, the controller state and partition configurations may be reset, and the method may proceed to step 832, where a new partition set may be configured in accordance with the new task assortment received at step 836.
- the method of FIG. 8B may enable, inter alia, dynamic partition reconfiguration as illustrated in FIGS. 5B and 6A-6B, supra.
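The FIG. 8B reconfiguration logic can be sketched as an event loop. The task names, the `partition_rules` mapping, and the use of a Python list as a stand-in for nonvolatile memory are illustrative assumptions; operating a partition is reduced to incrementing a step counter where a real controller would run, for example, the method 800 on it.

```python
import copy


def run_controller(task_stream, partition_rules):
    """task_stream: per-step task assortments; None means no new tasks arrived."""
    nvm = []            # stand-in for nonvolatile memory holding saved configurations
    partitions = {}
    for tasks in task_stream:
        if tasks is not None:                              # new task assortment received
            if partitions:
                nvm.append(copy.deepcopy(partitions))      # step 838: save current configuration
            # step 840: reset, then configure one partition per task and learning rule
            partitions = {t: {"rule": partition_rules[t], "steps": 0} for t in sorted(tasks)}
        for p in partitions.values():
            p["steps"] += 1                                # operate each partition (e.g., method 800)
    return nvm, partitions


rules = {"navigate": "reinforcement", "recognize": "supervised"}
saved, active = run_controller([{"navigate"}, None, {"recognize"}], rules)
print(saved)   # [{'navigate': {'rule': 'reinforcement', 'steps': 2}}]
print(active)  # {'recognize': {'rule': 'supervised', 'steps': 1}}
```

Saving before the reset means a previously learned partition (here, navigation) can later be restored instead of being retrained from scratch.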
- FIG. 18 illustrates exemplary data flow of automatic determination of eligibility traces for use with spiking neuron networks (e.g., the network 600 of FIG. 6A ), in accordance with one or more implementations.
- the state vector q, describing the dynamic model of the neuron, may be provided.
- the state vector may comprise membrane voltage and/or current and may be provided as user input, in some implementations.
- partial derivatives of the state functions of Eqn. 61 may be determined as:
- Jacobian matrices J V (q), J R (q), J c (q), associated with the respective dynamic neuronal model may be constructed.
- the Jacobian matrices may be determined according to Eqn. 23.
- the Jacobian matrices may be determined according to Eqn. 26.
- state traces SP associated with the respective dynamic neuronal model, may be determined.
- instantaneous probability density (IPD) λ(q(t)) of the neuron may be constructed.
- the IPD may be determined according to Eqn. 2-Eqn. 4.
- ∂λ/∂w i may be determined using the partial derivatives of the IPD from step 1812 and the gradient from step 1818.
- the instantaneous PDF derivative may be determined using Eqn. 19.
- the exponential stochastic threshold may be implemented using Eqn. 2 and the score function g i may be determined using Eqn. 32.
- the exponential stochastic threshold may be implemented using Eqn. 28 and the score function g i may be determined using Eqn. 17 and Eqn. 29.
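The following numerical sketch illustrates what the automatically derived state traces represent, for an assumed one-state leaky neuron dv/dt = −v/τ + Σ_i w_i x_i(t). Instead of symbolic differentiation, it approximates the Jacobian J_V(q) by central finite differences, integrates the sensitivity (state-trace) equation dS/dt = J_V S + ∂f/∂w with forward Euler, and then forms ∂λ/∂w_i = κλS_i for an exponential threshold λ = λ0·exp(κ(v − θ)). The model, parameter values, and function names are assumptions.

```python
import numpy as np


def jacobian(f, q, h=1e-6):
    """Central finite-difference Jacobian of f at q (stand-in for a symbolic J_V(q))."""
    q = np.asarray(q, dtype=float)
    f0 = np.atleast_1d(f(q))
    J = np.zeros((f0.size, q.size))
    for k in range(q.size):
        dq = np.zeros_like(q)
        dq[k] = h
        J[:, k] = (np.atleast_1d(f(q + dq)) - np.atleast_1d(f(q - dq))) / (2 * h)
    return J


def state_traces(x, w, tau=0.02, dt=1e-3):
    """Integrate q = [v] and the traces S[:, i] = dq/dw_i with forward Euler."""
    w = np.asarray(w, dtype=float)

    def drift(q):                          # input-independent part of the state function
        return -q / tau

    v = np.zeros(1)
    S = np.zeros((1, w.size))
    for x_t in x:
        x_t = np.asarray(x_t, dtype=float)
        J = jacobian(drift, v)             # J_V(q), shape (1, 1)
        S = S + dt * (J @ S + x_t[None, :])  # d(w . x)/dw_i = x_i
        v = v + dt * (drift(v) + w @ x_t)
    return v, S


def dlambda_dw(v, S, lam0=10.0, kappa=5.0, theta=1.0):
    """Derivative of the exponential threshold lambda = lam0*exp(kappa*(v - theta))."""
    lam = lam0 * np.exp(kappa * (v[0] - theta))
    return kappa * lam * S[0]


x = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
v, S = state_traces(x, w=[0.3, 0.7])
print(v, S, dlambda_dw(v, S))
```

For this linear model the traces satisfy v = Σ_i w_i S_i exactly, which is a convenient check; nonlinear state functions are where automatic (symbolic or numerical) Jacobians save manual derivation effort.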
- FIG. 19A presents one exemplary implementation of Python script configured to effectuate automatic derivation of eligibility traces of the method 1800 of FIG. 18 .
- the designators #18XX refer to the respective steps of the method 1800 of FIG. 18 , according to one or more implementations.
- FIGS. 19B-19E present a Python script that illustrates exemplary object constructs for use with the Python script of FIG. 19A.
- the script shown in FIGS. 19B-19E is configured to interface with MATLAB® symbolic computations engine, according to one implementation. It will be appreciated by those skilled in the arts, that various other symbolic computations computer aided design (CAD) tools (e.g., Mathematica, etc.) may be used with the methodology described with respect to FIGS. 18-19E .
- FIGS. 9A through 17B present performance results obtained during simulation and testing by the Assignee hereof, of exemplary computerized spiking network apparatus configured to implement generalized learning framework described above with respect to FIGS. 3-6B .
- the exemplary apparatus, in one implementation, comprises a learning block (e.g., the block 420 of FIG. 4) that is implemented using the spiking neuronal network 700, described in detail with respect to FIG. 7, supra.
- the average performance (e.g., the average of the function F(x,y,r) of Eqn. 33-Eqn. 43) may be determined over a time interval Tav that is configured in accordance with the specific application.
- the spike rate of the network output y(t) may be configured between 5 and 100 Hz.
- the Tav may be configured to exceed the mean output inter-spike interval (the inverse of the output spike rate) by a factor of 5 to 10000.
- the spike rate may comprise 70 Hz output, and the averaging time may be selected at about 100 s.
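The numbers quoted above can be checked with a little arithmetic, reading the factor as the ratio of Tav to the mean output inter-spike interval 1/rate (an interpretation, since a time and a rate cannot be compared directly). The sketch also shows one possible way to average F over Tav, an exponential moving average with time constant Tav, which is an assumed substitute for a plain windowed mean.

```python
def averaging_factor(rate_hz, t_av_s):
    """Ratio of the averaging window Tav to the mean inter-spike interval 1/rate."""
    return t_av_s * rate_hz


def running_average(values, dt, t_av):
    """Exponential moving average of performance samples F with time constant t_av."""
    alpha = dt / t_av
    avg, out = 0.0, []
    for f in values:
        avg += alpha * (f - avg)
        out.append(avg)
    return out


factor = averaging_factor(70.0, 100.0)   # 70 Hz output, Tav = 100 s
print(factor, 5 <= factor <= 10000)      # 7000.0 True
```

At 70 Hz, a 100 s window covers about 7000 output spikes, enough to average out the trial-to-trial variability of a stochastic neuron.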
- the supervised learning cost function may comprise a product of the desired spiking pattern y d (t) (belonging to a particular speaker) with filtered output spike train y(t).
- the F sup may be computed using the following expression:
- a composite cost function for simultaneous reinforcement and supervised learning may be constructed using a linear combination of contributions provided by Eqn. 63 and Eqn. 64:
- the spiking neuron network (e.g., the network 700 of FIG. 7 ) may be configured to maximize the combined cost function F sr using one or more of the methodologies described in a co-owned and co-pending U.S. patent application entitled “APPARATUS AND METHODS FOR IMPLEMENTING LEARNING RULES USING PROBABILISTIC SPIKING NEURAL NETWORKS” filed contemporaneously herewith, and incorporated supra.
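Because Eqn. 63-Eqn. 65 are not reproduced here, the following sketch makes its assumptions explicit: the supervised term multiplies the desired spikes y d (t) by an exponentially filtered copy of the output y(t) and sums over time; the reinforcement term counts positive minus negative reinforcement spikes, in the spirit of the positive and negative reinforcement channels used in FIGS. 9A-9F; and F sr is their weighted sum with hypothetical coefficients `a` and `b`.

```python
import math


def filtered(spikes, tau=0.01, dt=0.001):
    """Exponentially filtered copy of a 0/1 spike train."""
    decay = math.exp(-dt / tau)
    trace, out = 0.0, []
    for s in spikes:
        trace = trace * decay + s
        out.append(trace)
    return out


def f_sup(y, y_d, tau=0.01, dt=0.001):
    """Assumed supervised term: desired spikes multiplied by the filtered output train."""
    return sum(d * f for d, f in zip(y_d, filtered(y, tau, dt)))


def f_reinf(r_pos, r_neg):
    """Assumed reinforcement term: positive minus negative reinforcement spikes."""
    return sum(r_pos) - sum(r_neg)


def f_sr(y, y_d, r_pos, r_neg, a=1.0, b=1.0):
    """Linear combination of the supervised and reinforcement contributions."""
    return a * f_sup(y, y_d) + b * f_reinf(r_pos, r_neg)


print(f_sr([1, 0, 0], [1, 0, 0], [0, 1, 0], [1, 0, 1]))  # 0.0
```

Adjusting `a` and `b` trades off how strongly the network follows the supervisor versus the reward, which is the trade-off visible across the panels of FIGS. 9B-9E.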
- FIGS. 9A-9F present data related to simulation results of the spiking network (e.g., the network 700 of FIG. 7 ) configured in accordance with supervised and reinforcement rules described with respect to Eqn. 65, supra.
- the panel 900 of FIG. 9A may comprise a single 100-dimensional input spike stream of length 600 ms.
- the horizontal axis denotes elapsed time in milliseconds
- the vertical axis denotes each input dimension (e.g., the connection 714 in FIG. 7 )
- each row corresponds to the respective connection
- dots denote individual spikes within each row.
- panel 902 of FIG. 9A illustrates the supervisor signal, comprising a sparse 600 ms-long stream of training spikes delivered to the neuron 730 via the connection 704 in FIG. 7.
- Each dot in the panel 902 denotes the desired output spike y d (t).
- the reinforcement signal may be provided to the neuron according to the following protocol:
- the reinforcement signals 904, 906 show that the untrained neuron does not receive positive reinforcement (manifested by the absence of spikes in the panel 904) and receives two spikes of negative reinforcement (shown by the dots at about 50 ms and about 450 ms in the panel 906), because the neuron is quiet during the [0 ms-50 ms] interval and spikes during the [400 ms-450 ms] interval.
- in FIG. 9B, panel 900 depicts the feed-forward input into the network 700 of FIG. 7;
- panel 912 depicts the supervisor (training) spiking input;
- panels 914, 916 depict the positive and negative reinforcement input spike patterns, respectively.
- the output of the network shown in the panel 910 correlates better with the supervisor input than the output shown in FIG. 9A.
- Data shown in FIG. 9B confirm that, while the network learns to repeat the supervisor spike pattern, it fails to perform the reinforcement task: it receives three negative reinforcement spikes, the maximum possible negative reinforcement.
- in FIG. 9C, panel 900 depicts the feed-forward input into the network;
- panel 922 depicts the supervisor (training) spiking input;
- panels 924, 926 depict the positive and negative reinforcement input spike patterns, respectively.
- the output of the network shown in the panel 920 displays no visible correlation with the supervisor input, as expected. At the same time, the network receives the maximum possible reinforcement (one positive spike and no negative spikes), as illustrated by the data in panels 924, 926 in FIG. 9C.
- in FIG. 9D, panel 900 depicts the feed-forward input into the network;
- panel 932 depicts the supervisor (training) spiking input;
- panels 934, 936 depict the positive and negative reinforcement input spike patterns, respectively.
- the output of the network shown in the panel 930 correlates better with the supervisor input than the output shown in FIG. 9A.
- Data presented in FIG. 9D show that the network receives the maximum possible reinforcement (panels 934, 936) and starts to reproduce some of the supervisor spikes (at around 400 ms and 470 ms) when these do not contradict the reinforcement learning signals.
- not all of the supervised spikes are echoed in the network output 930, and additional spikes are present (e.g., the spike at about 50 ms) compared to the supervisor input 932.
- the reinforcement traces 944, 946 of FIG. 9E show that the network receives the maximum reinforcement.
- the network output (trace 940) contains spikes corresponding to a larger portion of the supervisor input (trace 942) than the trace 930 of FIG. 9D, provided the supervisor input does not contradict the reinforcement input.
- not all of the supervised spikes of FIG. 9E are echoed in the network output 940, and additional spikes are present (e.g., the spike at about 50 ms) compared to the supervisor input 942.
- the output of the network shown in the panel 950 of FIG. 9F correlates better with the supervisor input (the panel 952) than the output 940 in FIG. 9E.
- the network output (950) repeats the supervisor input (952) even when the latter contradicts the reinforcement learning signals (traces 954, 956).
- the reinforcement data (956) of FIG. 9F show that, while the network receives the maximum possible positive reinforcement (trace 954), it is penalized (negative spike at 450 ms on trace 956) for generating output that is inconsistent with the reinforcement rules.
- $F_{\text{unsup}} = \ln p(t) - \ln p_d(t)$ (Eqn. 67), where p(t) is the probability of the actual spiking pattern generated by the network, and $p_d(t)$ is the probability of a spiking pattern generated by a Poisson process.
- the composite cost function for simultaneous unsupervised and supervised learning may be expressed as a linear combination of Eqn. 63 and Eqn. 67:
- FIGS. 10A-10C present data related to simulation results of the spiking network 700 configured in accordance with the supervised and unsupervised rules described with respect to Eqn. 68, supra.
- the input into the neuron 730 is shown in the panel 1000 of FIGS. 10A-10C and may comprise a single 100-dimensional input spike stream of length 600 ms.
- the horizontal axis denotes elapsed time in ms
- the vertical axis denotes each input dimension (e.g., the connection 714 in FIG. 7 )
- dots denote individual spikes.
- the panel 1002 in FIG. 10A illustrates the supervisor signal, comprising a sparse 600 ms-long stream of training spikes delivered to the neuron 730 via the connection 704 of FIG. 7.
- Each dot in the panel 1002 denotes a desired output spike of $y_d(t)$.
- the output activity (the spikes y(t)) of the network shows that the network successfully repeats the supervisor spike pattern, which does not behave as a Poisson process with a 60 Hz firing rate.
- the output activity of the network illustrated in the panel 1020 of FIG. 10B shows that the network successfully repeats the supervisor spike pattern 1022 and additionally generates randomly distributed output spikes, so that the total number of spikes is consistent with the desired firing rate.
- the output activity of the network 700 illustrated in the panel 1030 of FIG. 10C shows that the network output is characterized by the desired Poisson distribution, while the network still tries to repeat the supervisor pattern, as shown by the spikes denoted with circles in the panel 1030.
- in FIG. 11, panel 1100 depicts the input comprising a single 100-dimensional input spike stream of length 600 ms; panel 1102 depicts the supervisor input; and panels 1104, 1106 depict the positive and negative reinforcement inputs into the network 700 of FIG. 7, respectively.
- the network output comprises spikes generated based on (i) reinforcement learning (the first spike at 50 ms leads to the positive reinforcement spike at 60 ms in the panel 1104); (ii) supervised learning (e.g., spikes in the 400 ms to 500 ms interval); and (iii) random activity spikes due to unsupervised learning (e.g., spikes in the 100 ms to 200 ms interval).
- FIG. 12 presents simulation results of the spiking network (e.g., the network 700 of FIG. 7) configured in accordance with a supervised learning rule.
- the cost function, comprising a distance measure between the desired output spike train $y_d(t)$ and the actual output spike train $y(t)$ of the neuron, may be determined using low-pass filtering of the desired and actual spike trains, as follows:
- $y(t) = \sum_i \delta(t - t_i^{\text{out}})$ is the actual output spike train
- $y_d(t) = \sum_j \delta(t - t_j^{d})$ is the desired output spike train
- x is the input signal
- δ is the Dirac delta function
- $t_i^{\text{out}}$, $t_j^{d}$ are the output and desired spike times, respectively
- a(t), b(t) are some positive finite-response kernels.
- the results depicted in FIG. 12 correspond to the input signal {X} comprising 100 input spike trains (shown in the panel 1200 in FIG. 12); the network is able to reproduce the desired spike train (shown in the panel 1202), as illustrated by the very close match of the network output shown in the panel 1210.
- the generalized learning framework described herein is not limited to the applications characterized by an immediate correspondence between the network activity and the cost function.
- the actual output spike train y(t) may be low-pass filtered via a convolution with the alpha-function kernel.
- This may create an output trace that reaches its maximum value a time delay $\tau_d$ after the output spike occurs. The value of the filtered trace may then be evaluated at the desired spike times $t_j^d$.
- the resulting cost function F may reach its maximum when output spikes precede the desired output (i.e., the supervisory input) by the time interval $\tau_d$. Accordingly, the network learns to maximize the cost function of Eqn. 73 in order to predict the desired spikes with the delay $\tau_d$.
- the predictive supervised learning may be used, in some implementations, for a variety of prediction tasks such as, for example, building forward models.
- FIG. 13 presents simulation results of the spiking network (e.g., the network 700 of FIG. 7 ) configured in accordance with predictive supervised learning rule of Eqn. 73.
- the results depicted in FIG. 13 correspond to the input signal {X} comprising 100 input spike trains (shown in the panel 1300 in FIG. 13); the network is able to predict the desired spike train (shown in the panel 1302), as illustrated by the close match of the network output shown in the panel 1310 and indicated by the arrows 1304 in FIG. 13.
- the predictive network output y(t) is not an exact shifted replica of the desired signal pattern $y_d(t)$, as may be seen by comparing the data in the panels 1302, 1310 in FIG. 13.
- supervised learning may be used to cause pauses in the activity of the neuron prior to generating the desired spikes; this is also referred to as reciprocal supervised learning.
- This task may be formalized as minimization of the performance function F of Eqn. 73 combined with a constant non-associative potentiation of the synaptic weights (e.g., the weights 712 in FIG. 7).
- the non-associative potentiation may lead to a gradual increase of the firing rate of the neuron (performing exploration), while the associative minimization of the function F may cause pauses with a certain delay before the supervised spike, as illustrated in FIG. 14 .
- the network output (shown in the panel 1402) comprises periods of inactivity (pauses), denoted by the broken lines 1406_1, 1406_2 in FIG. 14, that precede the network output pulses associated with the respective desired output pulses of the training signal $y_d(t)$ (the association being indicated by the arrows 1404 in FIG. 14).
- multiple consecutive pulses in the training signal $y_d(t)$ cause a longer period of inactivity, as indicated by the inactivity period 1406_1, compared to the inactivity period 1406_2 that is associated with a single pulse in the training signal $y_d(t)$.
- the learning framework of the disclosure may be applied to unsupervised learning where the cost function F is configured based on inputs x and outputs y(t) of the network (e.g., the network 700 of FIG. 7 ).
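- $F(x,y) = h(y) - h(y|x) = \ln \frac{p(y|x)}{p(y)}$, with $\langle F(x,y) \rangle_{x,y} = \left\langle \ln \frac{p(y|x)}{p(y)} \right\rangle_{x,y}$ (Eqn. 74)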
- FIG. 15A presents simulation results of the spiking network (e.g., the network 700 of FIG. 7 ) configured in accordance with unsupervised learning rule of Eqn. 74.
- the results depicted in FIG. 15A correspond to the input signal {X} comprising 100 input spike trains (shown in the panel 1500 in FIG. 15A).
- the network output (shown in the panel 1510) shows that the neuron's activity neither decays to zero nor increases uncontrollably, so that the neuron remains capable of extracting information from the input 1500.
- the panel 1502 illustrates the evolution of the weights during learning. While some of the weights shown in the panel 1502 exhibit continuing growth, the resulting activity of the network (panel 1510) is not adversely affected, as the average of the weights remains constrained.
- the performance function may be configured to minimize the Kullback-Leibler divergence ($D_{KL}$) between the output spike train distribution p(y) and the desired probability distribution $p_d(y)$.
- the desired output $y_d(t)$ is characterized as a Poisson point process with average firing rate r, so that $p_d(y)$ is a Poisson distribution.
- This configuration results in a constrained average firing rate of the neuron (e.g., the neuron 730 in FIG. 7), thereby preventing the weights from growing infinitely large.
- the Poisson distribution of the desired output leads to an output y(t) with an exponential distribution of inter-spike intervals (ISI).
- the Poisson process is the point process with the largest entropy for a given firing rate and is therefore the most informative point process. Consequently, minimizing the Kullback-Leibler divergence between the output distribution and the Poisson distribution maximizes the information transmitted by the network subject to the firing rate (energy) constraint.
- FIGS. 15B-15C present simulation results of the spiking network (e.g., the network 700 of FIG. 7) configured in accordance with the unsupervised learning rule of Eqn. 75, corresponding to the input signal {X} comprising 100 input spike trains (shown in the panel 1540 in FIG. 15C).
- Each input spike train in the panel 1540 is characterized by Poisson distribution with 50-Hz average firing rate.
- the network-averaged performance and the weight evolution over time are shown in the panels 1530, 1532 of FIG. 15B, respectively.
- the performance shown in FIG. 15B corresponds to maximization of the negative divergence; hence the performance increases over time.
- the weight evolution in the panel 1532 illustrates weight stabilization after the best (stable) performance is achieved.
- the KL-divergence minimization ensures that the weights do not grow substantially after the desired performance (average firing rate) is achieved. Accordingly, the firing rate of the neurons (panel 1542 in FIG. 15C) remains controlled and does not grow without bound.
- One or more implementations of reinforcement learning may require solving an adaptive control task (e.g., AUV/UAV navigation) without detailed prior information about the dynamics of the controlled plant (e.g., the plant 514 in FIG. 5).
- the reinforcement signal (e.g., the signal 504 in FIG. 5)
- the adaptive controller (e.g., the controller 520 of FIG. 5).
- FIG. 16 illustrates operation of the neural network of FIG. 7 , configured to control navigation of an autonomous unmanned vehicle (AUV) along a trajectory using reinforcement learning.
- Data in the top panel 1600 illustrate the mean distance ⟨d⟩ between the actual position of an AUV y(t) and the desired position of the AUV.
- Data in the bottom panel 1610 present the variance of the distance d.
- Data in FIG. 16 illustrate improved network operation, characterized by a decrease of the mean position error and its variance over time, due to learning rules that provide minimization of average costs (maximization of average performance).
- FIGS. 17A-17B illustrate operation of the neural network of FIG. 7, configured to implement a coincidence detector.
- the neuron 730 of FIG. 7 may be configured to receive two spiking inputs, presented by the traces 1700_1, 1700_2 in FIG. 17A.
- the output of the network and the reward signal are presented in the panels 1710 , 1702 in FIG. 17A , respectively.
- the neuron 730 may be configured to generate an output spike only when it receives two input spikes simultaneously and remain silent otherwise.
- when the neuron performs correct coincidence detection (e.g., it spikes at the right time, as illustrated by the arrow 1712 in FIG. 17A), it receives a positive reward spike (reinforcement equal to one, illustrated by the arrow 1704 in FIG. 17A). If the neuron does not generate a spike when the two input spikes are present, or if it spikes after only one input, it is not rewarded (e.g., reinforcement equal to zero).
- the learning process is depicted in FIG. 17B, where the panel 1720 displays performance, determined as the average normalized reinforcement over each epoch (epoch length of 250 s), and the panel 1730 presents the evolution of the weights during learning.
- the performance measure in FIG. 17B is configured as the ratio of correct coincidence detections by the neuron; it ranges from 0.5 (random guessing, i.e., no detection) to 1.0 (perfect detection, where each output spike is associated with a coincident input).
- the performance gradually increases, due in part to the increase in the weights; after about 2000 s the weight changes stabilize and the performance remains within a range between 0.75 and 0.95.
- the network operation illustrated in FIG. 17B is advantageously enabled by learning rules that provide minimization of average costs (maximization of average performance).
- Generalized learning framework apparatus and methods of the disclosure may allow for an improved implementation of a single adaptive controller apparatus configured to simultaneously perform a variety of control tasks (e.g., adaptive control, classification, object recognition, prediction, and/or clusterization).
- the generalized learning framework of the present disclosure may enable adaptive controller apparatus, comprising a single spiking neuron, to implement different learning rules, in accordance with the particulars of the control task.
- the network may be configured and provided to end users as a "black box". While existing approaches may require end users to recognize the specific learning rule that is applicable to a particular task (e.g., adaptive control, pattern recognition) and to configure network learning rules accordingly, a learning framework of the disclosure may require users to specify only the end task (e.g., adaptive control). Once the task is specified within the framework of the disclosure, the "black-box" learning apparatus of the disclosure may be configured to automatically set up the learning rules that match the task, thereby relieving the user of deriving learning rules or of evaluating and selecting between different learning rules.
- in some existing approaches, each learning task is typically performed by a separate network (or network partition) that operates a task-specific (e.g., adaptive control, classification, recognition, prediction) set of learning rules (e.g., supervised, unsupervised, reinforcement).
- Unused portions of each partition (e.g., the motor control partition of a robotic device)
- the generalized learning framework of the disclosure may allow dynamic re-tasking of portions of the network (e.g., the motor control partition) to perform other tasks (e.g., visual pattern recognition or object classification tasks).
- Such functionality may be effected by, inter alia, implementation of generalized learning rules within the network, which enable the adaptive controller apparatus to automatically use a new set of learning rules (e.g., supervised learning used in classification) that differs from the learning rules used with the motor control task.
- Generalized learning methodology described herein may enable different parts of the same network to implement different adaptive tasks (as described above with respect to FIGS. 5B-5C ).
- the end user of the adaptive device may be enabled to partition the network into different parts, connect these parts appropriately, and assign cost functions to each task (e.g., selecting them from a predefined set of rules or implementing a custom rule).
- the user may not be required to understand the detailed implementation of the adaptive system (e.g., plasticity rules and/or neuronal dynamics), nor to derive the performance function and determine its gradient for each learning task. Instead, users may operate the generalized learning apparatus of the disclosure by assigning task functions and a connectivity map to each partition.
- an adaptive system configured in accordance with the present disclosure (e.g., the network 600 of FIG. 6A or 700 of FIG. 7) may be capable of learning the desired task without requiring a separate learning stage.
- learning may be turned off and on, as appropriate, during system operation without requiring additional intervention into the process of input-output signal transformations executed by the signal processing system (e.g., there is no need to stop the system or change the signal flow).
- the generalized learning apparatus of the disclosure may be implemented as a software library configured to be executed by a computerized neural network apparatus (e.g., containing a digital processor).
- the generalized learning apparatus may comprise a specialized hardware module (e.g., an embedded processor or controller).
- the spiking network apparatus may be implemented in a specialized or general purpose integrated circuit (e.g., ASIC, FPGA, and/or PLD). Myriad other implementations may exist that will be recognized by those of ordinary skill given the present disclosure.
- the present disclosure can be used to simplify and improve control tasks for a wide assortment of control applications including, without limitation, industrial control, adaptive signal processing, navigation, and robotics.
- Exemplary implementations of the present disclosure may be useful in a variety of devices including without limitation prosthetic devices (such as artificial limbs), industrial control, autonomous and robotic apparatus, HVAC, and other electromechanical devices requiring accurate stabilization, set-point control, trajectory tracking functionality or other types of control.
- Examples of such robotic devices may include manufacturing robots (e.g., automotive), military devices, and medical devices (e.g., for surgical robots).
- Examples of autonomous navigation may include rovers (e.g., for extraterrestrial, underwater, hazardous exploration environment), unmanned air vehicles, underwater vehicles, smart appliances (e.g., ROOMBA®), and/or robotic toys.
- the present disclosure can advantageously be used in other applications of adaptive signal processing systems (comprising for example, artificial neural networks), including: machine vision, pattern detection and pattern recognition, object classification, signal filtering, data segmentation, data compression, data mining, optimization and scheduling, complex mapping, and/or other applications.
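The predictive supervised rule described above for FIG. 13 filters the output spike train with an alpha-function kernel and evaluates the filtered trace at the desired spike times. The Python/NumPy sketch below illustrates only that idea. It is not a reproduction of Eqn. 73, which is not shown in this section. The kernel form (t/τ)e^(1−t/τ), the names `alpha_kernel` and `predictive_cost`, and the numbers in the usage lines are assumptions chosen for illustration.

```python
import numpy as np


def alpha_kernel(t, tau):
    """Alpha function (t / tau) * exp(1 - t / tau) for t >= 0 and 0 for t < 0.

    The kernel peaks at t = tau with value 1.0, so tau plays the role of the
    prediction delay tau_d discussed in the text.
    """
    t = np.clip(np.asarray(t, dtype=float), 0.0, None)  # causal: negative lags map to 0
    return (t / tau) * np.exp(1.0 - t / tau)


def predictive_cost(out_times, desired_times, tau):
    """Sum of the alpha-filtered output trace evaluated at the desired spike times."""
    out = np.asarray(out_times, dtype=float)
    des = np.asarray(desired_times, dtype=float)
    lags = des[:, None] - out[None, :]  # lags[j, i] = t_j^d - t_i^out
    return float(alpha_kernel(lags, tau).sum())


# Scan the timing of a single output spike against one desired spike at 50 ms.
shifts = np.arange(0.0, 51.0)  # candidate output spike times, ms
costs = [predictive_cost([s], [50.0], tau=5.0) for s in shifts]
best = float(shifts[int(np.argmax(costs))])
print(best)  # 45.0: the output spike leads the desired spike by tau = 5 ms
```

In this toy scan the cost is largest for an output spike 5 ms (one τ) before the desired spike. This is consistent with the text's statement that the network learns to precede the supervisory spikes by a fixed delay.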
Description
where λ(t) represents an instantaneous probability density (“hazard”) of firing.
$$\lambda(t) = \lambda_0 e^{\kappa(u(t)-\theta)} \quad \text{(Eqn. 2)}$$
where u(t) is the membrane voltage of the neuron, θ is the voltage threshold for generating a spike, κ is the probabilistic parameter, and $\lambda_0$ is the basic (spontaneous) firing rate of the neuron.
or an exponential-linear stochastic threshold:
$$\lambda(t) = \lambda_0 \ln\left(1 + e^{\kappa(u(t)-\theta)}\right) \quad \text{(Eqn. 4)}$$
where λ0, κ, θ are parameters with a similar meaning to the parameters in the exponential threshold model Eqn. 2.
$$\Lambda(u(t)) = 1 - e^{-\lambda(u(t))\Delta t} \quad \text{(Eqn. 5)}$$
where Δt is the time step length.
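Eqns. 2, 4 and 5 can be evaluated directly. The following is a minimal NumPy sketch. The function names and the parameter values in the usage lines are hypothetical, and $\lambda_0$ is assumed to be expressed per unit of Δt (e.g., per ms with Δt in ms).

```python
import numpy as np


def hazard_exponential(u, lam0, kappa, theta):
    """Exponential stochastic threshold (Eqn. 2): lam0 * exp(kappa * (u - theta))."""
    return lam0 * np.exp(kappa * (np.asarray(u, dtype=float) - theta))


def hazard_exp_linear(u, lam0, kappa, theta):
    """Exponential-linear threshold (Eqn. 4): lam0 * ln(1 + exp(kappa * (u - theta)))."""
    # logaddexp(0, z) = ln(1 + e^z), computed without overflow for large z
    return lam0 * np.logaddexp(0.0, kappa * (np.asarray(u, dtype=float) - theta))


def spike_probability(lam, dt):
    """Eqn. 5: probability of a spike within one time step, 1 - exp(-lam * dt)."""
    # expm1 keeps precision when lam * dt is small
    return -np.expm1(-np.asarray(lam, dtype=float) * dt)


u = np.array([0.0, 1.0, 2.0])                # membrane voltages (arbitrary units)
lam = hazard_exponential(u, lam0=0.01, kappa=2.0, theta=1.0)
p = spike_probability(lam, dt=1.0)           # lam0 assumed per ms, dt = 1 ms
```

At u = θ the exponential hazard equals $\lambda_0$, and the exponential-linear hazard equals $\lambda_0 \ln 2$. Well above threshold, the exponential-linear form grows roughly linearly in u instead of exponentially.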
where: $\vec{q}$ is a vector of internal state variables (e.g., comprising the membrane voltage); $I_{ext}$ is the external input to the neuron; V is the function that defines the evolution of the state variables; G describes the interaction between the input current and the state variables (for example, to model synaptic depletion); and R describes resetting the state variables after the output spikes at $t^{out}$.
$$\vec{q} \equiv u(t);\quad V(\vec{q}) = -Cu;\quad R(\vec{q}) = u_{res} - u;\quad G(\vec{q}) = 1 \quad \text{(Eqn. 7)}$$
where C is a membrane constant, and $u_{res}$ is the value to which the voltage is set after an output spike (reset value). Accordingly, Eqn. 6 becomes:
and a, b, c, d are parameters of the model.
$$P = p(y|x,w) \quad \text{(Eqn. 11)}$$
In Eqn. 11, the parameter w may denote various system parameters including connection efficacy, firing threshold, resting potential of the neuron, and/or other parameters. The analytical relationship of Eqn. 1 may be selected such that the gradient of ln [p(y|x,w)] with respect to the system parameter w exists and can be calculated. The framework shown in
is the per-stimulus entropy of the system response (or ‘surprisal’). The probability of the external signal p(r|x, y) may be characteristic of the external environment and may not change due to adaptation. That property may allow omission of averaging over external signals r in subsequent consideration of learning rules.
may be calculated for the individual spiking neuron parameters to be changed. If spiking patterns on a finite interval of length T are viewed as the input x and output y of the neuron, then the score function may take the following form:
where the time moments $t_l$ belong to the neuron's output pattern $y_T$ (the neuron generates spikes at these time moments).
where $t_l$ are the times of output spikes and δ(t) is the Dirac delta function.
where $t_l \in y_T$ denotes the time steps at which the neuron generated a spike.
where $t_l$ are the times of output spikes and $\delta_d(t)$ is the Kronecker delta.
may be calculated, which is a derivative of the instantaneous probability density with respect to a learning parameter wi of the i-th neuron. Without loss of generality, two cases of learning are considered below: input weights learning (synaptic plasticity) and stochastic threshold tuning (intrinsic plasticity). A derivative of other less common parameters of the neuron model (e.g., membrane, synaptic dynamic, and/or other constants) may be calculated.
where: i is the index of the input channel; $x_i$ is the stream of input spikes on the i-th channel; $t_i^j$ are the times of input spikes in the i-th channel; $w_i$ is the weight of the i-th channel; and ε(t) is a generic function that models post-synaptic currents from input spikes. In some implementations, the post-synaptic current function may be configured as: ε(t)≡δ(t), ε(t)≡e−t/t
is a vector of derivatives of instantaneous probability density with respect to the state variable; and
$\vec{S}_i(t) = \nabla_{w_i} \vec{q}$
is the gradient of the neuron internal state with respect to the i-th weight (also referred to as the i-th state eligibility trace). In order to determine the state eligibility trace of Eqn. 20 for a generalized neuronal model, such as the one described by Eqn. 6 and Eqn. 18, the derivative with respect to the learning weight $w_i$ may be determined as:
where $J_V$, $J_R$, $J_G$ are the Jacobian matrices of the respective evolution functions V, R, G.
$$J_V = -C;\quad J_R = -1;\quad G(\vec{q}) = 1;\quad J_G = 0 \quad \text{(Eqn. 23)}$$
so Eqn. 22 for the i-th state eligibility trace may take the following form:
where uw
where α(t) is the post-synaptic potential (PSP) from the jth input spike.
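The score function and the state eligibility trace can be made concrete for a discrete-time stochastic integrate-and-fire neuron. The sketch below is derived from Eqn. 2, Eqn. 5 and the IF dynamics of Eqn. 7, using an Euler step and a reset to $u_{res}$ after each output spike. It is an illustration, not a reproduction of the patent's score-function or trace equations, which are not shown here.

Conditioning on a fixed output train y, the sketch computes $\ln p(y|x,w)$ as a sum of per-bin Bernoulli terms. It obtains the gradient through the trace $S_i = \partial u / \partial w_i$, which is reset to zero together with the voltage. The array `x` is assumed to already hold the filtered input drive of each channel. All function names and default parameter values are hypothetical.

```python
import numpy as np


def simulate_potential(w, x, y, C, u_res, dt):
    """Euler-integrate du/dt = -C u + w . x(t) with reset to u_res after each output spike.

    Returns the voltage u[t] and the eligibility traces S[t, i] = du[t]/dw_i,
    both evaluated before the spike decision in bin t.
    """
    T, n = x.shape
    u = np.empty(T)
    S = np.empty((T, n))
    u_t, S_t = u_res, np.zeros(n)
    for t in range(T):
        u[t], S[t] = u_t, S_t
        if y[t]:
            # Reset R = u_res - u: the new voltage no longer depends on w, so the trace restarts at 0.
            u_t, S_t = u_res, np.zeros(n)
        else:
            # Differentiating the Euler step with respect to w gives the trace update.
            u_t = u_t + dt * (-C * u_t + x[t] @ w)
            S_t = S_t + dt * (-C * S_t + x[t])
    return u, S


def log_likelihood_and_score(w, x, y, C=0.1, u_res=0.0, dt=1.0, lam0=0.01, kappa=1.0, theta=1.0):
    """Return ln p(y | x, w) and its gradient with respect to the input weights w."""
    u, S = simulate_potential(w, x, y, C, u_res, dt)
    lam = lam0 * np.exp(kappa * (u - theta))   # exponential stochastic threshold (Eqn. 2)
    Lam = -np.expm1(-lam * dt)                 # per-bin spike probability (Eqn. 5)
    logp = np.sum(y * np.log(Lam) - (1.0 - y) * lam * dt)  # ln(1 - Lam) = -lam * dt
    # d/du of each Bernoulli term, chained through dLam/du = dt * (1 - Lam) * kappa * lam
    coef = (y * (1.0 - Lam) / Lam - (1.0 - y)) * dt * kappa * lam
    return float(logp), coef @ S


rng = np.random.default_rng(0)
T, n = 200, 3
x = rng.random((T, n)) * (rng.random((T, n)) < 0.2)  # sparse, non-negative input drive
y = (rng.random(T) < 0.05).astype(float)              # a fixed, observed output spike train
w = np.array([0.5, -0.2, 0.8])
logp, score = log_likelihood_and_score(w, x, y)
```

For a fixed output train the voltage is linear in w, so the analytic gradient can be checked against a central finite difference of `logp`.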
The IZ neuronal model may further be characterized using two first-order nonlinear differential equations describing time evolution of synaptic weights associated with each input interface (e.g., pre-synaptic connection) of a neuron, in the following form:
When using the exponential stochastic threshold configured as:
$$\lambda = \lambda_0 e^{\kappa(v(t)-\theta)} \quad \text{(Eqn. 28)}$$
then the derivative of the IPD for the IZ neuron becomes:
for IF neuron becomes:
Combining Eqn. 30 with Eqn. 15 and Eqn. 17 we obtain score function values for the stochastic Integrate-and-Fire neuron in continuous time-space as:
and in discrete time:
$$F(t) = r(t) \quad \text{(Eqn. 33)}$$
where signal r provides reward and/or punishment signals from the external environment. By way of illustration, a mobile robot, controlled by spiking neural network, may be configured to collect resources (e.g., clean up trash) while avoiding obstacles (e.g., furniture, walls). In this example, the signal r may comprise a positive indication (e.g., representing a reward) at the moment when the robot acquires the resource (e.g., picks up a piece of rubbish) and a negative indication (e.g., representing a punishment) when the robot collides with an obstacle (e.g., wall). Upon receiving the reinforcement signal r, the spiking neural network of the robot controller may change its parameters (e.g., neuron connection weights) in order to maximize the function F (e.g., maximize the reward and minimize the punishment).
$$F(t) = d(y(t), y_d(t)) \quad \text{(Eqn. 34)}$$
where y is the output of the control block (e.g., the
$$F(t) = (y(t) - y_d(t))^2 \quad \text{(Eqn. 35)}$$
$$F = [(y * \alpha) - (y_d * \beta)]^2 \quad \text{(Eqn. 36)}$$
where α, β are finite impulse response kernels. In some implementations, the distance measure may utilize the mutual information between the output signal and the reference signal.
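Eqn. 36 can be evaluated on binned spike trains by causal convolution. In the sketch below the kernels α and β are both assumed to be truncated exponentials exp(−t/τ). The helper names, bin width and spike positions are hypothetical.

```python
import numpy as np


def exp_kernel(tau, dt, length):
    """Causal FIR kernel exp(-t / tau) sampled at t = 0, dt, ..., (length - 1) * dt."""
    t = np.arange(length) * dt
    return np.exp(-t / tau)


def filtered_distance(y, y_d, alpha, beta):
    """Per-bin F(t) = [(y * alpha)(t) - (y_d * beta)(t)]^2 with causal convolution (Eqn. 36)."""
    T = len(y)
    fy = np.convolve(y, alpha)[:T]     # keep the causal part aligned with the spike trains
    fyd = np.convolve(y_d, beta)[:T]
    return (fy - fyd) ** 2


k = exp_kernel(tau=5.0, dt=1.0, length=50)
y = np.zeros(100)
y[10] = 1.0       # actual output spike at bin 10
y_d = np.zeros(100)
y_d[12] = 1.0     # desired spike two bins later
F = filtered_distance(y, y_d, k, k)
```

Summing `F` over the bins gives a scalar distance. It is zero only when the filtered traces coincide, and it grows as the actual and desired spikes drift apart.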
$$F = [f(y) - f_d(y)]^2 \quad \text{(Eqn. 37)}$$
where f is a function configured to extract the characteristic (or characteristics) of interest from the output signal y. By way of example, useful with spiking output signals, the characteristic may correspond to a firing rate of spikes and the function f(y) may determine the mean firing rate from the output. In some implementations, the desired characteristic value may be provided through the external signal as
$$r = f_d(y) \quad \text{(Eqn. 38)}$$
In some implementations, $f_d(y)$ may be calculated internally by the PD block.
$$F = i(x,y) = -\ln p(y) + \ln p(y|x) \quad \text{(Eqn. 39)}$$
where p(y) is an unconditioned probability of the current output. It is noteworthy that the average value of the instantaneous mutual information may equal the mutual information I(x,y). This performance function may be used to implement ICA (unsupervised learning).
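For discrete variables, the instantaneous mutual information of Eqn. 39 and its average can be computed from a joint probability table. The NumPy sketch below assumes such a table `p_xy[x, y]` is available; in a spiking network it would have to be estimated. The function names are hypothetical. The sketch checks the statement above that the average of i(x,y) equals I(x,y).

```python
import numpy as np


def instantaneous_mi(p_xy):
    """i(x, y) = ln p(y|x) - ln p(y) for every cell of a joint table p_xy[x, y] (Eqn. 39)."""
    p_xy = np.asarray(p_xy, dtype=float)
    p_x = p_xy.sum(axis=1, keepdims=True)   # marginal p(x), shape (nx, 1)
    p_y = p_xy.sum(axis=0, keepdims=True)   # marginal p(y), shape (1, ny)
    with np.errstate(divide="ignore", invalid="ignore"):
        # zero-probability cells give -inf or nan here; they are masked out below
        return np.log(p_xy / p_x) - np.log(p_y)


def mutual_information(p_xy):
    """Average of i(x, y) under p(x, y), which equals the mutual information I(x, y)."""
    p_xy = np.asarray(p_xy, dtype=float)
    i_xy = instantaneous_mi(p_xy)
    mask = p_xy > 0
    return float(np.sum(p_xy[mask] * i_xy[mask]))


independent = np.full((2, 2), 0.25)            # y carries no information about x
copied = np.array([[0.5, 0.0], [0.0, 0.5]])    # y is an exact copy of a fair binary x
print(mutual_information(independent), mutual_information(copied))  # 0.0 and ln 2 (about 0.693)
```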
$$F = h(x,y) = -\ln p(y) \quad \text{(Eqn. 40)}$$
where p(y) is an unconditioned probability of the current output. It is noteworthy that the average value of the instantaneous unconditional entropy may equal the unconditional entropy H(x,y). This performance function may be used to reduce variability in the output of the system for adaptive filtering.
$$F = d_{KL}(x,y) = \ln p(y|x) - \ln \Theta(y|x) \quad \text{(Eqn. 41)}$$
It is noteworthy that the average value of the instantaneous Kullback-Leibler divergence may equal $d_{KL}(p, \Theta)$. The performance function of Eqn. 41 may be applied in unsupervised learning tasks in order to restrict a possible output of the system. For example, if Θ(y) is a Poisson distribution of spikes with some firing rate R, then minimization of this performance function may force the neuron to have the same firing rate R.
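The claim that minimizing Eqn. 41 against a Poisson reference pulls the output toward the reference rate R can be checked in a simplified setting. The sketch assumes independent time bins in which the neuron spikes with probability p and the Poisson reference Θ spikes with probability q = 1 − e^(−RΔt). This Bernoulli-bin simplification, the bin width, the rates and the function names are all assumptions for illustration.

```python
import numpy as np


def instantaneous_kl(y, p, q):
    """Per-bin d_KL = ln p(y) - ln Theta(y) for a Bernoulli bin (Eqn. 41, independent bins)."""
    y = np.asarray(y, dtype=float)
    return y * np.log(p / q) + (1.0 - y) * np.log((1.0 - p) / (1.0 - q))


def expected_kl(p, q):
    """Average of instantaneous_kl over y ~ Bernoulli(p), i.e. KL(Bern(p) || Bern(q))."""
    return p * np.log(p / q) + (1.0 - p) * np.log((1.0 - p) / (1.0 - q))


dt = 0.001                                # 1 ms bins (assumed)
R = 20.0                                  # reference Poisson rate in Hz (hypothetical)
q = -np.expm1(-R * dt)                    # spike probability per bin under the reference
rates = np.linspace(1.0, 60.0, 591)       # candidate output rates, 0.1 Hz apart
kl = expected_kl(-np.expm1(-rates * dt), q)
best_rate = rates[np.argmin(kl)]          # the divergence is smallest at the reference rate
```

The expected divergence is zero only when the output rate matches R and is positive otherwise. In this simplified model, that is the mechanism that keeps the firing rate, and hence the weights, from growing without bound.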
$$F = \|x - A(y,w)\|^2 + \|y\|^2 \quad \text{(Eqn. 42)}$$
where the first term quantifies how closely the data x can be described by the current output y, A(y,w) being a function that describes how to decode the original data from the output. The second term may calculate a norm of the output and may impose restrictions on the output sparseness.
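Eqn. 42 leaves the decoder A(y,w) unspecified. The sketch below assumes a linear decoder A(y,w) = W y, which is an illustration rather than something stated in the text, and evaluates the cost. With W equal to the identity, setting the gradient −2(x − y) + 2y to zero gives the minimizing output y = x/2. This shows how the second term shrinks the output.

```python
import numpy as np


def reconstruction_cost(x, y, W):
    """Eqn. 42 with an assumed linear decoder A(y, w) = W @ y: ||x - W y||^2 + ||y||^2."""
    residual = x - W @ y
    return float(residual @ residual + y @ y)


x = np.array([1.0, 2.0])
W = np.eye(2)
print(reconstruction_cost(x, x, W))        # 5.0: perfect reconstruction, but a large output norm
print(reconstruction_cost(x, x / 2.0, W))  # 2.5: the minimum for W = I
```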
$$F = C(F_1, F_2, \ldots, F_n) \quad \text{(Eqn. 43)}$$
where $F_1, F_2, \ldots, F_n$ are performance function values for different tasks, and C is a combination function.
$$C(F_1, F_2, \ldots, F_n) = \sum_k a_k F_k \quad \text{(Eqn. 44)}$$
where $a_k$ are combination weights.
$$F(t) = F(t)_{cur} - \bar{F} \quad \text{(Eqn. 45)}$$
In some implementations, the time average of the performance function may comprise an interval average, where learning occurs over a predetermined interval. A current value of the cost function may be determined at individual steps within the interval and may be averaged over all steps. In some implementations, the time average of the performance function may comprise a running average, where the current value of the cost function may be low-pass filtered according to:
thereby producing a running average output.
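Eqn. 45 subtracts a time average from the current performance value. The following is a minimal sketch of the running-average variant. The first-order update $\bar{F} \leftarrow \bar{F} + (\Delta t/\tau)(F_{cur} - \bar{F})$ is an assumed discretization of the low-pass filter mentioned above, whose exact equation is not shown, and the class name is hypothetical.

```python
class RunningBaseline:
    """Return F(t) = F_cur(t) - F_bar (Eqn. 45), where F_bar is a low-pass filtered F_cur."""

    def __init__(self, tau, dt, initial=0.0):
        if not 0.0 < dt <= tau:
            raise ValueError("need 0 < dt <= tau for a stable first-order filter")
        self.k = dt / tau        # filter gain per step
        self.f_bar = initial     # running average of past performance values

    def update(self, f_cur):
        centred = f_cur - self.f_bar                  # compare with the average of earlier steps
        self.f_bar += self.k * (f_cur - self.f_bar)   # then fold the new value into the average
        return centred


baseline = RunningBaseline(tau=100.0, dt=1.0)
first = baseline.update(1.0)      # 1.0: nothing to subtract yet
for _ in range(999):
    last = baseline.update(1.0)   # a constant performance value is gradually cancelled out
```

Subtracting the baseline leaves only deviations of the performance from its recent average. This is what the learning rule then correlates with the parameter gradients.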
with y(t) and $y_d(t)$ being the actual and desired output spike trains; δ(t) is the Dirac delta function; $t_i^{out}$, $t_j^{d}$ are the output and desired spike times, respectively; and a(t), b(t) are positive finite-response kernels. In some implementations, the kernel a(t) may comprise an exponential trace: a(t)=e−t/τ
$$F(x(t), y(t)) = D_{KL}(y(t) \,\|\, r(t)) \quad \text{(Eqn. 48)}$$
$$F(x(t), y(t)) = I(y(t), r(t)) \quad \text{(Eqn. 49)}$$
$$F(x, y) = H(y|x) \quad \text{(Eqn. 50)}$$
so as to provide a more stable neuron output y for a given input x.
Parameter Changing Block
$$\overrightarrow{\Delta w}(t) = \gamma F(t)\,\vec{g}(t) \quad \text{(Eqn. 51)}$$
where γ is the learning rate configured to determine speed of learning adaptation. The learning method implementation according to (Eqn. 51) may be advantageous in applications where the performance function F(t) depends on the current values of the inputs x, outputs y, and/or signal r.
where: T is a finite interval over which the summation occurs; N is the number of steps; and Δt is the time step determined as T/N.
The summation interval T in Eqn. 52 may be configured based on the specific requirements of the control application. By way of illustration, in a control application where a robotic arm is configured to reach for an object, the interval may correspond to the time from the start position of the arm to the reaching point and, in some implementations, may be about 1 s to 50 s. In a speech recognition application, the time interval T may match the time required to pronounce the word being recognized (typically less than 1 s to 2 s). In some implementations of spiking neuronal networks, Δt may be configured in a range between 1 ms and 20 ms; for example, Δt = 20 ms corresponds to 50 steps (N = 50) in a one-second interval.
$$\vec{e}(t+\Delta t) = \beta\,\vec{e}(t) + \vec{g}(t) \quad \text{(Eqn. 53)}$$
where β is the decay coefficient. In some implementations, the traces may be determined using differential equations:
The control parameter w may then be adjusted as:
$\Delta\vec{w}(t) = \gamma F(t)\,\vec{e}(t),$ (Eqn. 55)
where $\gamma$ is the learning rate. The method of Eqn. 53 through Eqn. 55 may be appropriate when the performance function depends on current and past values of the inputs and outputs, and may be referred to as the OLPOMDP algorithm. While it is local in time and computationally simple, it may lead to a biased estimate of the performance function. By way of illustration, the methodology described by Eqn. 53 through Eqn. 55 may be used, in some implementations, in a rescue robotic device configured to locate resources (e.g., survivors or unexploded ordnance) in a building. The input x may correspond to the robot's current position in the building. The reward r (e.g., successful location events) may depend on the history of inputs and on the history of actions taken by the agent (e.g., left/right turns, up/down movement, etc.).
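A minimal sketch of one step of the trace-based rule (Eqn. 53 and Eqn. 55) follows. It assumes the usual OLPOMDP ordering: the trace is updated first, and the freshly updated trace is then used in the weight change. The function name `olpomdp_step` and the toy episode are hypothetical.

```python
from typing import List, Sequence, Tuple


def olpomdp_step(w: Sequence[float], e: Sequence[float], g: Sequence[float],
                 F: float, beta: float, gamma: float
                 ) -> Tuple[List[float], List[float]]:
    """One step of Eqn. 53 (trace) followed by Eqn. 55 (weight change)."""
    # Eqn. 53: e(t + dt) = beta * e(t) + g(t)
    e_new = [beta * ei + gi for ei, gi in zip(e, g)]
    # Eqn. 55: Delta w = gamma * F(t) * e, using the updated trace (assumed ordering)
    w_new = [wi + gamma * F * ei for wi, ei in zip(w, e_new)]
    return w_new, e_new


if __name__ == "__main__":
    w, e = [0.0], [0.0]
    # Hypothetical episode: the action's score is nonzero at step 0,
    # but the reward F arrives only at step 2.
    for g, F in [([1.0], 0.0), ([0.0], 0.0), ([0.0], 1.0)]:
        w, e = olpomdp_step(w, e, g, F, beta=0.5, gamma=1.0)
    print(w)
```

The decaying trace carries credit from the earlier action to the later reward. This illustrates why the method suits rewards that depend on input and action history, as in the rescue-robot example.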
$\Delta\vec{w}(t) = \mu\,\Delta\vec{w}(t-\Delta t) + \Delta\vec{w}(t),$ (Eqn. 56)
where μ is the momentum coefficient. In some implementations, the sign of gradient may be used to perform learning adjustments as follows:
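The momentum rule of Eqn. 56 and a sign-based adjustment can be sketched as below. The momentum function follows Eqn. 56 directly. The sign-based form is **assumed** here to scale the sign of each raw update component by a fixed step, because the corresponding equation is not reproduced in this text. Both function names are hypothetical.

```python
from typing import List, Sequence


def momentum_delta(raw_delta: Sequence[float], prev_delta: Sequence[float],
                   mu: float) -> List[float]:
    """Eqn. 56: Delta w(t) <- mu * Delta w(t - dt) + Delta w(t)."""
    return [mu * p + r for p, r in zip(prev_delta, raw_delta)]


def sign_delta(raw_delta: Sequence[float], step: float) -> List[float]:
    """ASSUMED sign-based variant: keep only the sign of each component."""
    # (r > 0) - (r < 0) evaluates to +1, 0, or -1.
    return [step * ((r > 0) - (r < 0)) for r in raw_delta]


print(momentum_delta([1.0, -1.0], [2.0, 2.0], mu=0.5))
print(sign_delta([0.3, -2.0, 0.0], step=0.01))
```

Momentum smooths successive updates. The sign form makes the step size independent of the gradient magnitude.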
In some implementations, gradient descent methodology may be used for learning coefficient adaptation.
where $\langle \vec{g}\,\vec{g}^T \rangle_{x,y}$ is the Fisher information metric matrix. Applying the following transformation to Eqn. 21:
$G\,\Delta\vec{w} = \vec{F}$ (Eqn. 60)
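Eqn. 60 can be solved for $\Delta\vec{w}$ once $G$ is estimated. The sketch below makes three assumptions:

- $G$ is estimated as the sample average of $\vec{g}\,\vec{g}^T$ over observed score vectors.
- The right-hand side $\vec{F}$ is supplied as an already-computed vector.
- A small ridge term is added to the diagonal to keep $G$ invertible. This is a common numerical safeguard, not something the source specifies.

All function names are hypothetical.

```python
from typing import List, Sequence


def fisher_matrix(scores: Sequence[Sequence[float]]) -> List[List[float]]:
    """Sample estimate of G = <g g^T> over the observed score vectors."""
    n, d = len(scores), len(scores[0])
    return [[sum(s[i] * s[j] for s in scores) / n for j in range(d)]
            for i in range(d)]


def solve(G: Sequence[Sequence[float]], b: Sequence[float]) -> List[float]:
    """Solve G x = b by Gaussian elimination with partial pivoting."""
    d = len(b)
    A = [list(map(float, row)) + [float(bi)] for row, bi in zip(G, b)]
    for col in range(d):
        piv = max(range(col, d), key=lambda r: abs(A[r][col]))
        A[col], A[piv] = A[piv], A[col]
        for r in range(col + 1, d):
            f = A[r][col] / A[col][col]
            for c in range(col, d + 1):
                A[r][c] -= f * A[col][c]
    x = [0.0] * d
    for r in range(d - 1, -1, -1):   # back-substitution
        x[r] = (A[r][d] - sum(A[r][c] * x[c] for c in range(r + 1, d))) / A[r][r]
    return x


def natural_gradient(scores: Sequence[Sequence[float]], F_vec: Sequence[float],
                     ridge: float = 1e-6) -> List[float]:
    """Return Delta w solving (G + ridge * I) Delta w = F (Eqn. 60)."""
    G = fisher_matrix(scores)
    for i in range(len(G)):
        G[i][i] += ridge
    return solve(G, F_vec)
```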
for individual parameters of the block. Using the learning architecture described in
denotes the output spike pattern, corresponding to the
denotes the teaching spike pattern, corresponding to the desired (or reference) signal that is part of
denotes the reinforcement signal spike stream, corresponding to signal 304 of
where $w_i$ represents the weights of the input channels, $t_i^k$ represents input spike times, and $\alpha(t) = (t/\tau_\alpha)\,e^{1-(t/\tau_\alpha)}$.
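The alpha kernel and a weighted sum of kernel responses over input spikes can be sketched as below. The equation that these symbols define is not reproduced in this text. The sketch therefore **assumes** the common form $u(t) = \sum_i w_i \sum_k \alpha(t - t_i^k)$, with $\alpha(t) = 0$ for $t < 0$. The function names are hypothetical.

```python
import math
from typing import Sequence


def alpha(t: float, tau_alpha: float) -> float:
    """alpha(t) = (t / tau_alpha) * exp(1 - t / tau_alpha) for t >= 0, else 0.

    The kernel peaks at t = tau_alpha, where it equals exactly 1.
    """
    if t < 0:
        return 0.0
    return (t / tau_alpha) * math.exp(1.0 - t / tau_alpha)


def weighted_input(t: float, weights: Sequence[float],
                   spike_times: Sequence[Sequence[float]],
                   tau_alpha: float) -> float:
    """ASSUMED form: u(t) = sum_i w_i * sum_k alpha(t - t_i^k)."""
    return sum(w * sum(alpha(t - tk, tau_alpha) for tk in times)
               for w, times in zip(weights, spike_times))


# Hypothetical two-channel example, evaluated at t = 5 ms with tau_alpha = 5 ms.
print(weighted_input(5.0, [2.0, -1.0], [[0.0], [5.0]], 5.0))
```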
Exemplary Methods
Generalized Learning Rules
$V(q)^T = \big(V_1(q), V_2(q), \ldots, V_n(q)\big),$
$R(q)^T = \big(R_1(q), R_2(q), \ldots, R_n(q)\big),$
$G(q)^T = \big(G_1(q), G_2(q), \ldots, G_n(q)\big), \quad q^T = \{q_0, \ldots, q_n\}.$ (Eqn. 61)
of the IPD with respect to the state vector q may be determined.
may be determined using the partial derivatives of IPD from the
may be determined using, for example, Eqn. 17. In one or more implementations of the IF neuronal model, the exponential stochastic threshold may be implemented using Eqn. 2, and the score function $g_i$ may be determined using Eqn. 32.
$F_{sr} = aF_{sup} + bF_{reinf},$
where $F_{sup}$ and $F_{reinf}$ are the cost functions for the supervised and reinforcement learning tasks, respectively, and $a$, $b$ are coefficients determining the relative contribution of each cost component to the combined cost function. By varying the coefficients $a$, $b$ during different simulation runs of the spiking network, the effect of the relative contribution of each learning method on network learning performance may be investigated.
where $\tau_d$ is the trace decay constant, and $C$ is the bias constant configured to introduce a penalty associated with extra activity of the neuron that does not correspond to the desired spike train.
$F_{reinf} = y^+(t) - y^-(t),$ (Eqn. 64)
where the subtraction of spike trains is understood as in Eqn. 65. Reinforcement may be generated according to the task that is being solved by the neuron.
Using the description of Eqn. 65, the spiking neuron network (e.g., the
- If the network (e.g., the network 700 of FIG. 7) generates one spike within a [0 ms, 50 ms] time window from the onset of the input, then it may receive the positive reinforcement spike, illustrated in the panel 904 in FIG. 9A.
- If the network does not generate outputs during that interval or generates more than one spike, then it may receive a negative reinforcement spike, illustrated in the panel 906 in FIG. 9A.
- If the network is active (generates output spikes) during the time interval [200 ms, 250 ms] or [400 ms, 450 ms], then it may receive a negative reinforcement spike for each such interval.
- Reinforcement signals may not be generated during one or more other intervals.
A maximum reinforcement configuration may comprise (i) one positive reinforcement spike and (ii) no negative reinforcement spikes. A maximum negative reinforcement configuration may comprise (i) no positive reinforcement spikes and (ii) three negative reinforcement spikes.
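The reinforcement rules listed above can be sketched as a function of the output spike times in one trial. The sketch makes three **assumptions**:

- Window boundaries are inclusive.
- Spike times are measured in milliseconds from input onset.
- $F_{reinf}$ of Eqn. 64 is summarized over the trial as (positive count) minus (negative count).

Function names are hypothetical.

```python
from typing import Sequence, Tuple


def reinforcement(out_spikes_ms: Sequence[float]) -> Tuple[int, int]:
    """Return (positive, negative) reinforcement spike counts for one trial."""

    def count(lo: float, hi: float) -> int:
        return sum(1 for t in out_spikes_ms if lo <= t <= hi)

    pos, neg = 0, 0
    # Exactly one output spike in [0, 50] ms earns the positive spike;
    # zero spikes or more than one spike earns a negative spike instead.
    if count(0.0, 50.0) == 1:
        pos += 1
    else:
        neg += 1
    # Any activity in either "quiet" window earns one negative spike per window.
    for lo, hi in ((200.0, 250.0), (400.0, 450.0)):
        if count(lo, hi) > 0:
            neg += 1
    return pos, neg


def f_reinf(out_spikes_ms: Sequence[float]) -> int:
    """ASSUMED trial-level summary of Eqn. 64: positive minus negative spikes."""
    pos, neg = reinforcement(out_spikes_ms)
    return pos - neg
```

Under these rules the best possible trial yields (1, 0) and the worst yields (0, 3). These match the maximum positive and maximum negative configurations described above.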
$F_{su} = aF_{sup} + c(-F_{unsup}).$ (Eqn. 66)
where $F_{sup}$ is described by, for example, Eqn. 34, $F_{unsup}$ is the cost function for the unsupervised learning tasks, and $a$, $c$ are coefficients determining the relative contribution of each cost component to the combined cost function. By varying the coefficients $a$, $c$ during different simulation runs of the spiking network, the effects of the relative contribution of individual learning methods on the network learning performance may be investigated.
$F_{unsup} = \ln p(t) - \ln p^d(t)$ (Eqn. 67)
where $p(t)$ is the probability of the actual spiking pattern generated by the network, and $p^d(t)$ is the probability of a spiking pattern generated by a Poisson process. The unsupervised learning task in this implementation may serve to minimize the function of Eqn. 67: when the two probabilities are equal at all times, $p(t) = p^d(t)$, the network may generate output spikes according to a Poisson distribution.
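Eqn. 67 can be evaluated on a binned spike train, as in the sketch below. The sketch **assumes** that the network's spiking pattern is modeled as independent Bernoulli bins with per-bin probabilities `net_probs`. The Poisson reference is represented by its per-bin probability of at least one spike, $1 - e^{-\lambda \Delta t}$. Function names, the rate, and the bin width are hypothetical.

```python
import math
from typing import Sequence


def bernoulli_log_prob(spikes: Sequence[int], probs: Sequence[float]) -> float:
    """ln P(spike pattern) for independent Bernoulli bins."""
    return sum(math.log(p) if s else math.log(1.0 - p)
               for s, p in zip(spikes, probs))


def f_unsup(spikes: Sequence[int], net_probs: Sequence[float],
            rate_hz: float, dt_s: float) -> float:
    """Eqn. 67 under the Bernoulli-bin assumption: ln p - ln p^d."""
    q = 1.0 - math.exp(-rate_hz * dt_s)   # Poisson reference, per-bin probability
    return bernoulli_log_prob(spikes, net_probs) - bernoulli_log_prob(spikes, [q] * len(spikes))
```

When the network's per-bin probabilities equal the Poisson reference, the two log-probabilities coincide and $F_{unsup} = 0$. This is the condition described above under which the output follows Poisson statistics.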
$F_{sur} = aF_{sup} + bF_{reinf} + c(-F_{unsup})$ (Eqn. 69)
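Eqn. 69 generalizes the two-term combinations $F_{sr}$ and $F_{su}$ (Eqn. 66). Setting $c = 0$ recovers $F_{sr}$, and setting $b = 0$ recovers $F_{su}$. The helper name `combined_cost` and its default coefficients are illustrative assumptions.

```python
def combined_cost(F_sup: float, F_reinf: float, F_unsup: float,
                  a: float = 1.0, b: float = 0.0, c: float = 0.0) -> float:
    """Eqn. 69: F_sur = a * F_sup + b * F_reinf + c * (-F_unsup).

    c = 0 gives the supervised + reinforcement cost F_sr;
    b = 0 gives the supervised + unsupervised cost F_su (Eqn. 66).
    """
    return a * F_sup + b * F_reinf + c * (-F_unsup)


print(combined_cost(2.0, 3.0, 1.0, a=1.0, b=1.0, c=0.0))
```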
where $x$ is the input signal, $\delta$ is the Dirac delta function, $t_i^{out}$, $t_j^d$ are the output and desired spike times, respectively, and $a(t)$, $b(t)$ are some positive finite-response kernels. In one or more implementations, one or both kernels (e.g., $a(t)$) may comprise an exponential trace:
$a(t) = e^{-t/\tau_a}$
where $\tau_a$ is the time constant, typically selected according to the average output firing rate of a neuron.
$F(x,y) = \int_T \left( \int_{-\infty}^{t} y(s)\,\alpha(t-s)\,ds \right) \sum_j \delta(t - t_j^d)\,dt$ (Eqn. 73)
where $\alpha(t)$ is the alpha function. As shown by Eqn. 73, the actual output spike train $y(t)$ may first be low-pass filtered via a convolution with the alpha-function kernel. This may create an output trace that reaches its maximum value after a time delay of $\tau_d$ from the output spike occurrence. Subsequently, the value of the filtered trace may be evaluated at the desired spike times $t_j^d$. The resulting cost function $F$ may reach its maximum when the output spikes precede the desired output (i.e., the supervisory input) by the time interval $\tau_d$. Accordingly, the network learns to maximize the cost function of Eqn. 73 in order to predict the desired spikes with the exact delay. The predictive supervised learning may be used, in some implementations, for a variety of prediction tasks such as, for example, building forward models.
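Eqn. 73 can be evaluated directly for spike trains. With $y(s) = \sum_i \delta(s - t_i^{out})$, the inner integral becomes $\sum_{i:\,t_i^{out} \le t} \alpha(t - t_i^{out})$. The outer integral then samples this trace at each desired spike time. The sketch below **assumes** the alpha function $(t/\tau)\,e^{1-t/\tau}$, whose peak lag equals its time constant $\tau$ (playing the role of the delay $\tau_d$ in the text). Function names are hypothetical.

```python
import math
from typing import Sequence


def alpha(t: float, tau: float) -> float:
    """Causal alpha kernel: (t / tau) * exp(1 - t / tau) for t >= 0, else 0."""
    return (t / tau) * math.exp(1.0 - t / tau) if t >= 0 else 0.0


def predictive_cost(out_spikes: Sequence[float], desired_spikes: Sequence[float],
                    tau: float) -> float:
    """Eqn. 73 for spike trains: sum_j sum_i alpha(t_j^d - t_i^out).

    Output spikes after a desired spike contribute 0 because alpha is causal.
    """
    return sum(alpha(td - to, tau) for td in desired_spikes for to in out_spikes)


# Hypothetical example: output 5 ms ahead of the desired spike, tau = 5 ms.
print(predictive_cost([10.0], [15.0], tau=5.0))
```

An output spike that leads the desired spike by exactly $\tau$ gives the largest contribution, 1.0. A shorter lead gives less, and a late spike gives nothing. This is the predictive behavior the cost rewards.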
$\bar{F} = I(x,y) = \left\langle \ln\frac{p(y \mid x)}{p(y)} \right\rangle_{x,y}, \quad F(x,y) = h(y) - h(y \mid x)$ (Eqn. 74)
where h(y) is the unconditional per-stimulus entropy (surprisal), described by (Eqn. 13). Learning by the
$\bar{F} = D_{KL}(p \,\|\, p^d) = \left\langle \ln\frac{p(y)}{p^d(y)} \right\rangle_{x,y}, \quad F(x,y) = h^d(y) - h(y)$ (Eqn. 75)
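The pointwise forms in Eqn. 74 and Eqn. 75 are differences of surprisals $h(\cdot) = -\ln p(\cdot)$. Their averages over $(x, y)$ give the mutual information and the KL divergence, respectively. The sketch below checks this on a small discrete joint distribution, which is an illustrative assumption (the patent works with spike-train probabilities). Function names are hypothetical.

```python
import math
from typing import Dict, Hashable, Tuple


def surprisal(p: float) -> float:
    """h = -ln p."""
    return -math.log(p)


def f_mutual_info(p_y_given_x: float, p_y: float) -> float:
    """Eqn. 74, pointwise: F(x, y) = h(y) - h(y|x) = ln(p(y|x) / p(y))."""
    return surprisal(p_y) - surprisal(p_y_given_x)


def f_kl(p_y: float, p_d_y: float) -> float:
    """Eqn. 75, pointwise: F(x, y) = h^d(y) - h(y) = ln(p(y) / p^d(y))."""
    return surprisal(p_d_y) - surprisal(p_y)


def expected_mi(joint: Dict[Tuple[Hashable, Hashable], float]) -> float:
    """Average the Eqn. 74 pointwise term over a discrete joint p(x, y)."""
    px: Dict[Hashable, float] = {}
    py: Dict[Hashable, float] = {}
    for (x, y), p in joint.items():
        px[x] = px.get(x, 0.0) + p
        py[y] = py.get(y, 0.0) + p
    return sum(p * f_mutual_info(p / px[x], py[y])
               for (x, y), p in joint.items() if p > 0)


# y copies a fair binary x: the mutual information is ln 2.
print(expected_mi({(0, 0): 0.5, (1, 1): 0.5}))
```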
Claims (21)
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/487,533 US9146546B2 (en) | 2012-06-04 | 2012-06-04 | Systems and apparatus for implementing task-specific learning using spiking neurons |
US13/489,280 US8943008B2 (en) | 2011-09-21 | 2012-06-05 | Apparatus and methods for reinforcement learning in artificial neural networks |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/487,533 US9146546B2 (en) | 2012-06-04 | 2012-06-04 | Systems and apparatus for implementing task-specific learning using spiking neurons |
Publications (2)
Publication Number | Publication Date |
---|---|
US20130325768A1 US20130325768A1 (en) | 2013-12-05 |
US9146546B2 true US9146546B2 (en) | 2015-09-29 |
Family
ID=49671525
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US13/487,533 Expired - Fee Related US9146546B2 (en) | 2011-09-21 | 2012-06-04 | Systems and apparatus for implementing task-specific learning using spiking neurons |
Country Status (1)
Country | Link |
---|---|
US (1) | US9146546B2 (en) |
Cited By (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20170189685A1 (en) * | 2015-12-30 | 2017-07-06 | Boston Scientific Neuromodulation Corporation | Method and apparatus for composing spatio-temporal patterns of neurostimulation using a neuronal network model |
WO2018045021A1 (en) * | 2016-09-01 | 2018-03-08 | Goldman Sachs & Co. LLC | Systems and methods for learning and predicting time-series data using deep multiplicative networks |
CN108985447A (en) * | 2018-06-15 | 2018-12-11 | 华中科技大学 | A kind of hardware pulse nerve network system |
US10183167B2 (en) | 2015-12-30 | 2019-01-22 | Boston Scientific Neuromodulation Corporation | Method and apparatus for composing spatio-temporal patterns of neurostimulation for cumulative effects |
US10252059B2 (en) | 2015-12-30 | 2019-04-09 | Boston Scientific Neuromodulation Corporation | Method and apparatus for guided optimization of spatio-temporal patterns of neurostimulation |
US20190236482A1 (en) * | 2016-07-18 | 2019-08-01 | Google Llc | Training machine learning models on multiple machine learning tasks |
US10650307B2 (en) | 2016-09-13 | 2020-05-12 | International Business Machines Corporation | Neuromorphic architecture for unsupervised pattern detection and feature learning |
US10657426B2 (en) * | 2018-01-25 | 2020-05-19 | Samsung Electronics Co., Ltd. | Accelerating long short-term memory networks via selective pruning |
US10755167B2 (en) | 2016-06-22 | 2020-08-25 | International Business Machines Corporation | Neuromorphic architecture with multiple coupled neurons using internal state neuron information |
US10839316B2 (en) | 2016-08-08 | 2020-11-17 | Goldman Sachs & Co. LLC | Systems and methods for learning and predicting time-series data using inertial auto-encoders |
US10839302B2 (en) | 2015-11-24 | 2020-11-17 | The Research Foundation For The State University Of New York | Approximate value iteration with complex returns by bounding |
US11353833B2 (en) | 2016-08-08 | 2022-06-07 | Goldman Sachs & Co. LLC | Systems and methods for learning and predicting time-series data using deep multiplicative networks |
RU2784191C1 (en) * | 2021-12-27 | 2022-11-23 | Андрей Павлович Катанский | Method and apparatus for adaptive automated control of a heating, ventilation and air conditioning system |
US11524401B1 (en) | 2019-03-28 | 2022-12-13 | Apple Inc. | Learning skills from video demonstrations |
US11568236B2 (en) | 2018-01-25 | 2023-01-31 | The Research Foundation For The State University Of New York | Framework and methods of diverse exploration for fast and safe policy improvement |
Families Citing this family (108)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9405975B2 (en) | 2010-03-26 | 2016-08-02 | Brain Corporation | Apparatus and methods for pulse-code invariant object recognition |
US9122994B2 (en) | 2010-03-26 | 2015-09-01 | Brain Corporation | Apparatus and methods for temporally proximate object recognition |
US9311593B2 (en) | 2010-03-26 | 2016-04-12 | Brain Corporation | Apparatus and methods for polychronous encoding and multiplexing in neuronal prosthetic devices |
US8315305B2 (en) | 2010-03-26 | 2012-11-20 | Brain Corporation | Systems and methods for invariant pulse latency coding |
US9906838B2 (en) | 2010-07-12 | 2018-02-27 | Time Warner Cable Enterprises Llc | Apparatus and methods for content delivery and message exchange across multiple content delivery networks |
US9193075B1 (en) | 2010-08-26 | 2015-11-24 | Brain Corporation | Apparatus and methods for object detection via optical flow cancellation |
US8775341B1 (en) | 2010-10-26 | 2014-07-08 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
US9015093B1 (en) | 2010-10-26 | 2015-04-21 | Michael Lamport Commons | Intelligent control with hierarchical stacked neural networks |
US11279025B2 (en) * | 2011-06-02 | 2022-03-22 | Brain Corporation | Apparatus and methods for operating robotic devices using selective state space training |
US9566710B2 (en) | 2011-06-02 | 2017-02-14 | Brain Corporation | Apparatus and methods for operating robotic devices using selective state space training |
US9147156B2 (en) | 2011-09-21 | 2015-09-29 | Qualcomm Technologies Inc. | Apparatus and methods for synaptic update in a pulse-coded network |
US8990133B1 (en) | 2012-12-20 | 2015-03-24 | Brain Corporation | Apparatus and methods for state-dependent learning in spiking neuron networks |
US9047568B1 (en) | 2012-09-20 | 2015-06-02 | Brain Corporation | Apparatus and methods for encoding of sensory data using artificial spiking neurons |
US9070039B2 (en) | 2013-02-01 | 2015-06-30 | Brian Corporation | Temporal winner takes all spiking neuron network sensory processing apparatus and methods |
US8725658B2 (en) | 2011-09-21 | 2014-05-13 | Brain Corporation | Elementary network description for efficient memory management in neuromorphic systems |
US9460387B2 (en) | 2011-09-21 | 2016-10-04 | Qualcomm Technologies Inc. | Apparatus and methods for implementing event-based updates in neuron networks |
US8725662B2 (en) | 2011-09-21 | 2014-05-13 | Brain Corporation | Apparatus and method for partial evaluation of synaptic updates based on system events |
US9104973B2 (en) | 2011-09-21 | 2015-08-11 | Qualcomm Technologies Inc. | Elementary network description for neuromorphic systems with plurality of doublets wherein doublet events rules are executed in parallel |
US9412064B2 (en) | 2011-08-17 | 2016-08-09 | Qualcomm Technologies Inc. | Event-based communication in spiking neuron networks communicating a neural activity payload with an efficacy update |
US8719199B2 (en) | 2011-09-21 | 2014-05-06 | Brain Corporation | Systems and methods for providing a neural network having an elementary network description for efficient implementation of event-triggered plasticity rules |
US9117176B2 (en) | 2011-09-21 | 2015-08-25 | Qualcomm Technologies Inc. | Round-trip engineering apparatus and methods for neural networks |
US9213937B2 (en) | 2011-09-21 | 2015-12-15 | Brain Corporation | Apparatus and methods for gating analog and spiking signals in artificial neural networks |
US9104186B2 (en) | 2012-06-04 | 2015-08-11 | Brain Corporation | Stochastic apparatus and methods for implementing generalized learning rules |
US9146546B2 (en) | 2012-06-04 | 2015-09-29 | Brain Corporation | Systems and apparatus for implementing task-specific learning using spiking neurons |
US9156165B2 (en) | 2011-09-21 | 2015-10-13 | Brain Corporation | Adaptive critic apparatus and methods |
US9015092B2 (en) | 2012-06-04 | 2015-04-21 | Brain Corporation | Dynamically reconfigurable stochastic learning apparatus and methods |
US9098811B2 (en) | 2012-06-04 | 2015-08-04 | Brain Corporation | Spiking neuron network apparatus and methods |
US10210452B2 (en) | 2011-09-21 | 2019-02-19 | Qualcomm Incorporated | High level neuromorphic network description apparatus and methods |
US9224090B2 (en) | 2012-05-07 | 2015-12-29 | Brain Corporation | Sensory input processing apparatus in a spiking neural network |
US9129221B2 (en) | 2012-05-07 | 2015-09-08 | Brain Corporation | Spiking neural network feedback apparatus and methods |
US9208432B2 (en) | 2012-06-01 | 2015-12-08 | Brain Corporation | Neural network learning and collaboration apparatus and methods |
US9412041B1 (en) | 2012-06-29 | 2016-08-09 | Brain Corporation | Retinal apparatus and methods |
US9256823B2 (en) | 2012-07-27 | 2016-02-09 | Qualcomm Technologies Inc. | Apparatus and methods for efficient updates in spiking neuron network |
US9256215B2 (en) | 2012-07-27 | 2016-02-09 | Brain Corporation | Apparatus and methods for generalized state-dependent learning in spiking neuron networks |
US9186793B1 (en) | 2012-08-31 | 2015-11-17 | Brain Corporation | Apparatus and methods for controlling attention of a robot |
US9440352B2 (en) | 2012-08-31 | 2016-09-13 | Qualcomm Technologies Inc. | Apparatus and methods for robotic learning |
US9311594B1 (en) | 2012-09-20 | 2016-04-12 | Brain Corporation | Spiking neuron network apparatus and methods for encoding of sensory data |
US9367798B2 (en) | 2012-09-20 | 2016-06-14 | Brain Corporation | Spiking neuron network adaptive control apparatus and methods |
US9189730B1 (en) | 2012-09-20 | 2015-11-17 | Brain Corporation | Modulated stochasticity spiking neuron network controller apparatus and methods |
US8793205B1 (en) | 2012-09-20 | 2014-07-29 | Brain Corporation | Robotic learning and evolution apparatus |
US9082079B1 (en) | 2012-10-22 | 2015-07-14 | Brain Corporation | Proportional-integral-derivative controller effecting expansion kernels comprising a plurality of spiking neurons associated with a plurality of receptive fields |
US9111226B2 (en) | 2012-10-25 | 2015-08-18 | Brain Corporation | Modulated plasticity apparatus and methods for spiking neuron network |
US9218563B2 (en) | 2012-10-25 | 2015-12-22 | Brain Corporation | Spiking neuron sensory processing apparatus and methods for saliency detection |
US9183493B2 (en) * | 2012-10-25 | 2015-11-10 | Brain Corporation | Adaptive plasticity apparatus and methods for spiking neuron network |
US9275326B2 (en) | 2012-11-30 | 2016-03-01 | Brain Corporation | Rate stabilization through plasticity in spiking neuron network |
US9123127B2 (en) | 2012-12-10 | 2015-09-01 | Brain Corporation | Contrast enhancement spiking neuron network sensory processing apparatus and methods |
US9195934B1 (en) | 2013-01-31 | 2015-11-24 | Brain Corporation | Spiking neuron classifier apparatus and methods using conditionally independent subsets |
US9177245B2 (en) | 2013-02-08 | 2015-11-03 | Qualcomm Technologies Inc. | Spiking network apparatus and method with bimodal spike-timing dependent plasticity |
US9764468B2 (en) | 2013-03-15 | 2017-09-19 | Brain Corporation | Adaptive predictor apparatus and methods |
US8996177B2 (en) | 2013-03-15 | 2015-03-31 | Brain Corporation | Robotic training apparatus and methods |
US9008840B1 (en) | 2013-04-19 | 2015-04-14 | Brain Corporation | Apparatus and methods for reinforcement-guided supervised learning |
US9242372B2 (en) | 2013-05-31 | 2016-01-26 | Brain Corporation | Adaptive robotic interface apparatus and methods |
US9314924B1 (en) * | 2013-06-14 | 2016-04-19 | Brain Corporation | Predictive robotic controller apparatus and methods |
US9384443B2 (en) | 2013-06-14 | 2016-07-05 | Brain Corporation | Robotic training apparatus and methods |
US9792546B2 (en) | 2013-06-14 | 2017-10-17 | Brain Corporation | Hierarchical robotic controller apparatus and methods |
US9239985B2 (en) | 2013-06-19 | 2016-01-19 | Brain Corporation | Apparatus and methods for processing inputs in an artificial neuron network |
US9436909B2 (en) | 2013-06-19 | 2016-09-06 | Brain Corporation | Increased dynamic range artificial neuron network apparatus and methods |
US9552546B1 (en) | 2013-07-30 | 2017-01-24 | Brain Corporation | Apparatus and methods for efficacy balancing in a spiking neuron network |
US9579789B2 (en) | 2013-09-27 | 2017-02-28 | Brain Corporation | Apparatus and methods for training of robotic control arbitration |
US9296101B2 (en) | 2013-09-27 | 2016-03-29 | Brain Corporation | Robotic control arbitration apparatus and methods |
US9489623B1 (en) | 2013-10-15 | 2016-11-08 | Brain Corporation | Apparatus and methods for backward propagation of errors in a spiking neuron network |
US9463571B2 (en) | 2013-11-01 | 2016-10-11 | Brian Corporation | Apparatus and methods for online training of robots |
US9597797B2 (en) | 2013-11-01 | 2017-03-21 | Brain Corporation | Apparatus and methods for haptic training of robots |
US9248569B2 (en) | 2013-11-22 | 2016-02-02 | Brain Corporation | Discrepancy detection apparatus and methods for machine learning |
US9358685B2 (en) | 2014-02-03 | 2016-06-07 | Brain Corporation | Apparatus and methods for control of robot actions based on corrective user inputs |
US9533413B2 (en) | 2014-03-13 | 2017-01-03 | Brain Corporation | Trainable modular robotic apparatus and methods |
US9987743B2 (en) | 2014-03-13 | 2018-06-05 | Brain Corporation | Trainable modular robotic apparatus and methods |
US9364950B2 (en) | 2014-03-13 | 2016-06-14 | Brain Corporation | Trainable modular robotic methods |
US9630317B2 (en) | 2014-04-03 | 2017-04-25 | Brain Corporation | Learning apparatus and methods for control of robotic devices via spoofing |
US9613308B2 (en) | 2014-04-03 | 2017-04-04 | Brain Corporation | Spoofing remote control apparatus and methods |
US9346167B2 (en) | 2014-04-29 | 2016-05-24 | Brain Corporation | Trainable convolutional network apparatus and methods for operating a robotic vehicle |
US10194163B2 (en) | 2014-05-22 | 2019-01-29 | Brain Corporation | Apparatus and methods for real time estimation of differential motion in live video |
US9713982B2 (en) | 2014-05-22 | 2017-07-25 | Brain Corporation | Apparatus and methods for robotic operation using video imagery |
US9939253B2 (en) | 2014-05-22 | 2018-04-10 | Brain Corporation | Apparatus and methods for distance estimation using multiple image sensors |
US9848112B2 (en) | 2014-07-01 | 2017-12-19 | Brain Corporation | Optical detection apparatus and methods |
US10057593B2 (en) | 2014-07-08 | 2018-08-21 | Brain Corporation | Apparatus and methods for distance estimation using stereo imagery |
US9860077B2 (en) | 2014-09-17 | 2018-01-02 | Brain Corporation | Home animation apparatus and methods |
US9821470B2 (en) | 2014-09-17 | 2017-11-21 | Brain Corporation | Apparatus and methods for context determination using real time sensor data |
US9849588B2 (en) | 2014-09-17 | 2017-12-26 | Brain Corporation | Apparatus and methods for remotely controlling robotic devices |
US9579790B2 (en) | 2014-09-17 | 2017-02-28 | Brain Corporation | Apparatus and methods for removal of learned behaviors in robots |
US9870617B2 (en) | 2014-09-19 | 2018-01-16 | Brain Corporation | Apparatus and methods for saliency detection based on color occurrence analysis |
US9630318B2 (en) | 2014-10-02 | 2017-04-25 | Brain Corporation | Feature detection apparatus and methods for training of robotic navigation |
US9881349B1 (en) | 2014-10-24 | 2018-01-30 | Gopro, Inc. | Apparatus and methods for computerized object identification |
US9426946B2 (en) | 2014-12-02 | 2016-08-30 | Brain Corporation | Computerized learning landscaping apparatus and methods |
US9717387B1 (en) | 2015-02-26 | 2017-08-01 | Brain Corporation | Apparatus and methods for programming and training of robotic household appliances |
US9840003B2 (en) | 2015-06-24 | 2017-12-12 | Brain Corporation | Apparatus and methods for safe navigation of robotic devices |
US10197664B2 (en) | 2015-07-20 | 2019-02-05 | Brain Corporation | Apparatus and methods for detection of objects using broadband signals |
US10423879B2 (en) | 2016-01-13 | 2019-09-24 | International Business Machines Corporation | Efficient generation of stochastic spike patterns in core-based neuromorphic systems |
US10295972B2 (en) | 2016-04-29 | 2019-05-21 | Brain Corporation | Systems and methods to operate controllable devices with gestures and/or noises |
KR102399548B1 (en) * | 2016-07-13 | 2022-05-19 | 삼성전자주식회사 | Method for neural network and apparatus perform same method |
US10949737B2 (en) * | 2016-07-13 | 2021-03-16 | Samsung Electronics Co., Ltd. | Method for neural network and apparatus performing same method |
US20180129970A1 (en) * | 2016-11-10 | 2018-05-10 | Justin E. Gottschlich | Forward-looking machine learning for decision systems |
US11853884B2 (en) * | 2017-02-10 | 2023-12-26 | Synaptics Incorporated | Many or one detection classification systems and methods |
GB201702746D0 (en) * | 2017-02-20 | 2017-04-05 | Ocado Innovation Ltd | Vending system and method of automatically vending |
EP3586277B1 (en) * | 2017-02-24 | 2024-04-03 | Google LLC | Training policy neural networks using path consistency learning |
CN107343000A (en) * | 2017-07-04 | 2017-11-10 | 北京百度网讯科技有限公司 | Method and apparatus for handling task |
US10748063B2 (en) * | 2018-04-17 | 2020-08-18 | Hrl Laboratories, Llc | Neuronal network topology for computing conditional probabilities |
CN108562883B (en) * | 2017-12-29 | 2022-06-10 | 南京航空航天大学 | Maximum likelihood distance estimation algorithm of multi-carrier radar system |
DE102018109835A1 (en) * | 2018-04-24 | 2019-10-24 | Albert-Ludwigs-Universität Freiburg | Method and device for determining a network configuration of a neural network |
US11200484B2 (en) | 2018-09-06 | 2021-12-14 | International Business Machines Corporation | Probability propagation over factor graphs |
US10783623B2 (en) | 2018-12-03 | 2020-09-22 | Mistras Group, Inc. | Systems and methods for inspecting pipelines using a robotic imaging system |
US10890505B2 (en) | 2018-12-03 | 2021-01-12 | Mistras Group, Inc. | Systems and methods for inspecting pipelines using a robotic imaging system |
US11143599B2 (en) | 2018-12-03 | 2021-10-12 | Mistras Group, Inc. | Systems and methods for inspecting pipelines using a pipeline inspection robot |
JP7421719B2 (en) * | 2019-01-30 | 2024-01-25 | 国立大学法人 東京大学 | Control devices, control systems, and control programs |
TWI780333B (en) * | 2019-06-03 | 2022-10-11 | 緯創資通股份有限公司 | Method for dynamically processing and playing multimedia files and multimedia play apparatus |
CN110597058B (en) * | 2019-08-28 | 2022-06-17 | 浙江工业大学 | Three-degree-of-freedom autonomous underwater vehicle control method based on reinforcement learning |
CN111736617B (en) * | 2020-06-09 | 2022-11-04 | 哈尔滨工程大学 | Track tracking control method for preset performance of benthonic underwater robot based on speed observer |
CN115213885B (en) * | 2021-06-29 | 2023-04-07 | 达闼科技(北京)有限公司 | Robot skill generation method, device and medium, cloud server and robot control system |
Citations (76)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5063603A (en) | 1989-11-06 | 1991-11-05 | David Sarnoff Research Center, Inc. | Dynamic method for recognizing objects and image processing system therefor |
US5092343A (en) | 1988-02-17 | 1992-03-03 | Wayne State University | Waveform analysis apparatus and method using neural network techniques |
US5245672A (en) | 1992-03-09 | 1993-09-14 | The United States Of America As Represented By The Secretary Of Commerce | Object/anti-object neural network segmentation |
US5355435A (en) | 1992-05-18 | 1994-10-11 | New Mexico State University Technology Transfer Corp. | Asynchronous temporal neural processing element |
US5388186A (en) | 1993-02-01 | 1995-02-07 | At&T Corp. | Differential process controller using artificial neural networks |
US5408588A (en) | 1991-06-06 | 1995-04-18 | Ulug; Mehmet E. | Artificial neural network method and architecture |
US5467428A (en) | 1991-06-06 | 1995-11-14 | Ulug; Mehmet E. | Artificial neural network method and architecture adaptive signal filtering |
US5638359A (en) | 1992-12-14 | 1997-06-10 | Nokia Telecommunications Oy | Method for congestion management in a frame relay network and a node in a frame relay network |
US5673367A (en) | 1992-10-01 | 1997-09-30 | Buckley; Theresa M. | Method for neural network control of motion using real-time environmental feedback |
RU2108612C1 (en) | 1994-09-14 | 1998-04-10 | Круглов Сергей Петрович | Adaptive control system with identifier and implicit reference model |
US5875108A (en) | 1991-12-23 | 1999-02-23 | Hoffberg; Steven M. | Ergonomic man-machine interface incorporating adaptive pattern recognition based control system |
US6009418A (en) | 1996-05-02 | 1999-12-28 | Cooper; David L. | Method and apparatus for neural networking using semantic attractor architecture |
US6014653A (en) | 1996-01-26 | 2000-01-11 | Thaler; Stephen L. | Non-algorithmically implemented artificial neural networks and components thereof |
EP1089436A2 (en) | 1999-09-29 | 2001-04-04 | Lucent Technologies Inc. | Current-mode spike-based analog-to-digital conversion |
US6363369B1 (en) | 1997-06-11 | 2002-03-26 | University Of Southern California | Dynamic synapse for signal processing in neural networks |
US20020038294A1 (en) | 2000-06-16 | 2002-03-28 | Masakazu Matsugu | Apparatus and method for detecting or recognizing pattern by employing a plurality of feature detecting elements |
US6458157B1 (en) | 1997-08-04 | 2002-10-01 | Suaning Gregg Joergen | Retinal stimulator |
US6545705B1 (en) | 1998-04-10 | 2003-04-08 | Lynx System Developers, Inc. | Camera with object recognition/data output |
US6545708B1 (en) | 1997-07-11 | 2003-04-08 | Sony Corporation | Camera controlling device and method for predicted viewing |
US6546291B2 (en) | 2000-02-16 | 2003-04-08 | Massachusetts Eye & Ear Infirmary | Balance prosthesis |
US6581046B1 (en) | 1997-10-10 | 2003-06-17 | Yeda Research And Development Co. Ltd. | Neuronal phase-locked loops |
US6601049B1 (en) | 1996-05-02 | 2003-07-29 | David L. Cooper | Self-adjusting multi-layer neural network architectures and methods therefor |
US20040193670A1 (en) | 2001-05-21 | 2004-09-30 | Langan John D. | Spatio-temporal filter and method |
US20050015351A1 (en) | 2003-07-18 | 2005-01-20 | Alex Nugent | Nanotechnology neural network methods and systems |
US20050036649A1 (en) | 2001-08-23 | 2005-02-17 | Jun Yokono | Robot apparatus, face recognition method, and face recognition apparatus |
US20050283450A1 (en) | 2004-06-11 | 2005-12-22 | Masakazu Matsugu | Information processing apparatus, information processing method, pattern recognition apparatus, and pattern recognition method |
US20060161218A1 (en) | 2003-11-26 | 2006-07-20 | Wicab, Inc. | Systems and methods for treating traumatic brain injury |
US20070022068A1 (en) | 2005-07-01 | 2007-01-25 | Ralph Linsker | Neural networks for prediction and control |
US20070176643A1 (en) | 2005-06-17 | 2007-08-02 | Alex Nugent | Universal logic gate utilizing nanotechnology |
US20070208678A1 (en) | 2004-03-17 | 2007-09-06 | Canon Kabushiki Kaisha | Parallel Pulse Signal Processing Apparatus, Pattern Recognition Apparatus, And Image Input Apparatus |
US20080024345A1 (en) | 2006-07-27 | 2008-01-31 | Brian Watson | Analog to digital conversion using recurrent neural networks |
JP4087423B2 (en) | 2006-10-17 | 2008-05-21 | 京セラミタ株式会社 | Portable communication device |
US20080162391A1 (en) | 2006-12-29 | 2008-07-03 | Neurosciences Research Foundation, Inc. | Solving the distal reward problem through linkage of stdp and dopamine signaling |
WO2008132066A1 (en) | 2007-04-27 | 2008-11-06 | Siemens Aktiengesellschaft | A method for computer-assisted learning of one or more neural networks |
US20090043722A1 (en) | 2003-03-27 | 2009-02-12 | Alex Nugent | Adaptive neural network utilizing nanotechnology-based components |
US20090287624A1 (en) | 2005-12-23 | 2009-11-19 | Societe De Commercialisation De Produits De La Recherche Applique-Socpra-Sciences Et Genie S.E.C. | Spatio-temporal pattern recognition using a spiking neural network and processing thereof on a portable and/or distributed computer |
US7672920B2 (en) | 2006-01-31 | 2010-03-02 | Sony Corporation | Apparatus and method for embedding recurrent neural networks into the nodes of a self-organizing map |
US20100086171A1 (en) | 2008-10-02 | 2010-04-08 | Silverbrook Research Pty Ltd | Method of imaging coding pattern having merged data symbols |
US20100166320A1 (en) | 2008-12-26 | 2010-07-01 | Paquier Williams J F | Multi-stage image pattern recognizer |
US20100198765A1 (en) | 2007-11-20 | 2010-08-05 | Christopher Fiorillo | Prediction by single neurons |
US7849030B2 (en) | 2006-05-31 | 2010-12-07 | Hartford Fire Insurance Company | Method and system for classifying documents |
RU2406105C2 (en) | 2006-06-13 | 2010-12-10 | Филипп Геннадьевич Нестерук | Method of processing information in neural networks |
US20110016071A1 (en) | 2009-07-20 | 2011-01-20 | Guillen Marcos E | Method for efficiently simulating the information processing in cells and tissues of the nervous system with a temporal series compressed encoding neural network |
US20110119215A1 (en) | 2009-11-13 | 2011-05-19 | International Business Machines Corporation | Hardware analog-digital neural networks |
US20110119214A1 (en) | 2009-11-18 | 2011-05-19 | International Business Machines Corporation | Area efficient neuromorphic circuits |
US20110160741A1 (en) | 2008-06-09 | 2011-06-30 | Hiroyuki Asano | Medical treatment tool for tubular organ |
CN102226740A (en) | 2011-04-18 | 2011-10-26 | China Jiliang University | Bearing fault detection method based on manner of controlling stochastic resonance by external periodic signal |
US20120011093A1 (en) | 2010-07-07 | 2012-01-12 | Qualcomm Incorporated | Methods and systems for digital neural processing with discrete-level synapes and probabilistic stdp |
US20120011090A1 (en) | 2010-07-07 | 2012-01-12 | Qualcomm Incorporated | Methods and systems for three-memristor synapse with stdp and dopamine signaling |
US20120036099A1 (en) | 2010-08-04 | 2012-02-09 | Qualcomm Incorporated | Methods and systems for reward-modulated spike-timing-dependent-plasticity |
US20120109866A1 (en) | 2010-10-29 | 2012-05-03 | International Business Machines Corporation | Compact cognitive synaptic computing circuits |
US8315305B2 (en) | 2010-03-26 | 2012-11-20 | Brain Corporation | Systems and methods for invariant pulse latency coding |
US20120303091A1 (en) | 2010-03-26 | 2012-11-29 | Izhikevich Eugene M | Apparatus and methods for polychronous encoding and multiplexing in neuronal prosthetic devices |
US20120308136A1 (en) | 2010-03-26 | 2012-12-06 | Izhikevich Eugene M | Apparatus and methods for pulse-code invariant object recognition |
US20120308076A1 (en) | 2010-03-26 | 2012-12-06 | Filip Lukasz Piekniewski | Apparatus and methods for temporally proximate object recognition |
US20130073496A1 (en) | 2011-09-21 | 2013-03-21 | Botond Szatmary | Tag-based apparatus and methods for neural networks |
US20130073491A1 (en) | 2011-09-21 | 2013-03-21 | Eugene M. Izhikevich | Apparatus and methods for synaptic update in a pulse-coded network |
US20130073080A1 (en) | 2011-09-21 | 2013-03-21 | Filip Ponulak | Adaptive critic apparatus and methods |
US20130073500A1 (en) | 2011-09-21 | 2013-03-21 | Botond Szatmary | High level neuromorphic network description apparatus and methods |
US20130073493A1 (en) | 2011-09-16 | 2013-03-21 | International Business Machines Corporation | Unsupervised, supervised, and reinforced learning via spiking computation |
US20130151449A1 (en) | 2011-12-07 | 2013-06-13 | Filip Ponulak | Apparatus and methods for implementing learning for analog and spiking signals in artificial neural networks |
US20130204820A1 (en) | 2012-02-08 | 2013-08-08 | Qualcomm Incorporated | Methods and apparatus for spiking neural computation |
US20130218821A1 (en) | 2011-09-21 | 2013-08-22 | Botond Szatmary | Round-trip engineering apparatus and methods for neural networks |
US20130297541A1 (en) | 2012-05-07 | 2013-11-07 | Filip Piekniewski | Spiking neural network feedback apparatus and methods |
US20130325774A1 (en) | 2012-06-04 | 2013-12-05 | Brain Corporation | Learning stochastic apparatus and methods |
US20130325776A1 (en) | 2011-09-21 | 2013-12-05 | Filip Ponulak | Apparatus and methods for reinforcement learning in artificial neural networks |
US20130325777A1 (en) | 2012-06-04 | 2013-12-05 | Csaba Petre | Spiking neuron network apparatus and methods |
US20130325768A1 (en) | 2012-06-04 | 2013-12-05 | Brain Corporation | Stochastic spiking network learning apparatus and methods |
US20130325775A1 (en) | 2012-06-04 | 2013-12-05 | Brain Corporation | Dynamically reconfigurable stochastic learning apparatus and methods |
US20130325766A1 (en) | 2012-06-04 | 2013-12-05 | Csaba Petre | Spiking neuron network apparatus and methods |
US20130325773A1 (en) | 2012-06-04 | 2013-12-05 | Brain Corporation | Stochastic apparatus and methods for implementing generalized learning rules |
US20140016858A1 (en) | 2012-07-12 | 2014-01-16 | Micah Richert | Spiking neuron network sensory processing apparatus and methods |
US20140025613A1 (en) | 2012-07-20 | 2014-01-23 | Filip Ponulak | Apparatus and methods for reinforcement learning in large populations of artificial spiking neurons |
US20140032458A1 (en) | 2012-07-27 | 2014-01-30 | Oleg Sinyavskiy | Apparatus and methods for efficient updates in spiking neuron network |
US8655815B2 (en) | 2010-05-19 | 2014-02-18 | The Regents Of The University Of California | Neural processing unit |
US20140193066A1 (en) | 2012-12-10 | 2014-07-10 | Brain Corporation | Contrast enhancement spiking neuron network sensory processing apparatus and methods |
2012-06-04: application US13/487,533 filed in the US, published as US9146546B2 (not active, Expired - Fee Related)
Patent Citations (87)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5092343A (en) | 1988-02-17 | 1992-03-03 | Wayne State University | Waveform analysis apparatus and method using neural network techniques |
US5063603A (en) | 1989-11-06 | 1991-11-05 | David Sarnoff Research Center, Inc. | Dynamic method for recognizing objects and image processing system therefor |
US5408588A (en) | 1991-06-06 | 1995-04-18 | Ulug; Mehmet E. | Artificial neural network method and architecture |
US5467428A (en) | 1991-06-06 | 1995-11-14 | Ulug; Mehmet E. | Artificial neural network method and architecture adaptive signal filtering |
US5875108A (en) | 1991-12-23 | 1999-02-23 | Hoffberg; Steven M. | Ergonomic man-machine interface incorporating adaptive pattern recognition based control system |
US5245672A (en) | 1992-03-09 | 1993-09-14 | The United States Of America As Represented By The Secretary Of Commerce | Object/anti-object neural network segmentation |
US5355435A (en) | 1992-05-18 | 1994-10-11 | New Mexico State University Technology Transfer Corp. | Asynchronous temporal neural processing element |
US5673367A (en) | 1992-10-01 | 1997-09-30 | Buckley; Theresa M. | Method for neural network control of motion using real-time environmental feedback |
US5638359A (en) | 1992-12-14 | 1997-06-10 | Nokia Telecommunications Oy | Method for congestion management in a frame relay network and a node in a frame relay network |
US5388186A (en) | 1993-02-01 | 1995-02-07 | AT&T Corp. | Differential process controller using artificial neural networks |
RU2108612C1 (en) | 1994-09-14 | 1998-04-10 | Kruglov Sergei Petrovich | Adaptive control system with identifier and implicit reference model |
US6014653A (en) | 1996-01-26 | 2000-01-11 | Thaler; Stephen L. | Non-algorithmically implemented artificial neural networks and components thereof |
US6009418A (en) | 1996-05-02 | 1999-12-28 | Cooper; David L. | Method and apparatus for neural networking using semantic attractor architecture |
US6601049B1 (en) | 1996-05-02 | 2003-07-29 | David L. Cooper | Self-adjusting multi-layer neural network architectures and methods therefor |
US6363369B1 (en) | 1997-06-11 | 2002-03-26 | University Of Southern California | Dynamic synapse for signal processing in neural networks |
US20030050903A1 (en) | 1997-06-11 | 2003-03-13 | Jim-Shih Liaw | Dynamic synapse for signal processing in neural networks |
US6643627B2 (en) | 1997-06-11 | 2003-11-04 | University Of Southern California | Dynamic synapse for signal processing in neural networks |
US6545708B1 (en) | 1997-07-11 | 2003-04-08 | Sony Corporation | Camera controlling device and method for predicted viewing |
US6458157B1 (en) | 1997-08-04 | 2002-10-01 | Suaning Gregg Joergen | Retinal stimulator |
US6581046B1 (en) | 1997-10-10 | 2003-06-17 | Yeda Research And Development Co. Ltd. | Neuronal phase-locked loops |
US6545705B1 (en) | 1998-04-10 | 2003-04-08 | Lynx System Developers, Inc. | Camera with object recognition/data output |
EP1089436A2 (en) | 1999-09-29 | 2001-04-04 | Lucent Technologies Inc. | Current-mode spike-based analog-to-digital conversion |
US6546291B2 (en) | 2000-02-16 | 2003-04-08 | Massachusetts Eye & Ear Infirmary | Balance prosthesis |
US20020038294A1 (en) | 2000-06-16 | 2002-03-28 | Masakazu Matsugu | Apparatus and method for detecting or recognizing pattern by employing a plurality of feature detecting elements |
US20040193670A1 (en) | 2001-05-21 | 2004-09-30 | Langan John D. | Spatio-temporal filter and method |
US20050036649A1 (en) | 2001-08-23 | 2005-02-17 | Jun Yokono | Robot apparatus, face recognition method, and face recognition apparatus |
US20090043722A1 (en) | 2003-03-27 | 2009-02-12 | Alex Nugent | Adaptive neural network utilizing nanotechnology-based components |
US7426501B2 (en) | 2003-07-18 | 2008-09-16 | Knowntech, Llc | Nanotechnology neural network methods and systems |
US20050015351A1 (en) | 2003-07-18 | 2005-01-20 | Alex Nugent | Nanotechnology neural network methods and systems |
US20060161218A1 (en) | 2003-11-26 | 2006-07-20 | Wicab, Inc. | Systems and methods for treating traumatic brain injury |
US20070208678A1 (en) | 2004-03-17 | 2007-09-06 | Canon Kabushiki Kaisha | Parallel Pulse Signal Processing Apparatus, Pattern Recognition Apparatus, And Image Input Apparatus |
US20050283450A1 (en) | 2004-06-11 | 2005-12-22 | Masakazu Matsugu | Information processing apparatus, information processing method, pattern recognition apparatus, and pattern recognition method |
US8015130B2 (en) | 2004-06-11 | 2011-09-06 | Canon Kabushiki Kaisha | Information processing apparatus, information processing method, pattern recognition apparatus, and pattern recognition method |
US20070176643A1 (en) | 2005-06-17 | 2007-08-02 | Alex Nugent | Universal logic gate utilizing nanotechnology |
US20070022068A1 (en) | 2005-07-01 | 2007-01-25 | Ralph Linsker | Neural networks for prediction and control |
US7395251B2 (en) | 2005-07-01 | 2008-07-01 | International Business Machines Corporation | Neural networks for prediction and control |
US20090287624A1 (en) | 2005-12-23 | 2009-11-19 | Societe De Commercialisation De Produits De La Recherche Applique-Socpra-Sciences Et Genie S.E.C. | Spatio-temporal pattern recognition using a spiking neural network and processing thereof on a portable and/or distributed computer |
US7672920B2 (en) | 2006-01-31 | 2010-03-02 | Sony Corporation | Apparatus and method for embedding recurrent neural networks into the nodes of a self-organizing map |
US7849030B2 (en) | 2006-05-31 | 2010-12-07 | Hartford Fire Insurance Company | Method and system for classifying documents |
RU2406105C2 (en) | 2006-06-13 | 2010-12-10 | Filipp Gennadevich Nesteruk | Method of processing information in neural networks |
US20080024345A1 (en) | 2006-07-27 | 2008-01-31 | Brian Watson | Analog to digital conversion using recurrent neural networks |
JP4087423B2 (en) | 2006-10-17 | 2008-05-21 | Kyocera Mita Corporation | Portable communication device |
WO2008083335A2 (en) | 2006-12-29 | 2008-07-10 | Neurosciences Research Foundation, Inc. | Solving the distal reward problem through linkage of stdp and dopamine signaling |
US20080162391A1 (en) | 2006-12-29 | 2008-07-03 | Neurosciences Research Foundation, Inc. | Solving the distal reward problem through linkage of stdp and dopamine signaling |
US8103602B2 (en) | 2006-12-29 | 2012-01-24 | Neurosciences Research Foundation, Inc. | Solving the distal reward problem through linkage of STDP and dopamine signaling |
WO2008132066A1 (en) | 2007-04-27 | 2008-11-06 | Siemens Aktiengesellschaft | A method for computer-assisted learning of one or more neural networks |
US20100198765A1 (en) | 2007-11-20 | 2010-08-05 | Christopher Fiorillo | Prediction by single neurons |
US20110160741A1 (en) | 2008-06-09 | 2011-06-30 | Hiroyuki Asano | Medical treatment tool for tubular organ |
US20100086171A1 (en) | 2008-10-02 | 2010-04-08 | Silverbrook Research Pty Ltd | Method of imaging coding pattern having merged data symbols |
US20100166320A1 (en) | 2008-12-26 | 2010-07-01 | Paquier Williams J F | Multi-stage image pattern recognizer |
US20110016071A1 (en) | 2009-07-20 | 2011-01-20 | Guillen Marcos E | Method for efficiently simulating the information processing in cells and tissues of the nervous system with a temporal series compressed encoding neural network |
US20110119215A1 (en) | 2009-11-13 | 2011-05-19 | International Business Machines Corporation | Hardware analog-digital neural networks |
US20110119214A1 (en) | 2009-11-18 | 2011-05-19 | International Business Machines Corporation | Area efficient neuromorphic circuits |
US20120308136A1 (en) | 2010-03-26 | 2012-12-06 | Izhikevich Eugene M | Apparatus and methods for pulse-code invariant object recognition |
US8467623B2 (en) | 2010-03-26 | 2013-06-18 | Brain Corporation | Invariant pulse latency coding systems and methods systems and methods |
US20130251278A1 (en) | 2010-03-26 | 2013-09-26 | Eugene M. Izhikevich | Invariant pulse latency coding systems and methods |
US20120308076A1 (en) | 2010-03-26 | 2012-12-06 | Filip Lukasz Piekniewski | Apparatus and methods for temporally proximate object recognition |
US8315305B2 (en) | 2010-03-26 | 2012-11-20 | Brain Corporation | Systems and methods for invariant pulse latency coding |
US20120303091A1 (en) | 2010-03-26 | 2012-11-29 | Izhikevich Eugene M | Apparatus and methods for polychronous encoding and multiplexing in neuronal prosthetic devices |
US8655815B2 (en) | 2010-05-19 | 2014-02-18 | The Regents Of The University Of California | Neural processing unit |
US20120011093A1 (en) | 2010-07-07 | 2012-01-12 | Qualcomm Incorporated | Methods and systems for digital neural processing with discrete-level synapes and probabilistic stdp |
US20120011090A1 (en) | 2010-07-07 | 2012-01-12 | Qualcomm Incorporated | Methods and systems for three-memristor synapse with stdp and dopamine signaling |
US20120036099A1 (en) | 2010-08-04 | 2012-02-09 | Qualcomm Incorporated | Methods and systems for reward-modulated spike-timing-dependent-plasticity |
US20120109866A1 (en) | 2010-10-29 | 2012-05-03 | International Business Machines Corporation | Compact cognitive synaptic computing circuits |
CN102226740A (en) | 2011-04-18 | 2011-10-26 | China Jiliang University | Bearing fault detection method based on manner of controlling stochastic resonance by external periodic signal |
US20130073493A1 (en) | 2011-09-16 | 2013-03-21 | International Business Machines Corporation | Unsupervised, supervised, and reinforced learning via spiking computation |
US20130325776A1 (en) | 2011-09-21 | 2013-12-05 | Filip Ponulak | Apparatus and methods for reinforcement learning in artificial neural networks |
US20130073500A1 (en) | 2011-09-21 | 2013-03-21 | Botond Szatmary | High level neuromorphic network description apparatus and methods |
US20130218821A1 (en) | 2011-09-21 | 2013-08-22 | Botond Szatmary | Round-trip engineering apparatus and methods for neural networks |
US20130073080A1 (en) | 2011-09-21 | 2013-03-21 | Filip Ponulak | Adaptive critic apparatus and methods |
US20130073496A1 (en) | 2011-09-21 | 2013-03-21 | Botond Szatmary | Tag-based apparatus and methods for neural networks |
US20130073491A1 (en) | 2011-09-21 | 2013-03-21 | Eugene M. Izhikevich | Apparatus and methods for synaptic update in a pulse-coded network |
US20130151449A1 (en) | 2011-12-07 | 2013-06-13 | Filip Ponulak | Apparatus and methods for implementing learning for analog and spiking signals in artificial neural networks |
US20130151448A1 (en) | 2011-12-07 | 2013-06-13 | Filip Ponulak | Apparatus and methods for implementing learning for analog and spiking signals in artificial neural networks |
US20130151450A1 (en) | 2011-12-07 | 2013-06-13 | Filip Ponulak | Neural network apparatus and methods for signal conversion |
US20130204820A1 (en) | 2012-02-08 | 2013-08-08 | Qualcomm Incorporated | Methods and apparatus for spiking neural computation |
US20130297541A1 (en) | 2012-05-07 | 2013-11-07 | Filip Piekniewski | Spiking neural network feedback apparatus and methods |
US20130325777A1 (en) | 2012-06-04 | 2013-12-05 | Csaba Petre | Spiking neuron network apparatus and methods |
US20130325768A1 (en) | 2012-06-04 | 2013-12-05 | Brain Corporation | Stochastic spiking network learning apparatus and methods |
US20130325775A1 (en) | 2012-06-04 | 2013-12-05 | Brain Corporation | Dynamically reconfigurable stochastic learning apparatus and methods |
US20130325766A1 (en) | 2012-06-04 | 2013-12-05 | Csaba Petre | Spiking neuron network apparatus and methods |
US20130325773A1 (en) | 2012-06-04 | 2013-12-05 | Brain Corporation | Stochastic apparatus and methods for implementing generalized learning rules |
US20130325774A1 (en) | 2012-06-04 | 2013-12-05 | Brain Corporation | Learning stochastic apparatus and methods |
US20140016858A1 (en) | 2012-07-12 | 2014-01-16 | Micah Richert | Spiking neuron network sensory processing apparatus and methods |
US20140025613A1 (en) | 2012-07-20 | 2014-01-23 | Filip Ponulak | Apparatus and methods for reinforcement learning in large populations of artificial spiking neurons |
US20140032458A1 (en) | 2012-07-27 | 2014-01-30 | Oleg Sinyavskiy | Apparatus and methods for efficient updates in spiking neuron network |
US20140193066A1 (en) | 2012-12-10 | 2014-07-10 | Brain Corporation | Contrast enhancement spiking neuron network sensory processing apparatus and methods |
Non-Patent Citations (120)
Title |
---|
"In search of the artificial retina" [online ], Vision Systems Design, Apr. 1, 2007. |
Aleksandrov (1968), Stochastic optimization, Engineering Cybernetics, 5, 11-16. |
Amari (1998), Why natural gradient?, Acoustics, Speech and Signal Processing, (pp. 1213-1216). Seattle, WA, USA. |
Baras, D. et al. "Reinforcement learning, spike-time-dependent plasticity, and the BCM rule." Neural Computation vol. 19 No. 8 (2007): pp. 2245-2279. |
Bartlett et al., (2000) "A Biologically Plausible and Locally Optimal Learning Algorithm for Spiking Neurons" Retrieved from http://arp.anu.edu.au/ftp/papers/jon/brains.pdf.gz. |
Baxter et al. (2000). Direct gradient-based reinforcement learning. In Proceedings of the International Symposium on Circuits. |
Bennett (1999), The early history of the synapse: from Plato to Sherrington. Brain Res. Bull., 50(2): 95-118. |
Bohte et al., "A Computational Theory of Spike-Timing Dependent Plasticity: Achieving Robust Neural Response via Conditional Entropy Minimization" 2004. |
Bohte, (2000). SpikeProp: backpropagation for networks of spiking neurons. In Proceedings of ESANN'2000, (pp. 419-424). |
Bohte, 'Spiking Neural Networks' Doctorate at the University of Leiden, Holland, Mar. 5, 2003, pp. 1-133 [retrieved on Nov. 14, 2012], Retrieved from the internet: <URL: http://holnepagcs,cwi,n11-sbolltedmblica6ond)hdthesislxif>. |
Booij (2005, 6). A Gradient Descent Rule for Spiking Neurons Emitting Multiple Spikes. Information Processing Letters n. 6, v. 95, 552-558. |
Bouganis et al., (2010) "Training a Spiking Neural Network to Control a 4-DoF Robotic Arm based on Spike Timing-Dependent Plasticity", Proceedings of WCCI 2010 IEEE World Congress on Computational Intelligence, CCIB, Barcelona, Spain, Jul. 18-23, 2010, pp. 4104-4111. |
Breiman et al., "Random Forests" 33pgs, Jan. 2001. |
Brette et al., Brian: a simple and flexible simulator for spiking neural networks, The Neuromorphic Engineer, Jul. 1, 2009, pp. 1-4, doi: 10.2417/1200906.1659. |
Capel, "Random Forests and Ferns" LPAC, Jan. 11, 2012, 40 pgs. |
Cuntz et al., 'One Rule to Grow Them All: A General Theory of Neuronal Branching and Its Practical Application' PLOS Computational Biology, 6 (8), Published Aug. 5, 2010. |
Davison et al., PyNN: a common interface for neuronal network simulators, Frontiers in Neuroinformatics, Jan. 2009, pp. 1-10, vol. 2, Article 11. |
D'Cruz, Brendan (1998), Reinforcement Learning in Intelligent Control: A Biologically-Inspired Approach to the Re-learning Problem, May 1998. |
de Queiroz, M. et al. "Reinforcement learning of a simple control task using the spike response model." Neurocomputing vol. 70 No. 1 (2006): pp. 14-20. |
Djurfeldt, Mikael, The Connection-set Algebra: a formalism for the representation of connectivity structure in neuronal network models, implementations in Python and C++, and their use in simulators, BMC Neuroscience, Jul. 18, 2011, p. 1, 12(Suppl 1):P80. |
El-Laithy (2011), A reinforcement learning framework for spiking networks with dynamic synapses, Comput Intell Neurosci. |
Fidjeland et al., Accelerated Simulation of Spiking Neural Networks Using GPUs [online], 2010 [retrieved on Jun. 15, 2013], Retrieved from the Internet: URL:http://ieeexplore.ieee.org/xpls/abs-all.jsp?ammber=5596678&tag=1. |
Fletcher (1987), Practical methods of optimization, New York, NY: Wiley-Interscience. |
Floreano et al., "Neuroevolution: from architectures to learning" Evol. Intel. Jan. 2008 1:47-62, [retrieved Dec. 30, 2013] [retrieved online from URL:<http://infoscience.epfl.ch/record/112676/files/FloreanoDuerrMattiussi2008.pdf>]. |
Floreano et al., (2008), Neuroevolution: From Architectures to Learning, Evol. Intel. Jan. 2008 1:47-62 (retrieved online on Apr. 24, 2013 from http://infoscience.epfl.ch/record/112676/files/FloreanoDuerrMattiussi2008.pdf). |
Florian (2005), A reinforcement learning algorithm for spiking neural networks SYNASC '05 Proceedings of the Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. |
Florian (2007) Reinforcement Learning Through Modulation of Spike-Timing-Dependent Synaptic Plasticity, Neural Computation 19, 1468-1502 Massachusetts Institute of Technology. |
Fremaux et al., "Functional Requirements for Reward-Modulated Spike-Timing-Dependent Plasticity", The Journal of Neuroscience, Oct. 6, 2010, 30(40):13326-13337. |
Froemke et al., Temporal modulation of spike-timing-dependent plasticity, Frontiers in Synaptic Neuroscience, vol. 2, Article 19, pp. 1-16 [online] Jun. 2010 [retrieved on Dec. 16, 2013]. Retrieved from the internet: <frontiersin.org>. |
Fu (2005) Stochastic Gradient Estimation, Technical Research Report. |
Fu (2008), What You Should Know About Simulation and Derivatives Naval Research Logistics, vol. 55, No. 8, 723-736. |
Fyfe et al., (2007), Reinforcement Learning Reward Functions for Unsupervised Learning, ISNN '07 Proceedings of the 4th International Symposium on Neural Networks: Advances in Neural Networks. |
Gerstner (2002), Spiking neuron models: single neurons, populations, plasticity, Cambridge, U.K.: Cambridge University Press. |
Gewaltig et al., 'NEST (Neural Simulation Tool)', Scholarpedia, 2007, pp. 1-15, 2(4): 1430, doi: 10.4249/scholarpedia.1430. |
Gleeson et al., NeuroML: A Language for Describing Data Driven Models of Neurons and Networks with a High Degree of Biological Detail, PLoS Computational Biology, Jun. 2010, pp. 1-19 vol. 6 Issue 6. |
Glynn (1995), Likelihood ratio gradient estimation for regenerative stochastic recursions, Advances in Applied Probability, 27, 4, 1019-1053. |
Goodman et al., Brian: a simulator for spiking neural networks in Python, Frontiers in Neuroinformatics, Nov. 2008, pp. 1-10, vol. 2, Article 5. |
Gorchetchnikov et al., NineML: declarative, mathematically-explicit descriptions of spiking neuronal networks, Frontiers in Neuroinformatics, Conference Abstract: 4th INCF Congress of Neuroinformatics, doi: 10.3389/conf.fninf.2011.08.00098. |
Graham, Lyle J., The Surf-Hippo Reference Manual, http://www.neurophys.biomedicale.univparis5.fr/-graham/surf-hippo-files/Surf-Hippo%20Reference%20Manual.pdf, Mar. 2002, pp. 1-128. |
Hagras, Hani et al., "Evolving Spiking Neural Network Controllers for Autonomous Robots", IEEE 2004. |
Haykin, (1999), Neural Networks: A Comprehensive Foundation (Second Edition), Prentice-Hall. |
Ho, "Random Decision Forests" Int'l Conf. Document Analysis and Recognition, 1995, 5 pgs. |
Izhikevich (2007), Solving the distal reward problem through linkage of STDP and dopamine signaling, Cerebral Cortex, vol. 17, pp. 2443-2452. |
Izhikevich et al., 'Relating STDP to BCM', Neural Computation (2003) 15, 1511-1523. |
Izhikevich, 'Polychronization: Computation with Spikes', Neural Computation, 25, 2006, 18, 245-282. |
Izhikevich, 'Simple Model of Spiking Neurons', IEEE Transactions on Neural Networks, vol. 14, No. 6, Nov. 2003, pp. 1569-1572. |
Jesper Tegner, et al., 2002 "An adaptive spike-timing-dependent plasticity rule" Elsevier Science B.V. |
Kalal et al. "Online learning of robust object detectors during unstable tracking" published on 3rd On-line Learning for Computer Vision Workshop 2009, Kyoto, Japan, IEEE CS. |
Karbowski et al., 'Multispikes and Synchronization in a Large Neural Network with Temporal Delays', Neural Computation 12, 1573-1606 (2000). |
Doya (2000), Reinforcement Learning in Continuous Time and Space, Neural Computation, 12:1, 219-245. |
Khotanzad, "Classification of invariant image representations using a neural network" IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. 38, No. 6, Jun. 1990, pp. 1028-1038 [online], [retrieved on Dec. 10, 2013]. Retrieved from the Internet <URL: http://www-ee.uta.edu/eeweb/IP/Courses/SPR/Reference/Khotanzad.pdf>. |
Kiefer (1952), Stochastic Estimation of the Maximum of a Regression Function, Annals of Mathematical Statistics 23, #3, 462-466. |
Klampfl (2009), Spiking neurons can learn to solve information bottleneck problems and extract independent components, Neural Computation, 21(4), pp. 911-959. |
Kleijnen et al., "Optimization and sensitivity analysis of computer simulation models by the score function method", Invited Review European Journal of Operational Research, Mar. 1995. |
Klute et al., (2002). Artificial Muscles: Actuators for Biorobotic Systems. The International Journal of Robotics Research 21:295-309. |
Larochelle et al., (2009), Exploring Strategies for Training Deep Neural Networks, J. of Machine Learning Research, v. 10, pp. 1-40. |
Laurent, 'Issue 1-nnql-Refactor Nucleus into its own file-Neural Network Query Language' [retrieved on Nov. 12, 2013]. Retrieved from the Internet: URL:https://code.google.com/p/nnql/issues/detail?id=. |
Laurent, 'The Neural Network Query Language (NNQL) Reference' [retrieved on Nov. 12, 2013]. Retrieved from the Internet: <URL: http://code.google.com/p/nnql/issues/detail?id=1>. |
Legenstein et al., (2008), A learning theory for reward-modulated spike timing-dependent plasticity with application to biofeedback. PLoS Computational Biology, 4(10): 1-27. |
Lendek et al., (2006) State Estimation under Uncertainty: A Survey. Technical report 06-004, Delft Center for Systems and Control Delft University of Technology. |
Matsugu et al., "Convolutional Spiking Neural Network Model for Robust Face Detection", 2002 Proceedings of the 9th International Conference on Neural Information Processing (ICONIP'02), vol. 2. |
Masquelier et al., "Unsupervised learning of visual features through spike timing dependent plasticity", PLoS Computational Biology 3.2 (2007): e31, pp. 0247-0257. |
Morrison (2008), Phenomenological models of synaptic plasticity based on spike timing, Received: Jan. 16, 2008 / Accepted: Apr. 9, 2008 The Author(s). |
Nicholas, A Reconfigurable Computing Architecture for Implementing Artificial Neural Networks on FPGA, Master's Thesis, The University of Guelph, 2003, pp. 1-235. |
Nikolic et al., (2011) High-sensitivity silicon retina for robotics and prosthetics. |
Ojala et al., "Performance Evaluation of Texture Measures with Classification Based on Kullback Discrimination of Distribution" 1994 IEEE, pp. 582-585. |
Ozuysal et al., "Fast Keypoint Recognition in Ten Lines of Code" CVPR 2007. |
Ozuysal et al., "Fast Keypoint Recognition Using Random Ferns" IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 32, No. 3, Mar. 2010, pp. 448-461. |
Paugam-Moisy et al., "Computing with spiking neuron networks" G. Rozenberg T. Back, J. Kok (Eds.), Handbook of Natural Computing, Springer-Verlag (2010) [retrieved Dec. 30, 2013], [retrieved online from link.springer.com]. |
Pavlidis et al. Spiking neural network training using evolutionary algorithms. In: Proceedings 2005 IEEE International Joint Conference on Neural Networks, 2005. IJCNN'05, vol. 4, pp. 2190-2194 Publication Date Jul. 31, 2005 [online] [Retrieved on Dec. 10, 2013] Retrieved from the Internet <URL: http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.5.4346&rep=rep1&type=pdf>. |
PCT International Search Report and Written Opinion for International Application No. PCT/US2013/044124 dated Sep. 12, 2013. |
PCT International Search Report for International Application PCT/US2013/060352 dated Jan. 16, 2014. |
PCT International Search Report for PCT/US2013/052136 dated Nov. 30, 2013. |
Pfister (2003), Optimal Hebbian Learning: A Probabilistic Point of View, In ICANN Proceedings. Springer, pp. 92-98. |
Pfister (2006), Optimal Spike-Timing Dependent Plasticity for Precise Action Potential Firing in Supervised Learning, Neural computation ISSN 0899-7667, 18(6). |
Ponulak (2006) Supervised Learning in Spiking Neural Networks with ReSuMe Method. Doctoral Dissertation Poznan, Poland. |
Ponulak et al., (2010) Supervised Learning in Spiking Neural Networks with ReSuMe: Sequence Learning, Classification and Spike-Shifting. Neural Comp., 22(2): 467-510. |
Ponulak, "Analysis of the Resume learning Process for Spiking Neural Networks," International Journal of Applied Mathematics & Computer Science: Jun. 2008, vol. 18, Issue 2, p. 117. |
Ponulak, (2005), ReSuMe-New supervised learning method for Spiking Neural Networks. Technical Report, Institute of Control and Information Engineering, Poznan University of Technology. |
Reiman et al. (1989). Sensitivity analysis for simulations via likelihood ratios. Oper Res 37, 830-844. |
Robbins (1951), A Stochastic Approximation Method, Annals of Mathematical Statistics 22, #3, 400-407. |
Rosenstein et al., (2002), Supervised learning combined with an actor-critic architecture, Technical Report 02-41, Department of Computer Science, University of Massachusetts, Amherst. |
Rumelhart (1986), Learning internal representations by error propagation, Parallel distributed processing, vol. 1 (pp. 318-362), Cambridge, MA: MIT Press. |
Rumelhart et al., (1986), Learning representations by back-propagating errors, Nature 323 (6088) , pp. 533-536. |
Schemmel et al., Implementing synaptic plasticity in a VLSI spiking neural network model in Proceedings of the 2006 International Joint Conference on Neural Networks (IJCNN'06), IEEE Press (2006) Jul. 16-21, 2006, pp. 1-6 [online], [retrieved on Dec. 10, 2013]. Retrieved from the Internet <URL: http://www.kip.uni-heidelberg.de/veroeffentlichungen/download.cgi/4620/ps/1774.pdf>. |
Schrauwen et al., "Improving SpikeProp: Enhancements to an Error-Backpropagation Rule for Spiking Neural Networks", ProRISC workshop, 2004, pp. 301-305. |
Schreiber et al., (2003), A new correlation-based measure of spike timing reliability. Neurocomputing, 52-54, 925-931. |
Seung, H. "Learning in spiking neural networks by reinforcement of stochastic synaptic transmission." Neuron vol. 40 No. 6 (2003): pp. 1063-1073. |
Sherrington, (1897), The Central Nervous System. A Textbook of Physiology, 7th ed., part III, Ed. by M. Foster, Macmillan and Co. Ltd., London, p. 929. |
Simulink® model [online], [Retrieved on Dec. 10, 2013] Retrieved from <URL: http://www.mathworks.com/products/simulink/index.html>. |
Sinyavskiy et al. "Reinforcement learning of a spiking neural network in the task of control of an agent in a virtual discrete environment" Rus. J. Nonlin. Dyn., 2011, vol. 7, No. 4 (Mobile Robots), pp. 859-875, chapters 1-8 (Russian Article with English Abstract). |
Sinyavskiy, et al. "Generalized Stochastic Spiking Neuron Model and Extended Spike Response Model in Spatial-Temporal Impulse Pattern Detection Task", Optical Memory and Neural Networks (Information Optics), 2010, vol. 19, No. 4, pp. 300-309. |
Sjostrom et al., 'Spike-Timing Dependent Plasticity' Scholarpedia, 5(2):1362 (2010), pp. 1-18. |
Stein, (1967). Some models of neural variability. Biophys. J., 7: 37-68. |
Sutton et al., (1998), Reinforcement Learning: An Introduction. MIT Press. |
Sutton, (1988). Learning to predict by the methods of temporal differences. Machine Learning 3(1), 9-44. |
Szatmary et al., 'Spike-timing Theory of Working Memory' PLoS Computational Biology, vol. 6, Issue 8, Aug. 19, 2010 [retrieved on Dec. 30, 2013]. Retrieved from the Internet: <URL: http://www.ploscompbiol.org/article/info%3Adoi%2F10.1371%2Fjournal.pcbi.1000879#>. |
Tishby et al., (1999), The information bottleneck method, In Proceedings of the 37th Annual Allerton Conference on Communication, Control and Computing, B Hajek & RS Sreenivas, eds., pp. 368-377, University of Illinois. |
Toyoizumi (2007), Optimality Model of Unsupervised Spike-Timing Dependent Plasticity: Synaptic Memory and Weight Distribution, Neural Computation, 19 (3). |
Toyoizumi et al., (2005), Generalized Bienenstock-Cooper-Munro rule for spiking neurons that maximizes information transmission, Proc. Natl. Acad. Sci. USA, 102, (pp. 5239-5244). |
Vasilaki et al., "Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail" PLoS Computational Biology, vol. 5, Issue 12, Dec. 2009. |
Vasilaki, et al., "Learning flexible sensori-motor mappings in a complex network" Biol Cybern (2009) 100:147-158. |
Weaver (2001), The Optimal Reward Baseline for Gradient-Based Reinforcement Learning, UAI 01 Proceedings of the 17th Conference in Uncertainty in Artificial Intelligence (pp. 538-545). Morgan Kaufmann Publishers. |
Weber et al., (2009), Goal-Directed Feature Learning, In: Proc. International Joint Conference on Neural Networks, 3319-3326. |
Weber, C. et al. 'Robot docking with neural vision and reinforcement.' Knowledge-Based Systems vol. 17 No. 2 (2004): pp. 165-172. |
Werbos (1992), or Prokhorov D.V. and Wunsch D.C. (1997) Adaptive Critic Designs, IEEE Trans Neural Networks, vol. 8, No. 5, pp. 997-1007. |
White et al., (Eds.) (1992) Handbook of Intelligent Control: Neural, Fuzzy and Adaptive Approaches. Van Nostrand Reinhold, New York. |
Widrow et al., (1960) Adaptive Switching Circuits. IRE WESCON Convention Record 4: 96-104. |
Williams (1992), Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning, Machine Learning 8, 229-256. |
Xie et al., (2004) "Learning in neural networks by reinforcement of irregular spiking", Physical Review E, vol. 69, letter 041909, pp. 1-10. |
Yi (2009), Stochastic search using the natural gradient, ICML '09 Proceedings of the 26th Annual International Conference on Machine Learning. New York, NY, USA. |
Cited By (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10839302B2 (en) | 2015-11-24 | 2020-11-17 | The Research Foundation For The State University Of New York | Approximate value iteration with complex returns by bounding |
US10974051B2 (en) | 2015-12-30 | 2021-04-13 | Boston Scientific Neuromodulation Corporation | Method and apparatus for optimizing spatio-temporal patterns of neurostimulation for varying conditions |
US10183167B2 (en) | 2015-12-30 | 2019-01-22 | Boston Scientific Neuromodulation Corporation | Method and apparatus for composing spatio-temporal patterns of neurostimulation for cumulative effects |
US10195439B2 (en) * | 2015-12-30 | 2019-02-05 | Boston Scientific Neuromodulation Corporation | Method and apparatus for composing spatio-temporal patterns of neurostimulation using a neuronal network model |
US20170189685A1 (en) * | 2015-12-30 | 2017-07-06 | Boston Scientific Neuromodulation Corporation | Method and apparatus for composing spatio-temporal patterns of neurostimulation using a neuronal network model |
US10252059B2 (en) | 2015-12-30 | 2019-04-09 | Boston Scientific Neuromodulation Corporation | Method and apparatus for guided optimization of spatio-temporal patterns of neurostimulation |
US10755167B2 (en) | 2016-06-22 | 2020-08-25 | International Business Machines Corporation | Neuromorphic architecture with multiple coupled neurons using internal state neuron information |
US20190236482A1 (en) * | 2016-07-18 | 2019-08-01 | Google Llc | Training machine learning models on multiple machine learning tasks |
US10839316B2 (en) | 2016-08-08 | 2020-11-17 | Goldman Sachs & Co. LLC | Systems and methods for learning and predicting time-series data using inertial auto-encoders |
US11353833B2 (en) | 2016-08-08 | 2022-06-07 | Goldman Sachs & Co. LLC | Systems and methods for learning and predicting time-series data using deep multiplicative networks |
WO2018045021A1 (en) * | 2016-09-01 | 2018-03-08 | Goldman Sachs & Co. LLC | Systems and methods for learning and predicting time-series data using deep multiplicative networks |
US10650307B2 (en) | 2016-09-13 | 2020-05-12 | International Business Machines Corporation | Neuromorphic architecture for unsupervised pattern detection and feature learning |
US10657426B2 (en) * | 2018-01-25 | 2020-05-19 | Samsung Electronics Co., Ltd. | Accelerating long short-term memory networks via selective pruning |
US11151428B2 (en) | 2018-01-25 | 2021-10-19 | Samsung Electronics Co., Ltd. | Accelerating long short-term memory networks via selective pruning |
US11568236B2 (en) | 2018-01-25 | 2023-01-31 | The Research Foundation For The State University Of New York | Framework and methods of diverse exploration for fast and safe policy improvement |
CN108985447A (en) * | 2018-06-15 | 2018-12-11 | Huazhong University of Science and Technology | Hardware spiking neural network system |
US11524401B1 (en) | 2019-03-28 | 2022-12-13 | Apple Inc. | Learning skills from video demonstrations |
RU2784191C1 (en) * | 2021-12-27 | 2022-11-23 | Andrei Pavlovich Katanskii | Method and apparatus for adaptive automated control of a heating, ventilation and air conditioning system |
Also Published As
Publication number | Publication date |
---|---|
US20130325768A1 (en) | 2013-12-05 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US9146546B2 (en) | Systems and apparatus for implementing task-specific learning using spiking neurons | |
US9104186B2 (en) | Stochastic apparatus and methods for implementing generalized learning rules | |
US9015092B2 (en) | Dynamically reconfigurable stochastic learning apparatus and methods | |
US20130325774A1 (en) | Learning stochastic apparatus and methods | |
US9367798B2 (en) | Spiking neuron network adaptive control apparatus and methods | |
US8990133B1 (en) | Apparatus and methods for state-dependent learning in spiking neuron networks | |
US9189730B1 (en) | Modulated stochasticity spiking neuron network controller apparatus and methods | |
US9082079B1 (en) | Proportional-integral-derivative controller effecting expansion kernels comprising a plurality of spiking neurons associated with a plurality of receptive fields | |
US9256215B2 (en) | Apparatus and methods for generalized state-dependent learning in spiking neuron networks | |
US9630318B2 (en) | Feature detection apparatus and methods for training of robotic navigation | |
US9256823B2 (en) | Apparatus and methods for efficient updates in spiking neuron network | |
US9460385B2 (en) | Apparatus and methods for rate-modulated plasticity in a neuron network | |
US9098811B2 (en) | Spiking neuron network apparatus and methods | |
US9213937B2 (en) | Apparatus and methods for gating analog and spiking signals in artificial neural networks | |
Yeung et al. | Sensitivity analysis for neural networks | |
US9183493B2 (en) | Adaptive plasticity apparatus and methods for spiking neuron network | |
US8332336B2 (en) | Method for selecting information | |
US20140025613A1 (en) | Apparatus and methods for reinforcement learning in large populations of artificial spiking neurons | |
US20130325766A1 (en) | Spiking neuron network apparatus and methods | |
US20140122398A1 (en) | Modulated plasticity apparatus and methods for spiking neuron network | |
US20150074026A1 (en) | Apparatus and methods for event-based plasticity in spiking neuron networks | |
US9552546B1 (en) | Apparatus and methods for efficacy balancing in a spiking neuron network | |
Chen | Neural networks in pattern recognition and their applications | |
Makwana et al. | FPGA Implementation of Artificial Neural Network | |
Bobrova et al. | Introduction to neural networks and deep learning |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| | AS | Assignment | Owner name: BRAIN CORPORATION, CALIFORNIA. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: SINYAVSKIY, OLEG; COENEN, OLIVIER; REEL/FRAME: 028311/0032. Effective date: 20120601 |
| | STCF | Information on status: patent grant | Free format text: PATENTED CASE |
| | FEPP | Fee payment procedure | Free format text: MAINTENANCE FEE REMINDER MAILED (ORIGINAL EVENT CODE: REM.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | LAPS | Lapse for failure to pay maintenance fees | Free format text: PATENT EXPIRED FOR FAILURE TO PAY MAINTENANCE FEES (ORIGINAL EVENT CODE: EXP.); ENTITY STATUS OF PATENT OWNER: LARGE ENTITY |
| | STCH | Information on status: patent discontinuation | Free format text: PATENT EXPIRED DUE TO NONPAYMENT OF MAINTENANCE FEES UNDER 37 CFR 1.362 |
| | FP | Expired due to failure to pay maintenance fee | Effective date: 20190929 |