WO2014025765A2

WO2014025765A2 - Systems and methods for adaptive neural decoding

Info

Publication number: WO2014025765A2
Application number: PCT/US2013/053772
Authority: WO
Inventors: Babak Mahmoudi; Justin C. Sanchez
Original assignee: University Of Miami
Priority date: 2012-08-06
Filing date: 2013-08-06
Publication date: 2014-02-13
Also published as: WO2014025772A2; WO2014025772A3; WO2014025765A3

Abstract

This invention is directed to a system and method for adaptive neural decoding of a user. A plurality of interconnected processing units comprising sensory nodes, hidden nodes, and output nodes forms a three layer network. The sensory nodes receive signals from a neural vector and outputs signals to the hidden nodes. The hidden nodes are connected to the sensory nodes, and the output nodes to the hidden nodes through each of a synaptic connection having an individual synaptic weight. The hidden and output nodes are individually configured to calculate a probability based on its synaptic weights, and output a signal based on the probability. The output signal of the output node having the highest probability is transmitted to the environment. A feedback signal is received by a feedback module based on the output signal, and associated synaptic weight (s) are altered based on the feedback signal.

Description

SYSTEMS AND METHODS FOR ADAPTIVE NEURAL DECODING

Statement Regarding Federally-Sponsored Research and Development This invention was made with U.S. government support under grant number N66001-10-C-2008 and W31P4Q-12-C-0200 awarded by the Defense Advanced Research Projects Agency (DARPA) . The U.S. government may have certain rights in the invention. BACKGROUND OF THE INVENTION

Field of the Invention

The present invention relates to adaptive neural decoding and interfaces between the brain and machines. More specifically, the invention concerns automatically adjusting neural mapping for adaptive learning of neural decoders and neuroprosthetics . Additionally, the present invention is also directed to responsive neurorehabilitation through the use of adaptive neural decoders. Description of the Related Art

Neural decoding involves the reconstruction of stimuli from information that is represented in the brain by networks of neurons, in order to convert the impulses of the brain into actions that can be performed in the subject's environment. With neural decoding, thought can drive mechanical motion of objects in the environment, such as neuroprosthetic devices. A neural decoder is a device made up of a plurality of synthetic neurons or nodes that process impulses and convey information to other nodes further downstream by "firing" an action potential. Ideally, the pattern of synthetic neurons or nodes models the decision-making process in the brain, in order to convert neural impulses to commands for action to be taken in an environment.

Neural decoding can be particularly useful in medical cases in which a person's control of a limb or extremity has been reduced or is gone entirely. For instance, neural decoding can assist in rehabilitation of arm, hand, leg, or foot movement, such as following injury or impairment due to a medical condition such as Parkinsons, full or partial paralysis, and muscle wasting diseases or conditions. It can also be used for neuroprosthetic control, such as to replace an amputated arm, leg, hand, or foot.

However, many challenges still remain. For example, central nervous system (CNS) injuries, such as spinal cord injuries, impair the ability of the brain to communicate with and direct the body. Physical therapy is often used, but it does not fully engage the CNS, only affecting the brain indirectly and not quantitatively. Therefore, the connection between the brain and body is not fully restored, which is needed for successful rehabilitation .

In addition, the way the brain produces intent adapts and changes over time, a phenomenon called neural plasticity. It is a complicated and constantly ongoing process, and is influenced by numerous factors. Being able to account for and adapt to this ever-changing neural landscape is critical for successful neural decoding, but as yet remains only rudimentarily achieved. For instance, machines can "learn" to adapt to changes or perturbations, but the various approaches to machine learning all fall short of ideal .

There are currently three main machine learning paradigms : supervised, unsupervised, and reinforcement learning methods. In supervised learning approaches, the controller or converter of neural impulses to the machine needs a detailed copy of the desired response to compute a low-level feedback for adaptation. For example, in the case of reaching tasks, the desired response would be the trajectory of arm movements or the location of the target in the environment. There have been attempts to "infer" a desired response so that supervised learning approaches can be used, however these approaches do not exactly match the user's intent and may not be feasible in unstructured environments such as those encountered during daily living. Moreover, these static neural decoding algorithms assume stationary input/output relationships, limiting their usefulness since they cannot easily adapt to perturbations in the input space. Unsupervised learning is an alternative that uses a data driven approach which is suitable for neural decoding without any need for an external teaching signal. However for effective adaptation of the decoder, a model of the user's motor behavior is required. The decoder adaptation in this paradigm is then based on tracking the changes in the tuning parameters which may not be feasible in the event of major input non-stationarities . Also, adaptation in unsupervised paradigm is often slow and less accurate .

As a compromise between supervised and unsupervised approaches, reinforcement learning (RL) offers a semi-supervised, interactive learning control framework that neither requires a detailed desired signal (unlike supervised learning) nor a model of user's motor behavior for adaptation (unlike unsupervised learning) . The RL controller actively modifies its behavior to optimize a measure of performance or reward through interaction with the environment. These advantages come with a price, which is the slow convergence rate of model free reinforcement learning controllers. Associative reinforcement learning tasks define a class of RL problems where the controller receives immediate reinforcement based on the most recent state-action pair. In these tasks, elements of gradient based search in supervised learning and reinforcement-based optimization can be combined to design efficient controllers that can learn to climb the gradient of the reward which is also known as policy gradient methods in reinforcement learning.

Moreover, many known neural decoders are highly sensitive to the initial conditions, so the choice of initializing information is important. A person directing the use of the model must therefore know the desired outcome so as to be able to establish the model as close to the desired outcome as possible.

Changes to known neural decoder networks occur on a global level, affecting all the synthetic neurons or nodes the same way.

Accordingly, the nodes are not specialized. This creates a situation in which non-winning actions (i.e. actions that are not performed as a result of neural decoding in favor of another winning action which is performed) become less competitive over time, so the same action or outcome is constantly reinforced. This leads to an inhomogeneity of the parameters over time in which the model is unable to switch between actions. In effect, the model becomes very good at performing only one action, such as moving an arm in a particular direction, but it cannot easily switch to performing another action such as gripping. It similarly also results in a more unstable model over time.

It would therefore be beneficial to have a way to control neuroprosthetic devices in a way that fully engages and restores the connection between the brain and the body, even in severe cases such as CNS or spinal cord injury. It would also be beneficial to have a system that adapts quickly, responds to and corrects for neural plasticity, and does so in a self-adjusting manner without the need for external intervention so as to be useful as a practical tool in daily life.

In addition, brain-machine interfaces (BMI) utilize neural decoding to give users direct control over a robotic, communication, or functional electrical stimulation systems using only their neural activity. These achievements have been supported by numerous studies of neural decoding using single neuron activity, local field potentials (LFPs), and electrocorticograms (ECoG) in animal and human models to construct functional mappings between neural activity, kinematics, kinetics, and muscle activation. Such research has revealed multiple factors that can influence neural decoding and thus BMI performance on even short timescales, such as hours to days. For example, performance can be enhanced or degraded by the quantity, type and stability of the neural signals acquired, the effects of learning and plasticity, availability of physical signals for training the neural decoders, and duration of decoder use. These conditions create a dynamic substrate from which BMI designers and users need to produce stable and robust BMI performance if the systems are to be used for activities of daily living and increase independence for the BMI users.

Of the factors affecting BMI performance, two significant challenges include (1) how to create accurate and robust mapping of neural commands to BMI motor control states when a user is unable to produce a measureable kinematic signal for training the decoder, and (2) how to ensure that the motor decoder mapping remains effective over both long and short timescales when neural perturbations occur, which will inevitably happen. For BMIs that use chronically implanted microelectrode arrays in the brain, these perturbations include the loss or addition of neurons to the electrode recordings, failure of the electrodes themselves, and changes in neuron behavior that affect the statistics of the BMI input firing patterns over time .

In situations in which there is no explicit user-generated output available to directly create a neural decoder, studies have utilized carefully structured training paradigms, such as those described above, that use desired target information and/or imagined movements to calibrate the BMI controller. Other methods involve initializing the decoder with values based on baseline neural activity, ipsilateral arm movements, or using randomized past decoder parameters, and then refining the decoder. These methods all involve using supervised learning methods to adapt the decoder to the user's neural activity until effective BMI control has been achieved.

Adaptation of the neural decoder after it has been initially calibrated to the user has been shown to provide long duration BMI control that can accommodate gradual changes to the BMI input behavior. In several studies that used linear discriminant analysis of electroencephalogram (EEG) data, unsupervised adaptive methods were used to update the parts of the decoder that did not depend on labeled training data. However, in most cases adaptive BMI systems have relied on supervised adaptation. During supervised adaptation, the training data that is used to calculate the decoder is periodically updated using either additional kinematic data, recent outputs of the decoder itself (the current decoder being assumed effective enough to adequately infer the user's desired BMI output), or inferred kinematics based on known target information as new trials occur. However, these BMI systems rely on known (or inferred) targets or output kinematics as a desired response for training or updating the decoder, and therefore are not amenable to self-use.

As BMI systems begin transitioning from laboratory settings into activities of daily living, an important goal is to develop neural decoding algorithms that can be calibrated with a minimal burden on the user, provide stable control for long periods of time, and can be responsive to fluctuations in the decoder's neural input space (e.g. neurons appearing or being lost amongst electrode recordings).

Summary of the Invention

The present invention solves many of the problems of known neural decoders and brain-machine interfaces (BMI) or brain- computer interfaces (BCI) through adaptive neural decoding or mapping. Specifically, the present invention provides for systems and methods for neural decoding that is adaptive and self- adjusting based on its ability to use evaluative feedback on the outcomes/actions performed. The system and method are configured to utilize feedback from the brain itself, the environment, or combinations thereof. The present decoder utilizes Hebbian principles and then weights and incorporates both positive and negative feedback, allowing the decoder to adapt or "learn" quickly. This invention is therefore also self-adjusting as it does not require any manual recalibration or manipulation in order to adapt, so it can more easily be used. The adaptation or "learning" achieved by the present system occurs locally at each neuron or node, rather than globally across all nodes collectively, thereby providing a more specialized and stable system. As a result, the present invention also can maintain its performance over long periods of time, and can successfully adapt to dramatic neural reorganizations, allowing the system to quickly transition between intended actions relative to other neural decoders .

Specifically, the present invention includes a system and method of adaptive neural decoding of a user. The system generally comprises a neural vector, a feedback module, and an adaptive neural decoder. The adaptive neural decoder of the present invention generally comprises a network in which firing of the artificial neurons in the decoder is determined based on computational neuroscience, rather than electrical engineering principles as with other decoders. The neurons fire together, thereby increasing the efficiency of the decoder.

In a preferred embodiment, the network is formed of a plurality of interconnected processing units which may comprise sensory nodes, hidden nodes, and output nodes. These nodes may be structurally the same and/or configured the same in some embodiments. In other embodiments, the nodes may be structurally different and/or configured differently. The network may comprise three layers, i.e. an input layer of sensory nodes, a hidden layer of hidden nodes, and an output layer of output nodes. The network may alternatively comprise more than three layers, i.e. four layers with a first hidden layer and a second hidden layer. The network may further be feedforward or feedback in design. The network may further be fully connected or partially connected. The illustrative network in a preferred embodiment of the present system and method is a three layer feedforward fully connected network .

The neural vector is defined by a plurality of brain signals or other neural data of or related to a user's brain. The neural vector may further comprise appropriate parameters for modulation of various brain signals. In at least one embodiment, the neural vector may be mapped to certain signals . The neural vector may further comprise electroencephalography (EEG) , magnetoencephalography (MEG) , or other brain imaging methods or devices.

The input layer comprises a plurality of sensory nodes in at least one embodiment. Each sensory node is configured to receive an input signal from the neural vector or otherwise from a user's brain, and output a plurality of sensory output signals. The input signal into each sensory node may comprise a vector of firing rates or continuously valued potentials. In at least one embodiment, the incoming valued potentials are normalized from -1 to 1. The nominalization process may occur in the neural vector, in the sensory node, or a combination thereof. The connection between the sensory nodes and the neural vector or the user's brain may be through a wired or wireless connection.

The hidden layer comprises a plurality of hidden nodes in at least one embodiment. Each hidden node is configured to individually receive a sensory output signal from each of the plurality of sensory nodes through each of a plurality of synaptic connections. In a partially connected embodiment, at least one hidden node is configured to receive at least one sensory output signal from at least one sensory node through at least one synaptic connection. Each different synaptic connection is associated with an individual synaptic weight, and each hidden node is configured to calculate a probability based at least in part on at least one synaptic weight. Based on the calculated probability, each hidden node outputs a corresponding plurality of hidden output signals.

The output layer comprises a plurality of output nodes in at least one embodiment. Each output node is similarly configured to individually receive a hidden output signal from each of the plurality of hidden nodes through each of a plurality of synaptic connections. In a partially connected embodiment, at least one output node is configured to receive at least one hidden output signal from at least one hidden node through at least one synaptic connection. Each different synaptic connection is similarly associated with an individual synaptic weight, and each node is configured to calculate a probability based at least in part on at least one synaptic weight. The output node having the highest probability is selected as the winning node. The output of the winning node is transmitted by the system to the environment. The environment may comprise additional systems, devices, or other interfaces configured to receive the output signal from the system.

The feedback module is configured to receive a feedback signal from the environment. This feedback signal may comprise positive or negative feedback, where positive feedback is a successful outcome compared to the user's intended action, and negative feedback is an unsuccessful outcome compared to the user's intended action. In a preferred embodiment, the feedback signal comprises user-driven feedback. This feedback may come directly from a user's brain as signals corresponding to positive ones or negative ones, come from the user as a motor function such as the movement of a finger, breathing, blinking, etc., or a combination thereof. This allows the user to continuously calibrate the neural decoder and system without the need for any external maintenance.

The feedback signal is used to readjust at least one synaptic weight of at least one processing unit, i.e. synaptic weights of synaptic connections of hidden node(s) and output node(s). The weight change may be performed by the feedback module, or by the individual processing units or nodes. In a preferred embodiment, the negative feedback results in a greater weight change relative to positive feedback. Further, in at least one embodiment, any positive feedback may result in diminishing returns over time, or otherwise converge to smaller increments of change over repeated intervals of positive feedback.

The neural decoder, systems and methods of neural decoding of a user may be directed to applications of the neural decoder described herein, such as brain-machine interfaces (BMI) utilizing the present neural decoder. These BMI may be used to control movement and action of devices, such as neuroprosthetics, which may be internally or externally located relative to a subject. Communication between the neural decoder, BMI, and device may be wireless or hard-wired, depending on the particular application. The BMI applications of the neural decoder of the present invention can be used in clinical settings such as for rehabilitation from an injury or condition, or retraining the brain .

The Examples provided herein demonstrate a semisupervised learning method designed to allow systems to obtain reward by learning to interact with the environment, and which has adaptation built into the system and method itself using a feedback signal. As with supervised adaptation methods, these decoders can adapt their parameters to respond to user performance. Unlike supervised adaptation methods, however, they use a decoding framework that does not rely on known or inferred targets or output kinematics as a desired response for training or updating the decoder. Therefore they can be used even when such information is unavailable (as would be the case in highly unstructured BMI environments), or when the output of the current BMI decoder is random (e.g. an uncalibrated BMI system, or when a large change has occurred within the decoder input space), because they use a scalar qualitative feedback as a reinforcement signal to adapt the decoder.

The adaptive decoder of the present invention offer several significant advantages as BMI controllers over current supervised learning decoders. First, they do not require an explicit set of kinematic training data to be initialized, instead beginning control with random parameters and gradually being computationally optimized through experience based on feedback of current performance. This adaption only requires a simple binary feedback, even for tasks that involve more than two action decisions, which opens numerous opportunities for deployment with paralyzed BMI users. Second, the adaptive weighing as used in the present invention do not need to assume stationarity between neural inputs and behavioral outputs, making them less sensitive to failures of recording electrodes, neurons changing their firing patterns due to learning or plasticity, and neurons appearing or disappearing from recordings. These attributes are important considerations if BMIs are to be used by humans over long periods for activities of daily living.

The neural decoder of the present invention may also be directed to systems and methods of responsive neurorehabilitation (RNR) related to central nervous system (CNS) injury. In CNS injury, physical therapy and occupational therapy are "expected" to restore certain motor control through a change in brain function. However, current approaches to affecting brain function are indirect, not quantitative, and do not fully engage the CNS. During rehabilitation, there are usually no direct interfaces to the brain, and feedback is minimal. The present invention uses responsive neurorehabilitation in order to induce plastic change sin CNS neuronal representation. As such, the RNR system uses rehabilitation training with feedback to induce changes in the size and location of cortical maps after a CNS injury. The RNR system is responsive and adapt with the users in order to enhance performance of therapy used selectively strengthen certain connections between the brain and the body.

Accordingly, the RNR system generally comprises an adaptive rehabilitation controller, at least one rehabilitation device, and a patient or user. The adaptive rehabilitation controller further comprises an adaptive feature extractor and an adaptive motor decoder .

The user of the RNR system and method may comprise any life form capable of emitting a brain or neural signal. The neural signal may be captured or interpreted in an EEG, MEG, or any method and/or device capable of neuroimaging or the recording of brain activity.

The adaptive rehabilitation controller is configured to receive a neural signal from the user. The neural signal may be received by the adaptive feature extractor. The adaptive feature extractor comprises at least one feature, or representation of the neural signal as defined by characteristics appropriate for neural imaging. Features may be predetermined, custom input, or be created based on a programmed routine. At least one feature may be associated or triggered by the incoming neural signal, and these feature (s) are then transmitted to the adaptive motor decoder. Additional features may be dynamically generated and transmitted by the adaptive feature extractor based on a programmed routine.

The adaptive motor decoder is configured to receive at least one feature from the adaptive feature extractor. The adaptive motor decoder learns to map the features received to the user's intent, based on the feedback received from the user. In at least one embodiment, the adaptive motor decoder comprises an adaptive neural decoder as described above, namely a three layer feedforward fully connected neural network. In other embodiments, the adaptive motor decoder may comprise other structures or configurations of different neural decoders. In a preferred embodiment, the adaptive motor decoder weighs negative feedback more than positive feedback. Further, the positive feedback may converge after repeated iterations of positive feedback, and this convergence may trigger additional features to be transmitted from the adaptive feature extractor to then be mapped.

The rehabilitation device is configured to receive the control signal. The rehabilitation device may comprise a functional electrical stimulator (FES), instrumented object, or any other devices appropriate for neurorehabilitation . Based on the action performed by the device, the user observes and receives sensory feedback, which is then transmitted as a neural signal back into the adaptive rehabilitation controller, to be received by the adaptive feature extractor. Based on whether the feedback is positive or negative, feature (s) are transmitted to the adaptive motor decoder, which then adjusts its internal parameters respectively. Through this iterative feedback process, the adaptive rehabilitation controller is able to learn and map the user's brain signals, i.e. features, to corresponding actions of the rehabilitation device. This loop of brain, adaptive rehabilitation controller, rehabilitation device, and the body increases the efficiency and effectiveness of the overall rehabilitation process.

These and other objects, features and advantages of the present invention will become clearer when the drawings as well as the detailed description are taken into consideration.

Brief Description of the Drawings

For a fuller understanding of the nature of the present invention, reference should be had to the following detailed description taken in connection with the accompanying drawings in which : Figure 1 is a schematic representation of the (A) system for adaptive neural decoding of a user with a network of interconnected (B ) processing units. The action corresponding to an output node with the maximum value among all the output nodes is selected. Depending on the desirability of the action an evaluative feedback is projected by the feedback module to at least one node in the network in order to modulate the synaptic weight updates.

Figure 2 shows schematic representations of the model for autonomous adaptation of the neuroprosthetics controller of the present invention. (A) shows one embodiment in which the actor maps the neural motor commands into actions during goal-directed interaction of the user with the environment. The actions of the Actor will modulate the reward expectation of the user. The critic will translate this reward expectation to an evaluative feedback that the actor will use for modifying its control policy. (B) shows another embodiment specific to a brain-machine interface (BMI) in which the actor interacts with the environment by selecting actions given input states. The critic is responsible for producing reward feedback that reflects the actions' impact on the environment, and which is used by the actor to improve its input to action mapping capability.

Figure 3 shows schematic representation of closed-loop simulation platforms for (A) interactive and (B ) single-step decoding tasks. The simulator is composed of three main components: synthetic neural data generator (user), neuroprosthetic controller and the 2D grid space (environment).

Figure 4 is a diagram of interactive neural decoding in the pinball task. The controller was initialized with random parameters at the beginning of the experiment. (A) success rate during multi-step reaching (pinball) task, value of 1 indicates success, 0 is failure. In each trial a new target at a random location in the 2D workspace was presented. (B) The ratio of the number of steps used by the controller to the number of steps of the shortest path to the target was computed as the deviation from the optimal trajectory. The system had converged to the optimal path after 8 trials. The network weight trajectories at the hidden layer (C ) and output layer (D ) show the controller stopped changing the parameter after learning the optimal neural to motor mapping.

Figure 5 is a diagram of neural reorganization in the multi- step decoding. The controller was initialized with random parameters at the beginning of the experiment. (A) success rate during multi-step reaching (pinball) task. In each trial a new target at a random location in the 2D workspace was presented. (B) The ratio of the number steps that the controller by the number of steps in the shortest path to the target was computed as the deviation from the optimal trajectory. The horizontal solid red- line shows when the controller was following the optimal trajectory. Although the controller learned to complete the reaching task in 2 trials, it took 11 trials to learn the optimal mapping between the neural states and actions . The controller continued to follow the optimal control policy even without adaptation (after trial 15) . After trial 25 the order of the neurons in the input neural vector was shuffled, rendering the previously learned control policy ineffective. From trial 25 to 35 the controller was not able to complete the task in any trial because there was no adaptation. By resuming adaptation (trial 35) the controller relearned to complete the task in one trial and converge to the optimal control policy after 7 trials .

Figure 6 is a diagram of the learning performance of the HRL controller in the single-step learning task. (A) During the first 25 trials one of the targets Tl and T2 was randomly presented while in the next 25 trials a novel task was introduced and one of the targets T3 and T4 was presented. Starting with random parameters, the controller reached 100% accuracy in both phases of learning. The parameters of the controller were fixed after trial 50 and the generalization performance was tested in one-step and multi-step classification (pinball) tasks. The trajectories of (B) action value assignment and (C ) network parameters during adaptation phase demonstrate that the controller will stabilize the network parameters during continuous adaptation after finding an effective control policy. By changing the task in trial 25 (iteration 280 in C, vertical dashed line) the network modified the control policy by readjusting its parameters, and after convergence again consolidated the projection.

Figure 7 is a diagram of the effect of (A) memory size and (B) number of hidden nodes in the network on the training and generalization performance of the controller. The memory size corresponds to the number of past trials (input-action-feedback) that were logged for the experience replay algorithm during adaptation. In each plot the dark and light blue bars correspond to the adaptation phase in the single-step learning task. The yellow and red bars show the generalization performance of the controller in the four-target classification and the pinball tasks. The optimal memory size for generalization performance in this task was 70 however, by increasing the memory size, it became harder for the network to learn the new task (light blue, vertical 2-target task) during the adaptation phase. The optimal number of hidden layer nodes was 5 in a network with 25 inputs and 4 outputs. Increasing the number of hidden layer nodes had an adverse effect on the generalization performance as well as learning the new task during the adaptation phase.

Figure 8 is a diagrammatic representation of the two target reaching task performed with monkeys using a robotic arm. The monkeys initiated each trial by placing their hand on a touch sensor for a random hold period. The robot then moved out from behind an opaque screen (a) and presented its gripper to the monkey (b) . The gripper held either a desirable object (waxworm or marshmallow, 'A' trials) or a wooden bead ('B' trials) . Simultaneously, a spatial target LED on either monkey's left (A trials) or right (B trials) was illuminated. The monkey was given a food reward if the RLBMI moved the robot to the illuminated target (c). The monkey received food rewards equivalently during both A and B trials . Figure 9 is a diagram of the multi-step neural decoding performance of the HRL algorithm using neural data from two monkeys over three days . The decoding performance was quantified by the success rate (decoding accuracy) in reaching the target and the length of reach trajectory (number of steps to the target) . In both monkeys the controller was initialized with random parameters at and the data from three experiment sessions was streamed sequentially to the controller. (A and B) give the performance in the presence of continuous adaptation and (C and D) show the effects of fixing the parameters of the controller and reorganizing the input by shuffling the order of neurons at the input at Day 2.

Figure 10 is a diagram of the decoding performance and trajectories of the controller parameters in response to the neural input reorganization (after day 1) during continuous adaptation in the multi-step reaching task. (A and B) The decoding performance was quantified by the success rate and the number of steps to the target in reaching tasks using the neural data from two monkeys. The input reorganization was introduced by randomly shuffling the order of neurons in the input neural vector at the end of day 1. (C and D) The weights of the network were initialized randomly at the beginning of the experiment. The variance of the weight trajectories decreases (at iteration 80 in monkey DU and 110 in monkey PR) as the network learns the effective mapping between the neural input and actions to maximize the positive evaluative feedback. By introducing a neural reorganization at the input (at the end of day 1), the optimal control policy changes and the controller readjusts its parameters to maintain performance. It is interesting that in monkey PR the controller underwent another readjustment at the beginning of day 3 without an external perturbation. There were four days gap between days 2 and 3 in monkey PR. This implies the day to day variability of the input neural vector .

Figure 11 is a diagram demonstrating the RLBMI could accurately learn to control the robot during closed loop experiments. In (a) , stems indicate the sequence of the different trials types (0=A trials, *=B trials) with the stem height indicating whether the robot was moved to the correct target or not. The dashed line gives the corresponding accuracy of the RLBMI performance within a five trial sliding window, (b) and (c) show how the RLBMI system gradually adapted the weights of the output and input layers, respectively, as it learned to control the robot. The weights indicate that the system had arrived at a consistent mapping by the fifth trial: at that point the weight adaptation progresses at a smooth rate and the robot is being moved effectively to the correct targets. At trial 23 an improper robot movement resulted in the weights being quickly adjusted to a modified, and still effective, mapping.

Figure 12 is a diagram demonstrating the RLBMI decoder accurately controlled the robot arm for both monkeys. Shown is the accuracy of the decoder (mean +/- standard deviation) following the initial adaptation period (trials 6:30). Both monkeys had good control during closed loop sessions (blue, DU : 93%, PR: 89%) . The open loop simulations (red) confirmed that system performance did not depend on the initial conditions of the algorithm weight parameters (DU: 94%, PR: 90%). Conversely, open-loop simulations in which the structure of the neural data was scrambled (black) confirmed that, despite its adaptation capabilities, the RLBMI decoder needed real neural states to perform above chance.

Figure 13 is a diagram demonstrating the RLBMI decoder consistently maintained high performance when applied in a contiguous fashion across closed loop sessions that up to two weeks. During the first session, the system was initialized with random parameters, and during each subsequent session the system was initialized using parameter weights it had learned previously. This approximates deploying the RLBMI across long time periods since it never has the opportunity to reset the weights and start over, by rather maintains performance by working with a single continuous progression of parameter weight adaptations. Shown is the accuracy of the robot movements during the first 25 trials of each session (0: solid lines). Furthermore despite working with the same sequence of weights for multiple days, the RLBMI was still able to quickly adapt when necessary. A mechanical connector failure caused a loss of 50% of the inputs for PR between day 9 and 16 (X: black dashed line), but the RLBMI adapted quickly and only a small performance drop resulted. This input loss was simulated in two sessions with DU (X: red dashed line), and the system again maintained performance. Furthermore, the RLBMI performance during those sessions was similar or better than in two final DU tests in which no input loss was simulated.

Figure 14 is a diagram demonstrating the RLBMI quickly adapted when 50% of the inputs were abruptly lost. Shown is the RLBMI performance accuracy within a five-trial sliding window (mean +/- standard deviation) during closed loop experiments (DU: blue dashed line and error bars, 4 sessions) and during open loop simulations (DU: gray line and panel, 1000 sims; PR: red line and panel, 700 sims) . In all tests, 50% of the inputs were abruptly lost following the 10th trial (black bar) . For both the closed loop experiments and open-loop simulations, the RLBMI had adapted and achieved high performance by the 10th trial. The RLBMI then re-adapted to the input loss and restored control within 5 trials. Inset panel contrasts the average results of the RLBMI open loop simulations (solid lines, DU : gray, PR: red) with the performance of a nonadaptive neural decoder (Wiener kernel, dashed lines, created using the first five trials of each simulation) . Without adaptation, the 50% input loss caused a permanent performance drop .

Figure 15 is a diagram demonstrating that when the recording electrodes detected new neurons, the RLBMI adaptation prevented the emergence of new firing patterns from degrading performance. Rather, it was able to quickly incorporate the new information into the input space. Shown is the RLBMI performance accuracy within a five-trial sliding window during closed loop sessions (DU: mean +/- std, blue dashed line with error bars, 4 sessions) and during open loop simulations (DU: mean +/- std, gray panel, 1000 sims; PR: red panel, 700 sims) . In all tests, a random 50% of the inputs were artificially silenced during the first 10 trials (black bar). The sudden appearance of the new inputs following the 10th trial caused a performance drop, with the RLBMI adapting within 5 trials. Inset panel contrasts the average results of the RLBMI open loop simulations (solid lines, DU : gray, PR: red) with the performance of a nonadaptive neural decoder (Wiener kernel, dashed lines, created using the first five trials of each simulation). Compared to the static (Wiener) decoder, the RLBMI ' s adaptation enabled it to better select useful inputs and to better use information from the newly appeared neurons, resulting in more stable performance both before and after the input perturbation.

Figure 16 is a diagram demonstrating the accuracy of the critic feedback influences the RLBMI performance. Shown is the accuracy of the RLBMI system (trials 1:30) during closed loop sessions (DU: blue squares, 5 sessions) and during open loop simulations (DU: black X, mean +/- standard deviation, 1000 sims; PR: red 0, 700 sims) when the accuracy of the critic feedback was varied (0.5->1.0). Gray line gives a 1:1 relationship. The RLBMI performance was directly impacted by the critic's accuracy. This suggests that choosing the source of critic feedback must involve a balance of factors such as: accessibility, accuracy, and frequency of feedback information, with adaptation preferably only being implemented when feedback confidence is high.

Figure 17 is a schematic representation of a system for responsive neurorehabilitation comprising an adaptive rehabilitation controller in connection with a rehabilitation device .

Figure 18 is a schematic representation illustrating an adaptive feature extractor component and an adaptive motor decoder component of the adaptive rehabilitation controller of Figure 17.

Figure 19 is a diagram demonstrating one example of the responsive neurorehabilitation invention through a hand grasping trial with a (A) feedback display showing a fixation cross for I s , followed by a cue for "open" or "close" for I s , and then a feedback of "correct" or "wrong" for I s . The trial was performed using a (B) BCI interface under an actor-critic reinforcement learning architecture. The actor decodes motor potentials and outputs an action shown on the display. The critic detects error potentials and provides feedback to the actor. The actors then uses feedback from the critic to adapt to the user.

Figure 20 is a diagramon demonstrating the accuracy of the BCI interface of Figure 19 showing (A) a simulation of session over 3500 trials, equivalent to a 3 hour recording session. The simulation used 50 features, power in lHz bins from l-50Hz. The actor performance during the simulation of 3500 trials is shown in (B), where the columns show data from different portions of the simulation: beginning, middle, and end. The first row shows the actor's cumulative classification accuracy. The second row shows the actor's weights adapting. The third row shows the actor output, with green stems indicating correct trials and red stems indicating incorrect trials .

Figure 21 is a diagram illustrating a method for adaptive neural decoding of the present invention.

Figure 22 is a diagram illustrating a method for responsive neurorehabilitation .

Like reference numerals refer to like parts throughout the several views of the drawings .

Detailed Description of the Preferred Embodiment 1. Overview

First, some background on the Hebbian theory is helpful. The Hebbian theory is a scientific theory in neuroscience which explains the adaptation of neurons in the brain during the learning process. The theory is perhaps best summarized as "cells that fire together, wire together." As such, Hebbian theory attempts to explain associative learning, in which simultaneous activation of neurons lead to strong increases in synaptic strength between those neurons.

In the design of artificial neurons and artificial neural networks, the Hebbian theory relates to a method of determining how to alter to the weights between the artificial neurons. This weight should increase when two neurons activate simultaneously, and reduce if they activate separately. Traditionally, nodes that tend to be both positive at the same time have strong positive weights, and nodes that tend to be both negative at the same time have strong negative weights.

The Hebbian theory as applied to neural decoding, or Hebbian reinforcement learning ("HRL"), involves the learning agent receiving an immediate reinforcement feedback (r) after each decision and the control policy (mapping between the states and action) is parameterized by the synaptic weights, ω , of a connectionist network (Pennartz, 1997, Vasilaki et al . , 2009, Williams, 1998) . The goal of the learning process is to search in the network's weight space (W) for an optimal set of parameters that maximizes the expected value of the reward E(r) (Sehnke et al . , 2010) . This can be accomplished using the policy gradient methods by estimating and optimizing the

(Sutton et al., 2000, Peters and Schaal, 2008) . REINFORCE is a class of policy gradient algorithms (Williams, 1992) in which the learning agent finds the optimal solution without needing to explicitly compute the gradient.

Consider the stochastic node j in a connectionist network with transfer function /(.) that receives input x± from node i through synaptic weight ω±^ and generates output Xj. The mass probability function g that node j takes a certain value φ can be written as

g((p,a)ij,Xi) = Pr(Xj = φ\ω₁₁, Xj) (El) It can be proved that this network with the following incremental update rule,

Δω^ = μ ( τ - b±j)dln g±/do)ij (E2) will climb the gradient of the expected reward where μ is the learning rate and b±j is the reward baseline. In fact, ( r - b±j)/dln g±/d(x)ij represents an unbiased estimate of the dE (r\W) / ω . This implies that the average weight update using the Eq. 2 will converge to a local maximum of r. For the special case of b±j = 0 and a stochastic binary node with logistic transfer function f(.), the probability that the output state of the node j is Xj, will be

P_j = /(∑w_j)t¾) (E3) and the update equation (E2) will become

The weight update algorithm using equation (E4), which is known as the reward-inaction algorithm in adaptive control theory (Narendra and Thathachar, 1989), was expanded to the associative reward- penalty algorithm (Barto and Anandan, 1985) by adding a penalty term to the update rule in equation (E4) :

Δω_±1 = μ⁺τ(χ± - p±)Xj + μ^~ ( 1 - r) (1 - x± - p±)Xj (E5) where the μ⁺ and μ^~ are separate learning rates for the reward and penalty components respectively. In connectionist networks, the update rule in equation (E5) captures the essence of Hebbian reinforcement learning by correlating the local presynaptic and postsynaptic activity in the network with a global reinforcement signal. In other words, r evaluates the "appropriateness" of the node's output, x_jr due to the input x_T (Hassoun, 1995) .

The HRL algorithm provides a general learning framework for connectionist networks in associative tasks by combining elements of supervised classification with reinforcement-based optimization (learning automata) . In fact, supervised learning may be viewed as an extreme case of the HRL in a stochastic unit, where the output of the unit is binary and there is one correct output for each input (Hassoun, 1995) . It has been demonstrated that this algorithm was equivalent to the error backpropagation in supervised learning for training a connectionist network (Mazzoni et al . , 1991) .

Modifying and building upon the Hebbian theory and reinforcement learning, various embodiments of the present invention is directed to neural decoding, mapping, and applications thereof, wherein the decoder is able to learn and automatically produce stable neural mapping and respond to perturbations by readjusting its parameters in order to maintain performance over time. 2. Adaptive Neural Decoder Architecture and Framework

In a preferred embodiment, the present invention relates to an improved Hebbian-inspired architecture and framework for a system for adaptive neural decoding of a user, as illustrated in Figures 1A-B . Accordingly, and drawing primary attention to Figure 1A, the system 100 generally comprises a neural vector 103, a feedback module 105, and an adaptive neural decoder 150. The neural decoder 150 generally comprises a network 102.

The network 102 is formed from a plurality of interconnected processing units 101. The processing units 101 may comprise sensory nodes 111, hidden nodes 121, and output nodes 131. These nodes 111, 121, and 131 may be structurally the same and/or configured the same in at least one embodiment. In other embodiments, the nodes 111, 121, and 131 may be structurally different and/or configured differently. In the illustrated embodiment of Figures 1A, the sensory nodes 111, hidden nodes 121, and output nodes 131 all share a similar structure of a processing unit 101 as illustrated in Figure IB. The different nodes in this embodiment may be configured differently, for instance the sensory input nodes may each only have one input, and thus the summing function of each of its synaptic weights as described below may not necessarily be required.

Further in the embodiment of Figure 1A, the network 102 comprises three layers: an input layer 110, a hidden layer 120, and an output layer 130. In other embodiments not shown, the network 102 may comprise more than three layers, for instance, a plurality of hidden layers 120. Having a plurality of hidden layers 120 might for instance, allow the neural decoder 150 or system 100 to form discontinuous functions or mappings between input and output. The network 102 may be fully connected, as illustrated in the Figure 1A, or be partially connected in other embodiments. Fully connected is defined as each node in one layer being connected to each node of the preceding layer. For example, each of a plurality of output nodes 131 is connected to each of a plurality of hidden nodes 121, and each of a plurality of hidden nodes 121 is connected to each of a plurality of input nodes 111.

Partially connected may comprise one node of each layer connected to at least one node of a preceding layer. The embodiment illustrated in Figure 1A represents a feedforward network. In other embodiments, the network may also be feedback in design, i.e. a recurrent neural network where connections between the nodes form a directed cycle.

The neural vector 103 comprises a plurality of brain signals in at least one embodiment. The neural vector 103 may comprise neural data of or related to a user's brain. The neural vector 103 may also comprise appropriate parameters for modulation of various brain signals. In at least one embodiment, the neural vector 103 may be mapped to particular signals of the brain, or to a particular region of the brain.

The input layer 110 comprises a plurality of sensory nodes 111. Each sensory node 111 is configured to receive an input signal from the neural vector 103, and output a plurality of sensory output signals. The input signal into the sensory node 111 may comprise a vector of firing rates or continuously valued potentials. In at least one embodiment, the valued potentials are normalized from -1 to 1 using the technique described below:

: Upper bound

Lh : Lower bound

Fj : Firing rate of the neuron i

Μαχέ'. Maximum firing rate of the neuron i

(E : Normalized firing rate of the neuron i

Where :

Ub - lb

{ ( }— Am i :x fl -f- Lb

This nominalization process may be performed by the neural vector 103, or by the sensory nodes 111.

The hidden layer 120 comprises a plurality of hidden nodes 121. Each hidden node 121 is configured to individually receive a sensory output signal from each of the sensory nodes 111, through each of a plurality of synaptic connections 140. In some embodiments where the network is partially connected, at least one hidden node 121 receives at least one sensory output signal from at least one sensory node 111, through at least one synaptic connection 140. Each different synaptic connection 140 is associated with an individual synaptic weight 141. Each hidden node 121 is further configured to calculate a probability based at least in part on the corresponding synaptic weights 141 of each of its synaptic connections to each of the sensory nodes 111. Based on its calculated probability, each hidden node 121 outputs a corresponding plurality of hidden output signals.

The output layer 130 comprises a plurality of output nodes 131. Similarly, each output node 131 is configured to individually receive a hidden output signal from each of the hidden nodes 1221, through each of a plurality of synaptic connections 140. Each different synaptic connection 140 is also associated with an individual synaptic weight 141. Each output node 131 is configured to calculate a probability based on the corresponding synaptic weights 141 of each of its synaptic connections to each of the hidden nodes 131. The output node 131 with the highest probability will be selected as the winning node 131'. The signal 106 corresponding to the winning node 131' will be output to the environment .

The probability calculated by each hidden node 121 and each output node 131 is determined by equation (E6) below, in at least one embodiment of the present invention.

(E6) Specifically, the hyperbolic tangent function is used to compute the probability (P_±) of being at each discrete state based on the net state of the node i. In this embodiment, W_±] represents the synaptic weight 140 between nodes i and j, and X₃ represents the output of node j . The output of a node is defined by equation (E7) below.

In this embodiment, the processing units 101 generate discrete values rather than continuous values. In some embodiments, the output of a node may comprise any positive value when the probability is greater than zero and any negative value when the probability is less than zero.

The environment may comprise additional apparatuses or systems configured to receive the output signal from the system 100 in order to perform an action. These additional apparatuses, systems, and any appropriate interfaces may be electrical, biological, mechanical, or any combinations thereof. In an electrical application, the environment may comprise at least one computer controllable at least in part by the system 100. In an electrical/mechanical application, the environment may comprise a brain-machine interface, at least a controller, and physical devices capable of movement, i.e. a robotic or mechanical apparatus or device such as a robotic arm, a vehicle, etc. In a primarily biological application, the environment may comprise the user's body and/or any related neuroprosthetics and related interfaces .

The feedback module 105 is configured to receive a feedback signal 107 from the environment. The feedback signal 107 may comprise positive or negative feedback, wherein positive feedback is defined as a successful outcome as compared to the intended action of the user, and negative feedback is defined as an unsuccessful outcome as compared to the intended action of the user .

The feedback signal 107 may comprise user-driven feedback. User-driven feedback may include brain signals from the user, for example, when the user is frustrated with an outcome compared to his or her intended action, the brain may emit certain signals associated with frustration, and these signals may in turn be used as negative feedback. In converse, when the user is pleased with the outcome compared to his or her intended action, the brain may emit certain signals tied to these emotions, which may be classified as positive feedback. User-driven feedback may also comprise motor feedback from the user. For example, if the user intends to send positive feedback or a negative feedback, he or she may choose to perform a motor action, e.g., blink twice, twitch a particular finger, nod, or any other action that may be associated with the feedback signal 107. This allows the user to continuously calibrate the neural decoder 100 without the need for external maintenance.

In at least one embodiment, the feedback signal 107 is used to readjust at least one synaptic weight of at least one processing unit 101. For instance, at least one of the synaptic weights of synaptic connections 140 related to the winning node 130' and connected nodes may be altered. In some embodiments, all synaptic weights related to the winning node 130' and connected nodes may be altered. The weight alteration or calculation may be performed by an external feedback module 105 in some embodiments. Alternatively, the weight alteration may be calculated by at least one processing unit 101 of the neural decoder 150, i.e. each process unit 101 may be further structured to process the receipt of a feedback signal 107 and adjust the synaptic weights of its synaptic connections 140 accordingly. In a preferred embodiment, the synaptic weight adjustment related to a negative feedback signal is greater relative to that of a positive feedback signal.

The feedback module 105 converts the received feedback signal 107 into binary form in at least one embodiment. Depending on the success or failure of the task, the feedback module 105 generates a +1 or -1 value respectively as an evaluative feedback. The synaptic weight alternation, or change in weight, may be based on the equation E8 below in a preferred embodiment:

V -S- vil

(E8) Here, AW^t _1] denotes the change in synaptic weight 140 between node j and node i at time t, γ is the learning rate and r_k is the evaluative feedback (+1 or -1) that is computed by the feedback module 105. After each time step t, the total weight update is computed by replaying the past data through the network and integrating over time. In equation (E8), the first term (left of the plus sign) corresponds to positive feedback and the second term (right of the plus sign) corresponds to negative feedback. The balance between these two terms is unique since they both contribute to the weight update. In this way, the system 100 becomes more sensitive to the negative feedback to quickly respond to failures. Conversely, in the case of positive feedback, the second term simply becomes zero. In the limit when p_± approaches x_±, the total weight update with approach zero. In other words, once the system 100 has converged to a stable control policy, the synaptic weights 140 will stop changing automatically in order to keep system 100 from becoming unstable. However, because negative reinforcement is weighed more heavily, the stabilized system 100 is still capable of learning changes between the neural states of the neural vector 103 and actions in the environment in an online fashion using equation (E8). Accordingly, when the neural data and the evaluative feedback are received, the system 10 is able to adapt to and adjust the synaptic weights 140 in real time under closed-loop control conditions .

Structurally, the feedback module 105 may be formed of any combination of circuits structured to receive at least one feedback signal 107. Moreover, in some embodiments the feedback module 105 may comprise a neural decoder, such as the adaptive neural decoder 150. Alternatively, the feedback module 105 may comprise a neural decoder comprising more than 3 layers, is of a feedback or feedforward configuration, or is of a fully connected or partially connected configuration.

Figure 21 offers a diagrammatic representation of another illustrative embodiment of a method for adaptive neural decoding of a user. In the embodiment of Figure 21, at least one neural input signal from a user's brain is received through at least one sensory node, as in 201. The sensory node(s) may be further configured to normalize any input signals into discrete values, such as from -1 to 1. Next, the at least one sensory node transmits at least one sensory output signal to at least one hidden node, as in 202.

The hidden node(s) receive the sensory output signal (s) through at least one synaptic connection, each synaptic connection is individually associated with a synaptic weight, as in 203. Each of the hidden nodes will then calculate a probability based at least in part on each synaptic weight of its synaptic connection ( s ) , as in 204. The hidden node(s) then transmit at least one hidden output signal to at least one output node, the hidden output signal (s) of each hidden node is defined at least in part by the probability of that hidden node, as in 205. In at least one embodiment, the hidden output signal of a hidden node will comprise a positive value if the probability of the hidden node is greater than zero, and a negative value if the probability of the hidden node is less than zero. Further, the output signal may output a zero if the probability is equal to zero, or alternatively, not output any signal.

At least one output node receives the hidden output signal (s) from the hidden node(s) through at least one synaptic connection, each synaptic connection here is similarly individually associated with a synaptic weight, as in 206. Each of the output nodes then calculate a probability based at least in part on each synaptic weight of its synaptic connections, as in 207. The output signal corresponding to the winning node is transmitted to the environment, as in 208, the winning node is the output node having the highest probability.

At least one feedback signal is received from the environment, as in 209. This feedback signal is based on the output signal corresponding to the winning node, i.e. the output signal of the winning node may affect a certain action in the environment, and a feedback signal may comprise a positive feedback if the action was an action intended by the user, or a negative feedback if the action was unintended. Based on the feedback signal, at least one synaptic weight of a synaptic connection is adjusted, as in 210. The synaptic weights adjusted may be those associated with the winning node and/or any connected hidden nodes. In a preferred embodiment, the synaptic weights for negative feedback will be adjusted to a greater extent than the adjustment for positive feedback.

In at least one embodiment, the sensory node, hidden node, output node, or combinations thereof, may comprise processing units 101. Further, in at least one embodiment, the calculation of probability in steps 204 and 207 comprise the formula shown in equation (E6) . Similarly, the output of signals in steps 205 and 208 may be based on the probability calculated, and be defined by equation (E7) . The adjustment of synaptic weights may also be defined by equation (E8) .

3. Adaptive Neural Decoding and Brain-Machine Interface

The adaptive neural decoder 150, such as the one described in the system 100, or the method of adaptive neural decoding 200, have many applications. For instance, the adaptive neural decoder may be used to increase effectiveness of neuroprosthetics, such as cochlear implants, retinal implants, etc. As another example, the adaptive neural decoder may similarly be used as part of a brain- machine interface for external devices. In at least one embodiment, the system 100, neural decoder 150, or an otherwise appropriate neural decoder or system or method of neural decoding or mapping may be specifically configured as a brain-machine interface controller ("BMI Controller"). In this embodiment, the BMI Controller maps neural states to machine actions . This may for instance, allow a user to control an external prosthetic limb, or some internal prosthetic device, as well as any number of other machines or devices .

Further, in some embodiments and examples below (e.g. classification tasks) the system 100 will have only one chance to select the correct action in each trial. This learning paradigm is referred to as a single-step mode of learning, as opposed to the multi-step mode where the system 100 can modify its behavior through multiple steps of interaction with the environment in each trial. The single-step learning paradigm can be viewed as the online classification of data streams. Unlike supervised classification, in which both input patterns and class labels are presented to train the classifier, for the present system no class label is available, instead, upon action selection the decoder 150 will receive a +1 or -1 feedback depending on the success or failure. However, since the decoder 150 needs to experience at least several trials in order to learn the task, a form of experience replay may be employed to increase the speed and learning in the single-step paradigm (Adam et al . , 2012, Wawrzynski, 2009) . After each trial, the neural vector 103, the selected action from the winning node 131', and the evaluative feedback from the feedback module 105 are registered in a database. Whenever updating the parameters, the network 102 goes through all previous entries of the database in order to modify the control policy by readjusting the synaptic weights 140. One embodiment of the experience replay is summarized below:

1 . Randomly initialize the synaptic weights 104 ( Wmit ) 2. At time step t, execute the action at: π (s_t, w_t) and receive a binary evaluative feedback rt£[-l, 1]

3 . Register the tuple {s_t, a_t, r_t) in the database

4 . For each step (i) contained in the database adapt the weight matrices

Re-evaluate the policy using a':

and = JZ<a' , r ,·>

3Ka',r_\> n = +1 n = -l

a' = CL\ +1 -1

a'≠ CL\ 0 +1

In essence when updating using l<.>: the same reinforcement as registered in the database was used if the same action was taken. Conversely, if a different action was taken from a previously rewarded action, indifferent feedback was offered and if a different action from a previously punished action was taken, positive reinforcement was offered .

Update W\ (equations E5-E8) using r and S\

5. Return to step 2

EXAMPLES

The adaptive neural decoding systems and methods are further illustrated in the following examples, which are provided by way of illustration and are not intended to be limiting. It will be appreciated that variations in alternatives in elements of the decoder, system and method shown will be apparent to those skilled in the art and are within the scope of embodiments of the present invention. Theoretical aspects are presented with the understanding that Applicants do not seek to be bound by the theory presented. All parts or amounts, unless otherwise specified, are by weight.

The terms "HRL controller," "brain-machine interface controller," "RLBMI controller," "controller," "neural decoder," or "adaptive neural decoder" refer at least in part to the system 100 described above in at least one embodiment. The term "HRL algorithm" refers at least in part the above equations. In these embodiments, the term "Actor" is synonymous to the network 102, and the term "Critic" is synonymous to the feedback module 105, unless otherwise stated. "Weights" of the network refer to synaptic weights 140. Neurons and synthetic neurons may be used interchangeably with nodes or processing units 101. a. Closed-Loop Simulation

In order to test the performance of the HRL controller in response to known neural states, a simulation platform was developed for neuroprosthetic reaching tasks in a 2D grid space to test both the multi-step and single-step learning performance of the controller. The goal of the controller was to infer the movement direction from the overall activity of the neural ensemble. The simulator was composed of three main components: synthetic neural data generator, neuroprosthetic controller (Actor), and the behavioral paradigm. Figure 3 shows the components of the closed-loop simulator and their interaction.

The synthetic neural data was generated using a biologically realistic model of spiking neurons ( Izhikevich, 2003). The user's neuronal activity was simulated by 5 neural ensembles where four of them each was tuned to one action (i.e. moving left, right, up, and down) . The neurons in the 5th ensemble were not tuned to any action to simulate the uncorrelated neurons often observed in real experiments (Sanchez et al . , 2004) . The goal of the user was to reach targets in a 2D grid space, therefore at each time- step the user generated a motor command by modulating the corresponding neural ensemble. A vector of the firing rates of all the neurons over a 100ms time window was used as the input feature vector to the controller in all the simulations. The four discrete actions (four principal directions of movement) available to the controller spanned the 2D grid space in Cartesian coordinate, thus these actions were called motor primitives of reaching in the grid space. The controller's task was to map the activity of neural ensembles to appropriate actions in order to reach the target. The learning performance of the controller was evaluated in the multi- step and single-step modes of the HRL algorithm. In the context of the reaching task these modes of learning are referred to as interactive decoding and learning motor primitives paradigms, respectively .

For the interactive learning paradigm (Figure 3A) , the user generated neural commands to reach to the target at each step and the controller's task was to infer user's intent based on the interaction with the user and environment. The sign of the cosine of the angle between the desired direction and the actual movement direction was fed back to the controller as the evaluative feedback. It is important to note that the controller did not have any information about the desired direction or location of the target. The evaluative feedback here just simulated the discrepancy between the user's expected movement and prosthetic actions. This process continued until reaching the target (success) or the timeout (failure) in each trial. In order to select a particular action, the user increased the firing rate of the corresponding neural ensemble above the base-line activity. It was assumed that the user was monitoring the performance of the controller and would generate an evaluative feedback that was modeled using a binary signal. If the action (movement) of the controller was in the direction of reaching to the target the user generated a +1 feedback otherwise the feedback was -1. In this paradigm the neuroprosthetic controller again had no specific information about the location of the target. The only information that was available to the controller was the user's neural motor commands and the evaluative feedback that followed the execution of each action.

For the single-step learning the motor primitives (Figure 3B) , the synthetic neural data was generated the same way as in the multi-step interactive learning paradigm, however the controller had only one chance to select the correct action. Depending on the success or failure, the controller received a +1 or -1 feedback in each trial. After learning the neural signature of each motor primitive in the single-step mode, the controller was tested in the pinball task to generate reach trajectories by sequential translation of user's neural pattern to motor primitives. In other words, the controller used the motor primitives as the building blocks of reach trajectories. In all the experiments the controller was initialized with random weights .

The performance of the controller was tested using a pinball reaching task in which the controller was required to reach targets at random locations in the 2D space (Gilja et al . , 2012) by selecting actions (movement direction) sequentially over multiple steps. Upon reaching to the target, a new target was presented and the previous target was regarded as the starting point for the new trial. A timeout period was defined within which if the task was not successful the trial was considered a failure and the location of the target changed to another random location. A constraint in the new target allocation ensured that the distance between the starting position and the target position was not less than 20 steps. b. Non-Human Primate Experiment

In order to validate the simulation results, the HRL controller was tested using real neural data recorded from two non-human primates (monkeys PR and DU) when they used a robot arm to accomplish a two-target reaching task. The marmoset monkeys (Callithrix jacchus) utilized a two choice Go/NoGo motor task to control the robot actions and earn food rewards (waxworms or marshmallows ) .

(1) Microwire Electrode Array Implantation

Each monkey received two microelectrode array implants. Each array consisted of 16 tungsten microwires (Tucker Davis Technologies, Alachua FL) . One array was implanted in the motor cortex, targeting arm and hand areas, and the other was implanted targeting the Nucleus Accumbens (NAcc) . A single craniotomy was opened and used for both the array implants. The dura was removed, and array placement made based on stereotaxic coordinates and cortical mapping (DU motor implant) . Arrays were inserted using a micropositioner (Kopf Instruments, Tujunga, CA) while monitoring electrophysiology landmarks (corpus callosum, internal capsule, anterior commissure, etc.) . The implant was secured using anchoring screws, one of which served as reference and ground, and the craniotomy was sealed using Genta C~ment (EMCMBV, Nijmegen, The Netherlands). Surgeries were conducted under sterile conditions using isolfurane (PR) or continuous ketamine infusion (DU) anesthesia, with cefazolin and buprenorphine administered postoperatively.

(11) Neural Data Recording

Neural data were acquired using a Tucker Davis Technologies RZ2 system (Tucker Davis Technologies, Alachua, FL) . Each array was re-referenced in real-time using a common average reference (CAR) composed of that particular array's 16 electrodes (if an electrode failed it was removed from the CAR) to improve SNR. Neural data were sampled at 24.414kHz and bandpass filtered ( 300Hz-5kHz ) . Action potential waveforms were discriminated in real-time based on manually defined waveform amplitudes and shapes. Both multineuron signals as well as well-isolated single neuron signals (collectively referred to here as neural signals) were used equivalently in all real-time and offline tests. On average there were 18.3+/-3.1 (mean +/- std) motor neural signals for DU and 21.1 +/-0.4 for PR (10 signals for PR following a mechanical connector failure in which half the electrodes were lost) . Neural signal firing rates were normalized (between -1 to 1) in real-time using a continually updated estimate of each signal's maximum firing rate.

(Hi) Brain-Machine Interface Control Architecture

For these experiments, the actor was a fully connected 3- layer feedforward neural network that used a Hebbian update structure similar to that described system 100 above. The actor input ( X) was the vector of spike counts for each of the motor cortex neural signals during a two second window following the go cue of each trial. A parsimonious network was chosen for decoding, using only 5 hidden nodes and two output nodes (one for each of the two robot reaching movements) . The output of each hidden node (OutHi) was a probability of firing (-1 to 1) computed using a hyperbolic tangent function, and in which WH1 are the synaptic weights between node i and the inputs ( b is a bias term) :

OutHi = tmiiiQX b] * WH ) _{( E 9 }}

Output nodes calculated action values (AV) for each of the j possible robot movements:

AVj J = tAii i

\ [tSiOutH] b] * WO_*) (E10)

S{OutH} is a sign function of the hidden layer outputs (positive values become +1, negative values become -1), and WOj are the weights between output j and the hidden layer. The robot action with the highest action value was implemented each trial. The actor weights were initialized using random numbers, which were updated (AW) using the critic feedback (f) :

&W = μ_Η * ί bf * (S{Outtt} - OMR))

+ (1 - f) {X bf * (Ϊ - SiOutH - Oiitm)

^{■ ■} (Ell)

AWO =½ * f *{lOntH bf * iSlAV) -iif )

+ μ_ΰ. * (t - f) *([QutH bf * (I - ${A¥} - A¥}) (E12 ) Feedback is +1 if the previous action selection is rewarded and -1 otherwise. S{} is again the sign function and μ_Η and μ₀ are learning rates of the hidden (0.01) and output (0.05) layers, respectively. The update equations are structured so that whenever nodes fire together their synaptic weights are updated in proportion to the feedback, giving the learning its Hebbian aspect .

(iv) The Reaching Task

As shown in Figure 8, the monkeys initiated each trial by putting their hand on a touchpad and keeping it motionless for a random hold period (700-1200 msec) . The hold period was followed by an audio go cue, which coincided with the robot arm moving out from behind an opaque shield and presenting its gripper. The gripper held either a desirable food treat (waxworm or marshmallow: "A" trials), or an undesirable object (wooden head: " " trials). Simultaneously with the gripper presentation, one of two spatial reaching target LEDs on either the monkey's left (A trial, desirable object) or right (B trial, undesirable object) side was illuminated. To move the robot to the A (Go) target, the monkeys were required to reach and touch a target sensor within a 2 second time limit. To move the robot to the B target (NoGo) , the monkeys had to keep their hand motionless on the touchpad for 2.5 seconds. By controlling the robot so that it reached the cued target LED successfully, the monkeys received a food reward for both A and B trials. The proportion of A and B trials were kept roughly equivalent, and they were presented in a pseudo-random order. While the monkeys were performing the task, motor cortex neural activity was recorded as described above.

To speed the initial adaptation of the reinforcement learning algorithm, real time 'epoching' of the data was used. After each robot action, the algorithm weights were updated (equations Ell and E12) using not only the most recent trial's data, but rather with all previous trials in that session, with the buffered trials being used to update the weights ten times following each action. This helped prevent the monkeys from becoming frustrated at the beginning of sessions by moving the system more rapidly away from random exploratory movements.

(v) Initializing Conditions

During real-time closed loop robot control experiments the parameter weights of the RLBMI were initialized with random values, with the RLBMI learning effective action mappings through experience (equations Ell and E12). Performance was quantified as percentage of trials in which the target was achieved. In addition to these closed-loop real-time experiments, a large number of offline 'open-loop' Monte Carlo simulations were also run to exhaustively confirm that the RLBMI was robust in terms of its initial conditions, i.e. that convergence of the actor weights to an effective control state was not dependent on any specific subset of initialization values. For the simulations, the neural data and corresponding trial targets for the first 30 trials of several closed-loop BMI sessions from both monkeys (10 sessions for DU and 7 for PR) were used to build a database for offline open-loop simulations. For the simulations, data from each session were re-run 100 times, with different random initial conditions used for each test.

(vi) Performance over Long Time Periods

The RLBMI was also tested to see how it would perform applied in closed-loop mode across long time periods. For these contiguous multisession tests a sequence of robot reaching experiments were run for each monkey, with the RLBMI starting from a random set of initial conditions during the first session. During the follow-up sessions, the RLBMI was initialized from weights that it had learned from the prior session, and then continued to adapt over time (equations Ell and E12) .

(vii) Input Space Perturbations

For BMI systems to show truly stable performance, nonstationarities or other changes in the input space should not adversely affect performance. While some changes of the input space can be beneficial, such as neurons changing their firing pattern to better suit the BMI controller, large changes in the firing patterns of the inputs that dramatically remove the input space from that which the BMI had been constructed around are significant problems for BMIs. Such perturbations to the inputs can result from neurons appearing or disappear from the electrode recordings, a common occurrence in electrophysiology recordings. During the contiguous multisession tests for PR, a mechanical connector failure resulted in half of the neural signals inputs to the RLBMI being lost. However, another contiguous session was run in which the RLBMI successfully adapted to this change to its inputs. This input loss was simulated in two sessions with monkey DU. In those experiments, a random half of the motor neural signals were selected (the same signals in each test), and in those experiments the firing rates of the inputs set to zero. For comparison purposes, in monkey DU two final contiguous session experiments were also run in which the whole input space remained available to the RLBMI system.

In additional closed-loop BMI sessions, the BMI inputs were deliberately altered (following the initial period in which RLBMI had adapted and gained accurate control of the robot) to further test the RLBMI ' s ability to cope with large-scale input perturbations. Specifically, during input loss tests, following 10 trials, the online neuron waveform identification boxes were moved so that a random 50% of the neural signals were no longer detected. Similarly, in other experiments, when the RLBMI was initialized at the beginning of the experiment, the waveform boxes for a random half of the available neural signals were placed so that action potentials were not being acquired. After the initial adaptation of the RLBMI, the sorting parameters were updated so that 'new' neural signals abruptly appeared amongst the BMI inputs. The real-time experimental results were confirmed with additional offline simulations, using the Monte Carlo simulation database previously described. For input loss tests, the firing rates for a randomly chosen (during each simulation) half of the neural signals were set to zero after 10 trials. Similarly, for the 'found neuron' simulations, for each simulation half the inputs were randomly selected to have their firing rates set to zero for the first 10 trials.

In general, for all the algorithm controller testing experiments, the controller was tested in an on-line fashion (in machine learning terms) which means the input data was streamed to the controller sequentially the same way as would be done during real-time experiments. Moreover, all animal care, surgical, and research procedures were consistent with the National Research Council Guide for the Care and Use of Laboratory Animals and were approved by the University of Miami Institutional Animal Care and Use Committee. c. Results

In order to characterize the learning properties of the HRL algorithm of the neural decoder, simulations were used to encode known states in the activity of synthetic neurons at the input of the controller in order to test the learning, robustness, and generalization performance of the HRL controller during interactive multi-step and single-step decoding tasks (sections i- iv below) .

(i) Learning Performance in Interactive Decoding

The structure of the Actor network consisted of 25 sensory nodes at the input (one for each synthetic neuron) and 4 nodes at the output (one for each of the output actions) . The network consisted of 5 nodes at the hidden layer. This was the minimum number of nodes that yielded the optimal performance. A bias term was included at the hidden and output layers of the network. The network weights continuously updated during the task. The weights of the network were initialized randomly at the beginning of the experiment. Figure 4 shows the performance of the controller in terms of success rate over time and time to the target during a representative experiment consisting of 50 trials. The time to the target in each trial was computed based on the ratio of the minimum number of steps to the target (shortest reach trajectory) and the number of steps that the controller actually took to reach the target in that trial. In the pinball task target locations were selected randomly, therefore the number of steps to the target was different in each trial. In this experiment the shortest and longest reach trajectory were 25 and 80 steps, respectively (mean ± SD: 41.1 ± 13.9). The horizontal solid red-line is a reference that indicates whether the controller followed the optimal trajectory in a given trial or not. The controller learned to complete the reaching task in 3 trials and found the optimal mapping between the neural states and actions in 8 trials, followed the optimal control policy afterward and completed the task with 100% accuracy. The weight trajectory of the controller shows that after learning the task the controller stopped changing the weights and the control policy was consolidated.

(ii) Generalization Performance of the Controller

In order to test the generalization performance of the controller, the weights of the network were fixed after convergence. In these experiments the controller was again initialized using random weights. As shown in Figure 5, the controller learned to complete the task in 2 trials and converged to the optimal control policy after 10 trials. The controller continued to follow the optimal control policy after the parameters had been fixed (trial 15) and continued to complete trials successfully without need for adaptation until trial 25. Following trial 25, the control policy was perturbed by shuffling the tuning of the neurons, completely changing the mapping between the neural pattern and the desired actions. Figure 5 shows that the controller was not able to complete the task in any trial after perturbation, however, when the network was again allowed to start adapting (trial 35), it was able to learn a new control policy and recover the same level of performance as before perturbation .

(Hi) Learning Motor Primitives in Single-Step Mode

The results in this section demonstrate that the controller is able to generate reach trajectories by learning motor primitives. Furthermore, the algorithm used in the controller in single-step mode can be used in real-time classification tasks. The controller was tested in this experiment with 100 trials using the closed-loop simulation setup in Figure 4B. The controller had to reach the target in one step by selecting one of the four actions. If the controller chose the correct action it received a positive reinforcement (+1) otherwise a negative reinforcement was received (-1) . In each of the first 25 trials, one of the two targets (Tl and T2 span the horizontal ID line) was presented randomly. Figure 6A shows the performance of the controller in mapping the input neural activity to an appropriate action in this task. The controller was able to find this mapping with 100% accuracy. In the next 25 trials targets T3 and T4 (spanning the vertical ID line) were presented to the controller as a novel task and the controller was able to learn the new task with 100% accuracy .

After trial 50, the parameters of the controller were fixed and one of the four targets was selected randomly and presented to the controller in each trial. Figure 6A shows the controller was able to continue performing the task with 100% accuracy without need for further learning and adaptation. After trial 75, the same controller was tested in the pinball task where targets were presented in random locations in a 100x100 grid. In order to complete the task in each trial the controller was required to select multiple actions sequentially. After reaching the target a new target was presented at a random location and the previous target was used as the initial position. In all the trials, the distance between the two consecutive targets was at least 20 steps away .

Figures 6B and 6C show the behavior of the controller during the learning phase. Figure 6B shows how the network assigned action values in each task. At the early stage of learning the action value assignment was random however as the controller learned the neural-to-action mapping the controller maximized the value of the actions that were necessary for completing the task. As the network learned the task, the absolute value of actions were asymptotic to 1. Considering equation (E6), this means once the controller learned an effective control policy via positive reinforcement, it automatically reduced the magnitude of the weight changes and consolidate the control policy. This effect is demonstrated in Figure 6C that shows the trajectories of the network weights at the hidden and output layers during the learning. The dashed vertical line shows the point that the control task was changed.

As the network converged to an effective control policy the weight trajectories plateaued. At trial 25, the controller had no information about the task change (presentation of the T3 and T4 as new targets), the only information that led the controller to change its control policy was the evaluative feedback. The negative evaluative feedback caused an increase in the variance of the weight change magnitude in equation (E8) and automatically forced the controller to search for a new control policy. In Figure 6, this adaptation is reflected both in the action-value assignment and the weight update trajectories. Once the controller found the new control policy, the controller exhibited the same behavior in the action value assignment and weight update trajectories . (iv) Effect of Memory and Network Size on Performance

The effect of memory size and the network size (number of hidden nodes) on the performance of the controller was tested in the single-step learning the motor primitives task using 100 Monte Carlo (MC) simulations (Erdogmus et al . , 2005). Each simulation consisted of 50 trials and the controller was initialized with a different set of random weights drawn from the same uniform distribution. The convergence criterion was defined as the minimum 95% decoding accuracy over a block of 20 consecutive trials. Here the memory size was the number of the past trials in the database. Each memory entry consisted of input neural data, action of the controller, and the evaluative feedback corresponding to that action. Figure 7A shows the effect of memory size on the learning properties and the generalization performance of the controller. These results also quantify the effect of initial conditions on the performance of the controller. With the shortest memory (keeping only the previous trial) the controller was able to learn the first and the second tasks in 64% and 60% of the MCs, however controller was able to generalize to more complex tasks in less than 10% of the MCs. By increasing the memory size up to 70 trials, the generalization ability of the controller improved, however, beyond that point the generalization started to decline. The effect of network size (number of hidden layer nodes) on the learning and generalization performance of the controller was also tested. In this analysis, the network input size was 25 and the network had 4 output nodes. The results in Figure 7B show the optimal number of hidden layer nodes in this experiment was 5, the network with 3 hidden nodes was more sensitive to the initial condition of parameters in terms of generalization and learning the new task during adaptation. By increasing the number of hidden nodes, the performance of the controller dropped both in terms of learning a new task and generalization to more complex tasks.

(v) Primate Neural Decoding in Multi-Step Tasks

To verify the simulated results, the performance of the controller was tested in multi-step trajectory decoding task using real neural data that was recorded over several (3) sessions from two monkeys. The decoding task was to reconstruct robot trajectory (methods section) by mapping the monkey's neural modulation onto 4 actions. In each trial, one of two fixed targets (Tl and T2) in a 2D grid space was presented to the controller. The direct path between the initial (center) position of the controller and the targets was 4 steps . The controller was free to move to any point in the 2D grid-space, however if the task was not completed in 8 steps the trial was marked as failure.

The controller parameters were initialized randomly and the controller continuously adapted over three sessions for each monkey. As demonstrated in Figures 9A and 9B, in both monkeys the controller converged after 20 trials and reached above 95% decoding accuracy. The order of target presentations was randomized throughout the experiment and it was compatible with the order of target presentation in the real experiments. After convergence, the controller found the shortest path to the target in 95% and 87% of the trials in monkey PR and DU, respectively. The average natural reach time during the experiments was 370 ms for monkey PR and 670 ms for monkey DU . The number of steps in each trial decreased over time as the controller learned an effective control policy based on the input neural states, and the controller often followed the optimal policy (direct path to the target in 4 steps) .

The effect of day-to-day variability on the performance of the controller was tested by stopping the adaptation of the controller after Day 1 and testing the performance in the data from Days 2 and 3. In monkey DU, without adaptation, the performance dropped from 100% to 64% in Day 2 and from 96% to 62% in Day 3. In monkey PR, the decoding performance slightly dropped from 95% to 90% in day 2 but the performance drop was more profound in day 3 (from 98% to 67%) . To further test the effect of adaptation, after fixing the parameters of the controller at the end of Day 1 to prevent adaptation, a perturbation was introduced by shuffling the order of neurons in the input neural vector. As a result of this perturbation, Figure 9C demonstrates that the decoding performance dropped to 54% and 48% in Days 2 and 3 respectively for monkey DU . Likewise for monkey PR, the decoding performance declined to 52% in Day 2 and 55% in Day 3 (Figure 9D) . The number of steps in the success (4 steps) and failure trials (8 steps) demonstrate that the controller always took the direct path to target T2 which means the controller was not able to distinguish between the Tl and T2 states after the perturbation and followed the same control policy in both neural states.

In the presence of continuous adaptation, the controller was able to recover the performance following the perturbation by readjusting its parameters and modifying the mapping between the neuronal activity and action values in both monkeys (Figures 10A and 10B) . The smooth trajectories in Figures IOC and 10D indicate that the controller converged to an effective control policy and reached to a steady-state after about 65 iterations in monkey DU and 110 iterations in monkey PR. At the beginning of Day 2, a perturbation was introduced to the controller by shuffling the order of the input neurons. This perturbation was introduced after the convergence of the network in Day 1. In both monkeys the controller reorganized its parameters to adjust the mapping between the neural states and action values to maintain the performance. Depending on the operating point of the controller in the input space, the level of readjustment might be different. In monkey DU, the controller had a smooth readjustment and converged after 1 trial however, in monkey PR the readjustment was more profound and it took longer for the controller to converge to its new control policy.

(vi) Brain-Machine Interface Control of Robot Arm

Not only did the controller verify the simulated data, but it was also demonstrated to successfully control a device through a brain-machine interface. In this series of experiments, the actor-critic RLBMI effectively controlled the robot reaching movements as dictated by the previously described monkeys. Figure 11 shows a typical closed loop RLBMI experimental session (PR) . Figure 11a shows that the algorithm converged to an effective control state in less than 5 trials, at which point the robot began to consistently make successful movements. The algorithm was initialized using small random numbers (between +/-.075) for the parameter weights (equations E9 and E10) . Figure lib shows the gradual adaptation of the weight values of the two output nodes (equation E12) as the algorithm learned to map neural states to robot actions. Figure 11c shows a similar adaptation progression for the hidden layer weights. The weights initially changed rapidly as the system moved away from random explorations, followed by smooth adaptation and stabilization when critic feedback consistently indicated good performance. Larger adaptations occurred when the feedback indicated an error had been made. Figure 12 shows that the RLBMI controller reached for the correct target during approximately 90% of the trials (blue bar: mean +/- standard deviation; DU : 93%, 5 sessions; PR: 89%, 4 sessions) . The accuracy results in Figure 12 correspond to trials 6-30 since the first 5 trials were classified as an initial adaptation period and the monkeys typically became satiated with food rewards and ceased interacting with the task after 30 to 50 trials .

Figure 12 also shows that the RLBMI decoder was robust to the specific initial conditions of the weights. During each of the closed-loop BMI sessions the RLBMI algorithm found an effective control policy regardless of the starting weight values. Each of the open-loop Monte Carlo simulations (DU: 1000 simulations, PR: 700) also started with random conditions, and resulted in a similar accuracy as the closed loop experiments (Figure 12), confirming that the system could converge to an effective control state from a wide range of initial conditions.

To further validate the control approach, a surrogate data test was performed in which randomized data was presented to the RLBMI decoder. In each of these open-loop simulation tests, the order of the trial type was preserved while the order of the recorded motor cortex neural data was randomly reshuffled, destroying any consistent neural representations associated with the desired robot movements. Despite the decoder's adaptation capabilities, Figure 12 shows that the RLBMI system was only able to perform at chance levels under these conditions, confirming that successful control of the robot arm was driven by specific states of the neural data.

(vii) Stability of BMI Controller over Multiple Sessions The RLBMI maintained high performance when applied in a contiguous fashion across experimental sessions spanning up to 17 days, as shown in Figure 13. The decoder weights started from random initial conditions during the first session, and during subsequent sessions the system was initialized from weights learned in the previous session (from the 25th trial), and was then allowed to adapt as usual (equations Ell and E12) without any new initializations or interventions by the experimenters to approximate use of the BMI over long time periods. The solid lines in Figure 13 give the accuracy of the system during the first 25 trials (mean: DU : 86%; PR: 93%) of each session when the inputs were consistent. For monkey PR a mechanical failure in the implant connector resulted in the loss of half of the recorded neuronal signals between day 9 and 16 (dashed line), however the system was able to quickly adapt and this loss resulted in only a slight dip in performance (4%). Likewise, the controller maintained performance during two DU sessions (day 8 and 13, dashed line) in which a similar input loss was simulated, as described above in the Methods. In fact, performance during those sessions was similar or better to DU tests that continued to use all the available neural signals (days 14 and 17) .

(viii) Stability of BMI Controller over Input Perturbations The RLBMI was capable of adapting itself to compensate for large-scale changes in the firing properties of the input neurons. These perturbation tests included losses of 50% of the inputs, as well as abrupt doublings of the neural signals being acquired by the implanted electrodes.

The RLBMI ' s ability to compensate for losses of neural signals was tested after it had already achieved control of the robot arm. Figure 14 gives the accuracy of the RLBMI decoder within a 5-trial sliding window. Shown is the mean and standard deviation of the performance averaged across closed-loop BMI experiments (DU: dashed line and error bars, 4 sessions) and across open-loop simulations (DU: gray line and panel, 1000 simulations; PR: red line and panel, 700 simulations). During each of the tests a random 50% of the inputs were lost following trial 10 (vertical black bar) . The RLBMI fully adapted to the perturbation within 5 trials, restoring effective control of the robot to the monkey.

Figure 15 shows that the RLBMI system was also able to effectively incorporate newly 'found' neural signals into its input space. This input perturbation occurred following the 10th trial (vertical black bar), prior to that point a random 50% of the RLBMI inputs had had their firing rate information set to zero. Both closed-loop BMI experiments (DU: dashed line and error bars, 4 sessions) and open-loop simulations (DU: gray line and panel, 1000 simulations; PR: red line and panel, 700 simulations) showed that the system had again adapted to the input perturbation within 5 trials. In contrast to the input loss experiments, Figure 15 shows greater variation in performance immediately following the addition of the new inputs. This may reflect the variability to which the RLBMI algorithm had learned to ignore the previously silent channels, combined with the magnitude of the firing activity of the neural signals once they were 'found'. In situations in which the algorithm had set the silent channel parameter weights very close to zero, or in which the activity of the new channels was relatively low, the addition of the new neural signals would have had little impact on performance until the RLBMI controller reweighted the perturbed inputs appropriately. Conversely, during the input loss tests there would be a higher probability that dropped inputs had had significant weight parameters previously attached to their activity, resulting in a more obvious impact on overall performance when those neural signals were lost. (ix) Advantage over Non-Adaptive Decoders

It is important that changes in a BMI ' s neural input space do not diminish the user's control, especially when considering longer time periods where such shifts are inevitable. For example, losses and gains of neurons are very common with electrophysiology recordings using chronically implanted microelectrode arrays: electrodes fail entirely, small relative motions between the brain and the electrodes cause neurons to appear and disappear, and even the longest lasting recording arrays show gradual losses of neurons over time from either tissue encapsulation of the electrodes or from the gradual degradation of the electrode material. While some changes in input behavior can be beneficial, such as neurons gradually adopting new firing patterns to provide a BMI user greater control of the system, large and/or sudden changes in neuron firing patterns (such as neurons disappearing or appearing amongst the BMI input recordings) will almost always reduce a BMI user's control if the system cannot compensate. Basic static neural decoders are significantly affected by additions or losses of input neurons. However, following numerous trials, such as where either half the neural signals were lost or acquired new firing patterns, its performance immediately drops and remains near chance thereafter. This is demonstrated in Figures 14 and 15 (insets) .

While input losses may be a common adverse perturbation to BMI systems, the appearance of new neurons is also a significant input perturbation: when the representations of new neurons overlap with neurons that were already being used as BMI inputs, this causes the previous inputs to appear to have acquired new firing patterns, altering the BMI ' s input space (as was simulated in Figure 15) . Such appearances could be a particular issue in BMI systems that rely on action potential threshold crossings on a per electrode basis to detect input activity. Finally, BMIs that cannot take advantage of new sources of information lose the opportunity to compensate for losses of other neurons.

Currently, most BMI experiments avoid the issue of large changes in input neurons on BMI performance since the experimenters reinitialize the systems on, at least, a daily basis. However, it is important for practical BMI systems to have a straightforward method of dealing with neural input space perturbations that is not a burden on the BMI user. The BMI controller of the present invention does not require the intervention of an external technician (such as an engineer or caregiver) to recalibrate the BMI following changes in the input space. Rather, it automatically incorporates newly available neural information into the input space, even as it compensates for input losses, by adapting whenever it receives training feedback, as demonstrated in Figures 14 and 15 in which the RLBMI suffered only a transient drop in performance despite neural signals appearing in or disappearing from the input space. Furthermore, since the RLBMI is constantly revising which electrode to use and which to ignore, time does not need to be spent evaluating specific neurons to be used as inputs when initializing the system.

RL adaptation as used in the present invention is designed so that it does not confound natural learning processes of the user, and adapts to neural plasticity. RL adaption occurs primarily when natural neuron adaptation is insufficient, such as during initialization of the BMI system or in response to large input space perturbations. Figures 11, 14, and 15 show that the current RLBMI architecture offers smooth adaptation and stable control for a basic robot reaching task under both such conditions. (x) Accuracy of Feedback and BMI Controller Performance

The ability of the RLBMI system to appropriately adapt itself depends on the system receiving useful feedback regarding its current performance. Thus both how accurate the critic feedback is and how often it is available directly impacts the RLBMI ' s performance. The current experimental setup assumed an ideal case in which completely accurate feedback was available immediately following each robot action. While such a situation is unlikely in everyday life, it is not essential for RL that feedback always be available and/or correct, and there are many potential methods by which feedback information can be obtained.

The RLBMI architecture presented here does not intrinsically assume perpetually available feedback, but rather only needs feedback when necessary and convenient. If no feedback information is available, then the update equations are simply not implemented and the current system parameters remain unchanged. Since feedback information does not depend on any particular preprogrammed training paradigm, but rather simply involves the user contributing good/bad information during whatever task for which they are currently using the BMI, this makes the system straightforward to update by the user whenever is convenient and they feel the RLBMI performance has degraded.

Other algorithms are designed specifically to take advantage of only infrequently available feedback by relating it to multiple earlier actions that were taken by the system and which ultimately lead to the feedback. When considering possible sources of feedback information, it is important to consider how the critic accuracy impacts on the RLBMI ' s overall performance. Accordingly, several closed loop experiments were ran and offline simulations were used in testing how well the RLBMI algorithm was able to classify trials from the closed loop BMI experiments when the accuracy of the critic feedback varied. Figure 16 shows how the RLBMI performance can be limited by the accuracy of the feedback. Thus for the current RLBMI architecture it may be better to only use feedback information when the confidence in its accuracy is high, even if that means feedback is obtained less frequently.

While assuming that ideal feedback is available following each action may not be practical for real BMI systems, the fact that the necessary training feedback is just a binary 'good/bad' signal (even when the system is expanded to include more than two robot actions) that only needs to be provided when the user feels the BMI performance needs to be updated, leaves many options for how even a user suffering from extreme paralysis could learn to provide critic feedback. There are also a wide variety of potential options for the RLBMI user to provide critic feedback to the system, including using reward or error information encoded in the brain itself. 4. Adaptive Neural Decoding and Rehabilitation

Another application of the systems and methods for adaptive neural decoding may be related to patient rehabilitation related to brain or nervous system injury, or neurorehabilitation . Neurorehabilitation is a specialty of neuroscience, which involves medical processes that aid a patient's recovery from a central nervous system injury (CNS), such as spinal cord injury (SCI) . Some examples of conditions commonly treated by neurorehabilitation include brain injury, stroke recovery, cerebral palsy, Parkinson's disease, multiple sclerosis, post- polio syndrome, Guillain-Barre syndrome.

For instance, CNS injury impairs the connection between the brain and the body, and makes it difficult to perform even simple tasks (e.g., eating). Upper and lower limb treatment in physical therapy (PT) and occupational therapy (OT) are expected to restore motor control through a change in brain function. However, these approaches to affecting brain function are indirect, not quantitative, and do not fully engage the CNS. Accordingly, the present system and methods for adaptive neural decoding is helpful in rehabilitation training because they are able to provide feedback to induce changes in the size and location of cortical maps after spinal cord injury (SCI) . By directly re-engaging neurons in the brain with Responsive NeuroRehabilitation (RNR) , therapy can strengthen the connections between the brain and the body more efficiently.

In a preferred embodiment of the present invention, a system and method for improved or responsive neurorehabilitation uses rehabilitation training with feedback to induce plastic changes in CNS neuronal representation. Specifically, this invention combines rehabilitation, brain interface technology, and a novel training regimen to responsively translate patient intent into commands to the rehabilitation equipment in order to better allow the patient to manipulate the neuroplastic process associated with rehabilitation including function electrical stimulation (FES), instrumented objects, etc.

FES is a technique that uses electrical currents to activate nerves innervating extremities affected by paralysis. Injuries to the spinal cord interfere with electrical signals between the brain and the muscles, resulting in paralysis below the level of injury. Restoration of limb function as well as regulation of organ functions are the main applications of FES. Accordingly, the present invention may facilitate a more efficient and expedient recovery using FES.

Accordingly, as illustrated in Figure 17, the system 300 of the RNR invention generally comprises an adaptive rehabilitation controller 310, at least one rehabilitation device 330, and a patient or user 320. The adaptive rehabilitation controller 310 generally further comprises an adaptive feature extractor 311 and an adaptive motor decoder 312.

The user 320 may be any life form capable of emitting a brain or neural signal 321. The neural signal 321 may be interpreted in an electroencephalograpy (EEG) , magnetoencephalography (MEG) , or any other methods and/or device for functional neuroimaging or recording of brain activity of a user 320.

The adaptive rehabilitation controller 310 is configured to receive a neural signal 321 from the user 320. Specifically, in a preferred embodiment, the neural signal 321 is received by an adaptive feature extractor 311. The adaptive feature extractor 311 comprises at least one feature. A feature may be a representation of certain characteristics of the neural signal 321. For instance, features may be defined by frequency, amplitude, locality, patterns, other neural signal characteristics, or any combinations thereof. A feature might be associated with neural signals characteristic to a user's emotional state, motor movements, or other chemical or biological states or changes. A feature might be associated with EEGs, MEGs, or other methods for recording brain activity or for functional neuroimaging. The features 315 associated or triggered by the neural signal 321 in the feature extractor 311 are then passed to the adaptive neural decoder 312.

The features may be predetermined, custom input, or created by a programmed routine. In at least one embodiment, the adaptive feature extractor is further configured to dynamically generate additional features to be passed to the adaptive decoder. In at least one embodiment, the additional features may be transmitted to be mapped, after the existing features have converged or stabilized as being associated with a certain action; or alternatively are ruled out as background noise or not relevant. In at least one embodiment, the adaptive rehabilitation controller 310 comprises features based on the particular rehabilitation device 330 being used. Further, the adaptive rehabilitation controller 310, or another device, may be able to control the thresholds of neural activity needed for delivering responses from a variety of rehabilitation.

The adaptive motor decoder 312 is configured to receive at least one feature 315 from the adaptive feature extractor 311. Specifically, in a preferred embodiment, the adaptive motor decoder 312 learns to map the features received to the user's intent, based on the feedback received by the user. In at least one embodiment, the adaptive motor decoder 312 may comprise a three-layer feedforward fully connected neural network. The adaptive motor decoder 312 may comprise a neural decoder 150 as discussed above. Accordingly, the adaptive motor decoder 312 maps each of the features 315 received to a particular output or control signal 317, in order to perform a particular action.

The rehabilitation device 330 is configured to receive a control signal 317 from the adaptive rehabilitation controller 310. The rehabilitation device 330 may comprise a functional electrical stimulator (FES), instrumented objects, or any other devices appropriate for neurorehabilitation of a user 321. Based on the action performed by the device 330, the user 320 receives sensory feedback 331 which is transmitted as a neural signal 321 back into the adaptive rehabilitation controller 300.

The adaptive feature extractor 311, based on the feedback from the user, transmits certain feedback features 316 triggered or associated to the feedback signals to the adaptive decoder 312 in at least one embodiment. For instance, the sensory feedback 331 received by the user 320 may result in negative feedback or positive feedback sent to the adaptive feature extractor 311, as described in the embodiments above in the system 100. The adaptive feature extractor 311 may then transmit certain feedback features 316 associated with either positive or negative signals 321. The adaptive motor decoder 312, based on the feedback features 316 received, i.e. whether they are positive or negative, may adjust up or down the probabilities of associated nodes in at least one embodiment, similar to the process described for the system 100 and method 200. In other words, the adaptive motor decoder 312 may recalibrate its internal parameters for mapping a particular feature 315 to a particular control signal 317 in order to effect a desired action on the rehabilitation device 330. Accordingly, the adaptive motor decoder 312 is able to learn a mapping of the features to the user's intent from the feedback provided by the user 320. The adaptive motor decoder 312 continues to adjust, with user feedback, to learn the mapping of a plurality of features from the adaptive feature extractor 311 to the user's intent.

In addition, the ability to add additional features to the adaptive rehabilitation controller 310 enables unique rehabilitation regimens to be created, as they can be custom tailored to a user 321. For example, features that represent large changes in the input neural signal 321 could be selected in the initial part of the rehabilitation, and features that represent smaller changes in the neural signal 321 could be added later to refine the rehabilitation regimen. The adaptive rehabilitation regimen changes the cues and tasks to improve the user's rehabilitation and the learning rate of the adaptive rehabilitation controller 310. The adaptive rehabilitation regimen can adjust the difficulty task or change tasks, to keep the user learning by not making the task too frustrating or too boring. By adjusting the difficulty of the task, the adaptive rehabilitation regimen also manages the error rate in the feedback to the adaptive motor decoder 312, increasing the rate the adaptive rehabilitation controller 300 can learn the user's 320 intent .

Figure 22 offers a diagrammatic representation of another illustrative embodiment of a method 400 for responsive neurorehabilitation . In the embodiment of Figure 22, a user neural signal, or signal from the user's brain, is transmitted to an adaptive rehabilitation controller which comprises an adaptive feature extractor and an adaptive motor decoder, as in 401. Features are created in the adaptive feature extractor, as in 402.

The features may be predetermined, custom set, or may be created by a programmed routine. Next, at least one feature is transmitted to the adaptive motor decoder, based at least in part on the neural signal received, as in 403. The adaptive motor decoder then maps the at least one feature to a control signal corresponding to an action of a rehabilitation device, as in 404.

The control signal is transmitted to the rehabilitation in to perform an action, as in 405. The adaptive rehabilitation controller then obtains user sensory feedback for the action performed by the rehabilitation device, as in 406. The user sensory feedback may comprise positive or negative feedback. The adaptive motor decoder, as in 407, modifies its internal parameters based on the sensory feedback received. For instance, the decoder may adjust upwards the probability of a certain action upon a positive feedback, or adjust downwards for a negative feedback .

In at least one embodiment, the adaptive motor decoder comprises a neural network. This neural network may comprise a three layer feedforward fully connected network such as the neural decoder 150 illustrated in system 100. In other embodiments, other layers, a feedback, and/or partially connected networks may be used. In a preferred embodiment, negative feedback is weighed greater relative to positive feedback in the adaptive motor decoder .

In the method 400 and system 200 of the present invention, the ability of the rehabilitation controller to adapt to the user 320 and learn the user's intent engages the brain more than normal rehabilitation. By re-engaging the brain, the adaptive rehabilitation controller 310 strengthens the connection between the brain and the body. The loop of brain, adaptive rehabilitation controller, rehabilitation device, and body recreates the normal control loop in the body, thus increasing the speed and extent of the recovery.

EXAMPLES The adaptive neural decoding systems and methods for rehabilitation are further illustrated by the following examples, which are provided by way of illustration and are not intended to be limiting. It will be appreciated that variations in alternatives in elements of the system and method for responsive neurorehabilitation shown will be apparent to those skilled in the art and are within the scope of embodiments of the present invention. Theoretical aspects are presented with the understanding that Applicants do not seek to be bound by the theory presented. All parts or amounts, unless otherwise specified, are by weight.

The terms "brain-computer system (BCI system)," "brain- computer interface (BCI)" refer at least in part to the system 100 and decoder 150 described above in at least one embodiment. In these embodiments, the term "Actor" is synonymous to the network 102, and the term "Critic" is related to a feedback module 105 in the configuration of a similar neural network to network 102, i.e. a 3-layer fully connected feedforward neural network, unless otherwise stated. "Weights" of the network refer to synaptic weights 140. Neurons and synthetic neurons may be used interchangeably with nodes or processing units 101. a . Hand Movement Experiment

Since upper extremity function is a top priority for people living with SCI, one test performed focused on the ability to control hand grasp.

(i) Experimental Task

During the experimental task used for initial benchmarking the preliminary system, the user, one healthy, adult, male with no prior BCI experience, watches a display that shows cues to open or close their hand. This display also provides feedback updated after each cue on whether the BCI's decoding was correct using a bar plot to indicate the BCI's decoding of the user's brain activity, as shown in Figure 19A. The number of ^y+'s and ^y-'s on the screen show the unthresholded output of the motor potentials classifier. For each trial during the experimental task, the display as in Figure 19A showed a fixation cross for Is, followed by a cue for "open" or "close" for Is, and then feedback of "correct" or "wrong" for Is. In addition to the explicit feedback of "correct" or "wrong", a bar plot was presented that show the unthresholded output of the motor potentials decoder.

The experiment consisted of four sessions of 120 trials with a 5-minute break between each session. During the first session, a predetermined sequence of cues for "open" and "close" and feedback of "correct" and "wrong" were presented. The user received predetermined feedback of "wrong" 50% of the time to evoke error related potentials (ErrP) in the user's EEG. During the next three sessions, the display produced the decoding of the users modulated motor potentials. Cues of "open" and "close" were presented in a predetermined sequence. And, feedback was displayed, based on the BCI's classification of the user's motor potentials .

(ii) Brain-Computer Interface Architecture

When the user controlled the display, the input to the BCI was the user's motor EEG and the output was two possible actions, "open" or "close", Figure 19B. The BCI was updated using an actor-critic reinforcement learning (RL) algorithm. The actor- critic RL algorithm tries to optimize the functional mapping of the user's brain activity to the possible actions. The actor- critic RL method is a semi-supervised machine learning method in which the actor learns from the critic's feedback. The actor decodes motor potentials and outputs an action shown on the display. The critic detects ErrP and provides feedback to actor. The actor uses feedback from the critic to adapt to the user.

Both the actor and critic are 3-layer fully connected feedforward neural networks. The hidden and output nodes of the neural networks perform a weighted sum on their inputs. The weighted sum at each node is passed through a hyperbolic tangent function with an output in the range of -1 to 1. The weights between the actor's nodes are initialized randomly and then updated after each trial based on feedback. The critic provides the feedback by decoding the user's EEG to determine if they generated an ErrP. If an ErrP is detected feedback of -1 is provided to the network for adaptation. If not, a feedback value of 1 is used.

The actor's weights update can be expressed as:

-iw¾ = yf {xii f - X; ) ) 4- 1

Here w is the weight connecting nodes i and j, γ is the learning rate, p. is a sign function of output x_j (positive values become +1 and negative values become -1) and f is feedback from the critic. The weight update equation is based on Hebbian style learning [9, 10] . Improved classification performance by the actor in early trials was achieved by real time 'epoching' of the data [10] . After each trial, the actor was trained on the current trial and all previous trials .

(Hi) Data Acquisition

Neural signals were recorded with a 10-channel Advanced Brain Monitoring (ABM, Carlsbad) wireless EEG system (sample rate 256Hz, 16 bits of resolution) with electrodes in a 10-20 system arrangement. Motor potentials related to the intent to open or close the right hand were collected from the C3 electrode, l-50Hz. In addition to motor potentials, error-potentials (ErrP) were collected from the Cz electrode, 5-10Hz. EEG corresponding to motor potentials were low pass filtered at 60Hz. ErrPs were low pass filtered at 10Hz. Power spectral density (PSD) of 1 Hz resolution was then computed on the Is of filtered EEG data after cues were displayed for motor potentials and the Is of filtered EEG data after feedback was given for ErrPs. The PSD was normalized for each 1 Hz bin by subtracting the mean of all trials and dividing by the standard deviation of all trials.

EEG is commonly contaminated by artifacts originating from ocular muscle motion, which has a high amplitude relative to EEG signals. Ocular artifact such as eye blinks or saccadic motion are relatively simple to identify by visual examination of the neural signals and are characterized by short duration high amplitude wave most present across frontal electrodes (such as Fz, F3, & F4). To remove these artifacts from the signal, Independent component analysis was applied using the Infomax algorithm present in EEGlab. Independent Components due to artifacts from eye movement & blinking were identified by their frontal distribution in scalp topography, matching of component activity in the time domain to eye blink shape, & smoothly decreasing activity power spectrum. The artifactual components were then subtracted from the EEG and the remaining components were remixed to produce a cleaner signal.

(iv) Critic Error Potential Classifier

The error potential classifier, critic, detects ErrPs in the user's EEG to determine if the user thought an error occurred. The critic than provides binary feedback, -1 or 1, to the actor. The input to the error potential classifier was the PSD from 5-10Hz in 1 Hz bins computed on the Is of filtered EEG data after the actor's output, action, was shown on the display.

The error potential classifier in the critic is a 3-layer adaptive neural network with 5 inputs nodes and 5 hidden nodes, trained via backpropagation . The 120 trials of the first session were randomly assigned to either a training set or test set. The training set was used to optimize the weights of the critic. The weights produced from the training were assessed by passing the test set through the critic and computing its classification accuracy. The critic was trained and tested until the generalization increased above a threshold. The weights of the critic used in the best testing session were then saved and used for all subsequent experiments. b. Results

The performance of the EEG BCI based on reinforcement learning is summarized in three parts: 1) critic performance, 2) overall decoding accuracy over time, and 3) characterization of the actor in the early, middle, and late trials of the sessions. To test the training paradigm, a 10-fold cross-validation was performed. The above training procedure was repeated 10 times and the average classification accuracy was computed, as follows: ErrP No-ErrP

Classified as ErrP 69% 37%

Classified as No-ErrP 31% 63%

To test the performance of the BCI an offline simulation of 3500 trials was performed. The simulation provided a method to test several factors . A large number of trials could be used, which is more realistic for rehabilitation over several days. Additional processing and filtering could be done on the EEG data, which would require optimization to perform in real time. And, a large number of features could be used as input to the actor, which would also require optimization to perform in real time.

The simulation was performed by generating a random sequence from the 360 recorded trials. The motor potentials from the random trial were filtered and features created, PSD in lHz bins from 1- 50Hz. Individual frequencies did not show large differences in average power between the two classes (open and closed); however the classifier was able to learn discernible patterns across the 50 frequencies of l-50Hz. The actor classified the trial based on these features. If the actor's classification was correct for that trial, recorded EEG data from a trial that showed feedback of correct was presented to the critic. Similarly, if the actor's classification was incorrect for that trial, recorded EEG data from a trial that showed feedback of incorrect was presented to the critic. The output of the critic was given as feedback to the actor, so the actor's weights could be adapted with RL .

Figure 20A shows the cumulative classification accuracy, number of correct trials divided by the number of trials, of the BCI over the course of the simulation. The performance of the BCI increased rapidly over the first few hundred trials and continued to increase until the end of the 3500 trials simulation. After the first 1500 trials, the performance of the BCI showed a monotonic increase, indicating the BCI was converging on a solution and becoming more stable. To test for overfitting the dataset, the same algorithm was run on a surrogate dataset (randomized motor potentials); the end classification accuracy was 51%. Figure 20B shows a more detailed view of the performance of the BCI during the beginning, middle, and end of the simulation. The actor's performance, the weight values of the actor, and the output of the actor are shown. Again, the BCI performance can be seen to increase rapidly in the beginning of the simulation and become more stable later on in the simulation, while still increasing. The weights of the actor changed dramatically at the beginning of the simulation as the BCI adapts to the user and finds a solution. In the middle of the simulation the BCI is still adapting to the user, as seen in the changing weight values, but is converging on a solution and becoming more stable. At the end of the simulation, as the RL algorithm converged on a solution to mapping the motor potentials to the actions, the weights became stable. The actor's output showed a decrease in errors, red stems, as the simulation progressed. The errors were more likely to be single events and not clustered together, at the end of the simulation .

The simulation showed several results important for rehabilitation. The performance of the BCI increased rapidly in the first few hundred trials. To maintain the user's engagement, the performance of the BCI has to increase above chance quickly, so the user continues to be engaged in control of the device. The performance of the BCI also showed steady increases in later trials, also important for user's engagement. The mapping of motor potentials to actions also became stable in later trials, as seen in the weights values plots. This stability means the user will not see sudden decreases in performance in later trials unless there is a large remapping necessary.

In other embodiments, an EEG system with more electrodes may be used. The additional electrodes will increase spatial resolution within the motor cortex, which could increase the motor decoder accuracy and potentially increase the recognition of ErrPs . When the BCI is paired with rehabilitation (i.e. functional electrical stimulation controlling hand grasp), the subject will see actual physical movement, which should increase engagement in the task. The increased engagement could improve motor potential signal strength. We seek to monitor performance over several days to collect a large number of trials and test how the BCI handles extended breaks between sessions. The extended breaks could lead to more dramatic changes in the user's motor potentials; the adaptive BCI used here is well-suited for this kind of application .

Since many modifications, variations and changes in detail can be made to the described preferred embodiment of the invention, it is intended that all matters in the foregoing description and shown in the accompanying drawings be interpreted as illustrative and not in a limiting sense. Thus, the scope of the invention should be determined by the appended claims and their legal equivalents . REFERENCES

ADAM, S., BUSONIU, L. & BABUSKA, R. 2012. Experience Replay for

Real-Time Reinforcement Learning Control. Systems, Man, and

Cybernetics, Part C: Applications and Reviews, IEEE

Transactions on, 42, 201-212.

BARTO, A. G. & ANANDAN, P. 1985. Pattern-recognizing stochastic learning automata. Systems, Man and Cybernetics, IEEE

Transactions on, SMC-15, 360-375.

ERDOGMUS, D., FONTENLA-ROMERO, 0., PRINCIPE, J. C, ALONSO-

BETANZOS, A. & CASTILLO, E. 2005. Linear-least-squares initialization of multilayer perceptrons through backpropagation of the desired response. Neural Networks,

IEEE Transactions on, 16, 325-337.

GILJA, V., NUYUJUKIAN, P., CHESTER, C. A., CUNNINGHAM, J. P., YU,

B. M., FAN, J. M., CHURCHLAND, M. M., KAUFMAN, M. T., KAO, J. C, RYU, S. I. & SHENOY, K. V. 2012. A high-performance neural prosthesis enabled by control algorithm design. Nat

Neurosci, 15, 1752-7.

HASSOUN, M. H. 1995. Fundamentals of Artificial Neural Networks,

Cambridge MA, MIT Press. HATSOPOULOS, N. G. & DONOGHUE, J. P. 2009. The Science of Neural Interface Systems. Annual Review of Neuroscience, 32, 249-266.

IZHIKEVICH, E. M. 2003. Simple model of spiking neurons. IEEE

Transactions on Neural Networks, 14, 1569-1572.

MAZZONI, P., ANDERSEN, R. & JORDAN, M. 1991. A More Biologically Plausible Learning Rule for Neural Networks. Proceedings of the National Academy of Sciences of the United States of America, 88, 4433-4437.

NARENDRA, K. S. & THATHACHAR, M. A. L. 1989. Learning automata: an introduction, Prentice- Hall, Inc.

PENNARTZ, C. M. A. 1997. Reinforcement learning by Hebbian synapses with adaptive thresholds. Neuroscience, 81, 303-319. PETERS, J. & SCHAAL, S. 2008. Natural Actor-Critic.

Neurocomputing, 71, 1180-1190.

SANCHEZ, J. C, CARMENA, J. M., LEBEDEV, M. A., NICOLELIS, M. A., HARRIS, J. G. & PRINCIPE, J. C. 2004. Ascertaining the importance of neurons to develop better brain-machine interfaces. IEEE Trans Biomed Eng, 51, 943-53.

SEHNKE, F., OSENDORFER, C, RUCKSTIESS, T., GRAVES, A., PETERS, J.

& SCHMIDHUBER, J. 2010. Parameter-exploring policy gradients. Neural Netw, 23, 551-9.

SUTTON, R. S., MCALLESTER, D., SINGH, S. & MANSOUR, Y. 2000.

Policy Gradient Methods for Reinforcement Learning with Function Approximation. Advances in Neural Information Processing Systems.

VASILAKI, E., FREMAUX, N., URBANCZIK, R., SENN, W. & GERSTNER, W.

2009. Spike-Based Reinforcement Learning in Continuous State and Action Space: When Policy Gradient Methods Fail. PLoS Comput Biol, 5, el000586.

WAWRZYNSKI, P. 2009. Real-time reinforcement learning by sequential Actor-Critics and experience replay. Neural

Networks, 22, 1484-1497.

WILLIAMS, R. J. 1988. Toward a theory of reinforcement-learning connectionist systems. Technical Report NU-CCS-88-3, Northeastern University, College of Computer Science.

WILLIAMS, R. J. 1992. Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning. Mach. Learn., 8, 229-256.

Now that the invention has been described,

Claims

1. An adaptive neural decoder comprising:

a plurality of interconnected processing units forming a network, said processing units comprising sensory nodes, hidden nodes, and output nodes,

a plurality of sensory nodes each configured to receive at least one neural input signal from the brain of a user, and output at least one sensory output signal

a plurality of hidden nodes each configured to individually receive at least one sensory output signal from said plurality of sensory nodes, through each of at least one synaptic connection, wherein each synaptic connection is associated with a corresponding synaptic weight,

each of said plurality of hidden nodes is further configured to calculate a probability based on the corresponding synaptic weight of each of its at least one synaptic connection to said plurality of sensory nodes, and output at least one hidden output signal based at least in part on the probability of the hidden node,

a plurality of output nodes configured to individually receive at least one hidden output signal from said plurality of hidden nodes, through each of at least one synaptic connection, wherein each synaptic connection is associated with a corresponding synaptic weight

each of said plurality of output nodes is further configured to calculate a probability based on the corresponding synaptic weight of each of its at least one synaptic connection to said plurality of hidden nodes, and generate an output signal,

wherein said neural decoder is configured to output the output signal corresponding to the winning node, defined as the output node having the highest probability, and

wherein said at least one hidden node and said at least one output node are further configured to receive at least one feedback signal and adjust at least one synaptic weight based on said feedback signal.

2. The neural decoder as recited in claim 1 wherein said feedback signal comprises positive feedback and negative feedback, wherein positive feedback is defined as a successful outcome as compared to the intended action of the user, and negative feedback is defined as an unsuccessful outcome as compared to the intended action of the user.

3. The neural decoder as recited in claim 2 wherein said feedback module is further configured to effect a greater change of at least one synaptic weight of a synaptic connection corresponding to the winning node for negative feedback, relative to positive feedback.

4. The neural decoder as recited in claim 1 wherein said feedback signal comprises user driven feedback.

5. The neural decoder as recited in claim 4 wherein said user driven feedback comprises brain signals from the user.

6. The neural decoder as recited in claim 4 wherein user driven feedback comprises motor feedback from the user.

7. The neural decoder as recited in claim 4 wherein user driven feedback comprises vocal feedback from the user.

8. The neural decoder as recited in claim 1 wherein said processing units generate discrete values.

9. The neural decoder as recited in claim 1 wherein each of said plurality of hidden node outputs a positive value if its probability is greater than zero, and a negative value if its probability is less than zero.

10. A system for adaptive neural decoding of a user comprising: a plurality of interconnected processing units forming a network comprising an input layer, a hidden layer, and an output layer, said processing units comprising sensory nodes, hidden nodes, and output nodes,

said input layer comprising a plurality of sensory nodes, each of said sensory nodes is configured to receive an input signal from a neural vector and to output a plurality of sensory output signals,

said hidden layer comprising a plurality of hidden nodes, each of said hidden nodes is configured to individually receive a sensory output signal from each of said plurality of sensory nodes through each of a plurality of synaptic connections, wherein each synaptic connection is associated with a corresponding synaptic weight,

each of said hidden nodes is further configured to calculate a probability based on the corresponding synaptic weights of each of its synaptic connections to each of said sensory nodes, and output a plurality of hidden output signals based at least in part on the probability of the hidden node,

said output layer comprising a plurality of output nodes, each of said output nodes is configured to individually receive a hidden output signal from each of said plurality of hidden nodes through each of a plurality of synaptic connections, wherein each synaptic connection is associated with a corresponding synaptic weight,

each of said output nodes is further configured to calculate a probability based on the corresponding synaptic weights of each of its synaptic connections to each of said hidden nodes,

wherein said network is configured to output a signal corresponding to the winning node to the environment, said winning node comprises the output node having the highest probability, and a feedback module configured to receive a feedback signal from the environment, said feedback module is further configured to effect the change of at least one synaptic weight of a synaptic connection corresponding to the winning node.

11. The system as recited in claim 10 wherein said feedback signal comprises positive feedback and negative feedback, wherein positive feedback is defined as a successful outcome as compared to the intended action of the user, and negative feedback is defined as an unsuccessful outcome as compared to the intended action of the user.

12. The system as recited in claim 11 wherein said feedback module is further configured to effect a greater change of at least one synaptic weight of a synaptic connection corresponding to the winning node for negative feedback, relative to positive feedback.

13. The system as recited in claim 10 wherein said feedback module is configured to receive user driven feedback.

14. The system as recited in claim 13 wherein said user driven feedback comprises brain signals from the user.

15. The system as recited in claim 13 wherein user driven feedback comprises motor feedback from the user.

16. The system as recited in claim 13 wherein user driven feedback comprises vocal feedback from the user.

17. The system as recited in claim 10 wherein said processing units generate discrete values .

18. The system as recited in claim 17 wherein each of the plurality of hidden nodes outputs a positive value if its probability is greater than zero, and a negative value if its probability is less than zero.

19. The system as recited in claim 10 further comprising at least one neuroprosthetics device structured to receive said output signal from said winning node and perform an action in the environment corresponding to said winning node.

20. A method for adaptive neural decoding, comprising:

receiving at least one neural input signal from the brain of a user through at least one sensory node,

transmitting at least one sensory output signal from the at least one sensory node to at least one hidden node,

receiving the at least one sensory output signal at the at least one hidden node through at least one synaptic connection, each synaptic connection being individually associated with a synaptic weight,

calculating a probability at each of at least one hidden node based at least in part on each synaptic weight of its synaptic connection ( s ) ,

transmitting at least one hidden output signal from the at least one hidden node to at least one output node, the hidden output signal of each hidden node defined at least in part by its probability,

receiving the at least one hidden output signal at the at least one output node through at least one synaptic connection, each synaptic connection being individually associated with a synaptic weight,

calculating a probability at each of at least one output node based at least in part on the synaptic weight of its synaptic connection ( s ) ,

transmitting an output signal corresponding to the winning node to the environment, the winning node is defined as the output node having the highest probability,

receiving at least one feedback signal from the environment, and

adjusting at least one synaptic weight of a synaptic connection based on the feedback signal.

21. The method as recited in claim 20 wherein the feedback signal comprises positive feedback and negative feedback, wherein positive feedback is defined as a successful outcome as compared to the intended action of the user, and negative feedback is defined as an unsuccessful outcome as compared to the intended action of the user.

22. The method as recited in claim 21 defining the adjusting of at least one synaptic weight to adjust for a greater change in the at least one synaptic weight for negative feedback, relative to positive feedback.

23. The method as recited in claim 20 wherein the feedback signal comprises user driven feedback.

24. The method as recited in claim 23 wherein the user driven feedback comprises brain signals from the user.

25. The method as recited in claim 23 the user driven feedback comprises motor feedback from the user.

26. The method as recited in claim 23 the user driven feedback comprises vocal feedback from the user.

27. The method as recited in claim 20 wherein the sensory nodes, hidden nodes and output nodes generate discrete values.

28. The method as recited in claim 27 wherein each of the at least one hidden nodes outputs a positive value if its probability is greater than zero, and a negative value if its probability is less than zero.