

# Online Learning Extreme Learning Machine with Low-Complexity Predictive Plasticity Rule and FPGA Implementation

Zhenya Zang, Xingda Li, and David Day Uei Li

**Abstract**—We propose a simplified, biologically inspired predictive local learning rule that eliminates the need for global backpropagation in conventional neural networks and membrane integration in event-based training. Weight updates are triggered only on prediction errors and are performed using sparse, binary-driven vector additions. We integrate this rule into an extreme learning machine (ELM), replacing the conventional computationally intensive matrix inversion. Compared to standard ELM, our approach reduces the complexity of the training from  $\mathcal{O}(M^3)$  to  $\mathcal{O}(M)$ , in terms of  $M$  nodes in the hidden layer, while maintaining comparable accuracy (within 3.6% and 2.0% degradation on training and test datasets, respectively). We demonstrate an FPGA implementation and compare it with existing studies, showing significant reductions in computational and memory requirements. This design demonstrates strong potential for energy-efficient online learning on low-cost edge devices.

**Index Terms**—Bio-inspired learning, Online learning, Reconfigurable hardware

## I. INTRODUCTION

ONLINE learning is essential in modern machine learning (ML) systems that operate under nonstationary data distributions, dynamic environments, or real-time constraints. Modern deep learning methods have achieved state-of-the-art performance in various domains, but they rely primarily on gradient-based optimisation and global error back-propagation [1], [2]. Although effective in batch learning, these approaches are inefficient and ill-suited for real-time online adaptation. Alternative paradigms have emerged to address these challenges. Neuromorphic computing [3], for example, emulates the neural activity of the brain using event-driven parallel hardware to achieve highly efficient computation, making it well suited to edge AI and real-time processing. Representative digital neuromorphic implementations have demonstrated high scalability [4], enabling biologically inspired cognitive architectures [5], cerebellar network models [6], and fault-tolerant context-dependent learning frameworks [7]. Representative training rules, such as spike-timing-dependent plasticity (STDP) [8] and predictive learning rules (PLR) [9], enable local learning at the synapse level. These rules assign credit based

This work is supported by the EPSRC (EP/T00097X/1), the Quantum Technology Hub in Quantum Imaging (Quantic), and the University of Strathclyde. Xingda Li also acknowledges support from the China Scholarship Council. The authors would also like to acknowledge the support from Xilinx for donating the FPGA.

Zhenya Zang, Xingda Li, and David Day Uei Li are with the Department of Biomedical Engineering, University of Strathclyde, Glasgow, UK. (e-mail: zhenya.zang@strath.ac.uk, xingda.li@strath.ac.uk, david.li@strath.ac.uk).

on temporal correlations or predictive errors, aligning well with streaming data and hardware-locality constraints. Another promising direction involves backpropagation-free methods that eliminate computationally intensive training, such as Extreme Learning Machine (ELM) [10], [11], and Random Vector Functional Link [12]. ELM, featuring fast training and a simple architecture, provides an efficient alternative to conventional neural networks by projecting input data through a fixed random layer and solving output weights via a closed-form least-squares solution, thereby avoiding iterative gradient updates. However, standard ELMs assume full access to training data and require costly matrix inversion, limiting their suitability for real-time and low-power inference. To overcome these issues, several online ELM variants [11], [13], [14] have been proposed, incorporating recursive least-squares updates and forgetting mechanisms for incremental learning.

In this work, we propose a compact learning architecture, simplified PLR-ELM (SPLR-ELM), which combines the fast inference capability of ELMs with the low complexity of PLR. The proposed method replaces the global least-squares solution with a local winner-take-all (WTA) predictive update based on an error-driven learning rule, eliminating the need for eligibility traces, membrane potential integration, and global backpropagation. This design enables an efficient online update path that relies only on binary vector additions. We demonstrate a fixed-point (FXP) FPGA implementation using binary hidden activations and multiplier-free update logic. Unlike conventional ELMs or biologically inspired models, SPLR-ELM provides a scalable on-device learning solution for edge scenarios constrained by energy, memory, and latency. This is the first work to unify ELM and simplified PLR within an efficient online learning framework and validate its practicality through hardware realisation, integrating biologically inspired plasticity with efficient hardware design for real-time ML systems.

## II. THEORIES

### A. Closed-Form Solution of ELM

In ELM [10], input-to-hidden weights are randomly initialised and fixed, while only hidden-to-output weights are learned. For an input  $\mathbf{x} \in \mathbb{R}^D$ , the hidden activation is  $\mathbf{h} = \sigma(\mathbf{W}_{\text{in}}\mathbf{x} + \mathbf{b})$ , where  $\sigma(\cdot)$  is a nonlinear function, and  $\mathbf{W}_{\text{in}} \in \mathbb{R}^{M \times D}$  and  $\mathbf{b} \in \mathbb{R}^M$  are random and fixed. Given  $N$  samples, hidden responses form  $\mathbf{H} \in \mathbb{R}^{N \times M}$  and targets

TABLE I  
CONCEPTUAL AND METHODOLOGICAL COMPARISON OF TRAINING FROM ELM, PLR, AND THE PROPOSED SPLR

| Feature                  | Standard ELM [10]                                                                                             | Original PLR [9]                                                   | Proposed SPLR                         |
|--------------------------|---------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------|---------------------------------------|
| Loss Function            | Minimises a global least-squares error                                                                        | Minimises a predictive loss                                        | Minimises a WTA loss                  |
| Learning Rule            | $\mathbf{W}^* = \mathbf{H}^\dagger \mathbf{T} = (\mathbf{H}^\top \mathbf{H})^{-1} \mathbf{H}^\top \mathbf{T}$ | $w_t = w_{t-1} + \eta(\epsilon_t v_{t-1} + \varepsilon_t p_{t-1})$ | $w = w + \eta h; w = w - \eta h$      |
| Learning Scale           | Global                                                                                                        | Neuron-wise, temporal & error driven                               | Neuron-wise & error driven            |
| Trace                    | N.A.                                                                                                          | Eligibility traces                                                 | Removed for simplicity                |
| Computational Complexity | $\mathcal{O}(M^3)$ (matrix inversion)                                                                         | $\mathcal{O}(M)$ per time step                                     | $\mathcal{O}(M)$ when misclassified   |
| Hardware Suitability     | Low, high complexity                                                                                          | Low, primarily for spiking systems                                 | High, only additions in weight update |

$\mathbf{T} \in \mathbb{R}^{N \times C}$ ; trainable weights  $\mathbf{W} \in \mathbb{R}^{M \times C}$  are then estimated by minimising the least-squares error:

$$\min_{\mathbf{W}} \|\mathbf{HW} - \mathbf{T}\|_2^2. \quad (1)$$

The learning rule of ELM is summarised in Table I. Several online ELM variants [11], [13], [14] adopt incremental updates to reduce computational cost and improve on-chip efficiency. However, they still require an initial batch training phase involving the matrix inversion  $(\mathbf{H}_0^\top \mathbf{H}_0)^{-1}$ , where  $\mathbf{H}_0 \in \mathbb{R}^{N_0 \times M}$  and  $N_0$  denotes the number of samples in the initial batch. Neuromorphic methods, in contrast, avoid large-scale matrix operations during training and inference with minimal accuracy loss. Following this principle, the proposed SPLR-ELM replaces global learning with local error-driven predictive updates without spike-train encoding, effectively balancing accuracy and computational cost for real-time, low-power deployment.
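For reference, the batch solution of Eq. 1 can be sketched in a few lines of NumPy. This is a minimal illustration rather than the evaluated implementation: the function names, the ridge term `ridge`, the Gaussian initialisation, and the synthetic two-class data are all assumptions made for the example. The linear solve on the $M \times M$ system is the step that costs $\mathcal{O}(M^3)$.

```python
import numpy as np

def train_elm(X, T, n_hidden, ridge=1e-3, seed=0):
    """Batch ELM: fixed random hidden layer, least-squares output weights."""
    rng = np.random.default_rng(seed)
    W_in = rng.standard_normal((n_hidden, X.shape[1]))  # random, never trained
    b = rng.standard_normal(n_hidden)
    H = 1.0 / (1.0 + np.exp(-(X @ W_in.T + b)))          # sigmoid, shape (N, M)
    # Regularised normal equations: (H^T H + ridge * I) W = H^T T.
    # Solving this M x M system is the O(M^3) step.
    W = np.linalg.solve(H.T @ H + ridge * np.eye(n_hidden), H.T @ T)
    return W_in, b, W

def predict_elm(X, W_in, b, W):
    """Class prediction: argmax over the linear readout H W."""
    H = 1.0 / (1.0 + np.exp(-(X @ W_in.T + b)))
    return np.argmax(H @ W, axis=1)

# Toy usage on synthetic, linearly separable data (an assumption; not MNIST).
rng = np.random.default_rng(1)
X = rng.standard_normal((200, 5))
y = (X[:, 0] > 0).astype(int)
T = np.eye(2)[y]                                  # one-hot targets, shape (N, C)
W_in, b, W = train_elm(X, T, n_hidden=50)
train_acc = float(np.mean(predict_elm(X, W_in, b, W) == y))
```

Note that the whole hidden-response matrix $\mathbf{H}$ must be available before the solve, which is the batch assumption that online variants relax.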

### B. Simplified Predictive Learning Rule

PLR [9] is a biologically inspired local learning rule that enables single neurons to predict future events by capturing the temporal structure of synaptic inputs. It strengthens synapses that best anticipate others, giving rise to sequence learning, anticipatory spiking, and STDP-like behaviour. PLR has been effectively used to train spiking neural networks (SNNs) and has been implemented on FPGA [15], demonstrating both learning efficiency and hardware suitability. The original weight update rule is presented in Table I, where  $\eta$  is the learning rate (LR),  $\epsilon_t = x_t - v_{t-1}w_{t-1}$  denotes the prediction error,  $\varepsilon_t = \epsilon_t w_{t-1}$  represents the global modulatory signal, and  $p_t = p_{t-1}e^{-\frac{t}{\tau_m}} + x_t$  is the eligibility trace. This formulation allows individual neurons in an SNN to learn long-timescale temporal dependencies and shift their responses toward predictive inputs, aligning with experimental observations of STDP. The weight update rule requires continuous tracking of synaptic state variables across time, such as eligibility traces  $p_t$  and membrane potentials  $v_t$ , and involves floating-point (FLP) multiplications. While biologically plausible, such complexity introduces substantial computational overhead, particularly on hardware-constrained platforms.

To address this challenge, our proposed SPLR replaces continuous dynamics with a discrete, error-driven mechanism: hidden-layer (HL) outputs are single-timestamp binary spikes  $\mathbf{h} \in \{0, 1\}^M$ ,

and output predictions are made through a linear readout  $\mathbf{o} = \mathbf{h}^\top \mathbf{W}$ , where

$$\mathbf{h} = \Theta(\mathbf{W}_{\text{in}} \mathbf{x} + \mathbf{b}) \in \{0, 1\}^M. \quad (2)$$

Unlike prior work [15], which uses leaky integrate-and-fire (LIF) models in the HL, our SPLR-ELM receives normalised real-valued input  $\mathbf{x} \in \mathbb{R}^D$  and projects it to binary hidden states using a Heaviside step activation  $\Theta(\cdot)$  with a fixed threshold. This threshold can be replaced by zero-centering the projection  $\mathbf{W}_{\text{in}} \mathbf{x} + \mathbf{b}$ , at the cost of a  $\sim 3\%$  drop in classification accuracy. The update rule shown in Table I is applied only when the prediction is incorrect:  $\mathbf{W}$  is adjusted by reinforcing the correct output and suppressing the incorrectly predicted one. This rule eliminates the need for eligibility traces, membrane potential integration, and FLP multiplications, replacing them with integer-based vector additions or subtractions. Weight values are clipped to a bounded range to avoid divergence:

$$w \leftarrow \min(\max(w, -w_{\text{max}}), w_{\text{max}}). \quad (3)$$

From a conventional gradient-based training perspective, SPLR-ELM's weight update can be interpreted as descending the gradient of a WTA-based loss function defined on the output of the predicted (winning) class  $\hat{y}$  and that of the target class  $y$ :

$$\mathcal{L} = \frac{1}{2}(o_{\hat{y}} - o_y)^2. \quad (4)$$

The gradients of the loss function with respect to the output weights are

$$\frac{\partial \mathcal{L}}{\partial W_{i\hat{y}}} = \frac{\partial \mathcal{L}}{\partial o_{\hat{y}}} \cdot \frac{\partial o_{\hat{y}}}{\partial W_{i\hat{y}}} = (o_{\hat{y}} - o_y) \cdot h_i, \quad (5)$$

$$\frac{\partial \mathcal{L}}{\partial W_{iy}} = \frac{\partial \mathcal{L}}{\partial o_y} \cdot \frac{\partial o_y}{\partial W_{iy}} = -(o_{\hat{y}} - o_y) \cdot h_i, \quad (6)$$

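which are hardware-friendly, since  $h_i$  is binary and, once discretised, the updates involve only addition or subtraction operations. Under hard WTA behaviour,  $\hat{y}$  is the winning output, so  $o_{\hat{y}} - o_y \geq 0$  whenever  $\hat{y} \neq y$ ; gradient descent therefore decreases  $W_{i\hat{y}}$  and increases  $W_{iy}$ . Replacing the error magnitude with a fixed LR  $\eta$ , the update rule can be discretised to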

$$\Delta W_{iy} = +\eta h_i, \quad \Delta W_{i\hat{y}} = -\eta h_i \quad \text{if } \hat{y} \neq y, \quad (7)$$

which is equivalent to the rule shown in Table I. Compared with the intensive one-time matrix inversion in ELM, which requires  $\mathcal{O}(M^3)$  operations and is typically executed offline, SPLR-ELM performs neuron-wise online updates with  $\mathcal{O}(M)$  operations per misclassified sample. This replaces the costly inversion with lightweight local updates, preserving the predictive essence of PLR while eliminating the need for continuous



Fig. 1. Comparison of training and test accuracy with different learning paradigms on the MNIST and Fashion-MNIST datasets.

dynamics, eligibility traces, and global modulatory signals. The resulting rule is scalable and hardware-efficient, well-suited for real-time and FPGA-based edge learning.
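The rule in Eqs. 2, 3, and 7 can be sketched as follows. This is an illustrative software model, not the FPGA design: the function names, the integer LR `eta=1`, the bound `w_max=127`, the zero threshold `theta`, and the four-neuron toy input are assumptions chosen for the example.

```python
import numpy as np

def hidden_spikes(x, W_in, b, theta=0.0):
    """Eq. (2): Heaviside threshold of the fixed random projection."""
    return (W_in @ x + b > theta).astype(np.int32)

def splr_step(W, h, y, eta=1, w_max=127):
    """One online SPLR update; W has shape (M, C) and is modified in place.

    Returns the WTA prediction made before any update.
    """
    o = h @ W                          # linear readout o = h^T W
    y_hat = int(np.argmax(o))          # hard winner-take-all
    if y_hat != y:                     # update only on a prediction error
        W[:, y] += eta * h             # reinforce the correct output, Eq. (7)
        W[:, y_hat] -= eta * h         # suppress the wrongly winning output
        np.clip(W, -w_max, w_max, out=W)   # bounded weights, Eq. (3)
    return y_hat

# Toy usage: four hidden neurons, three classes, target class 2.
W = np.zeros((4, 3), dtype=np.int32)
h = np.array([1, 0, 1, 1], dtype=np.int32)
first = splr_step(W, h, y=2)    # all outputs tie at 0, so class 0 wins: update
second = splr_step(W, h, y=2)   # class 2 now wins: no update
```

Because `h` is binary, `eta * h` only selects which rows receive `+eta` or `-eta`, so the update reduces to at most $2M$ integer additions per misclassified sample and to nothing at all on a correct prediction.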

## III. ACCURACY EVALUATION

MNIST [16] and Fashion-MNIST (F-MNIST) [17] datasets are employed to evaluate the accuracy of SPLR-ELM, enabling fair comparison with existing learning rules on identical benchmarks. A subset of 5,000 training and 1,000 test samples (10% of the total dataset) is used, yet the model converges stably, demonstrating efficient learning from limited data. Conventional ELM [10] and online sequential ELM (OS-ELM) [11] are included for comparison regarding computational efficiency and accuracy. To ensure fairness, both ELM and OS-ELM employ the *sigmoid* activation and its regularised variant to maintain accuracy and numerical stability. Fig. 1 compares the performance across four representative ELM training paradigms (matrix inversion-based ELM [10], OS-ELM [11], STDP-driven ELM [18], and the proposed SPLR-based training), evaluated under 32-bit floating-point (FP32) and 16-bit fixed-point (FXP16, 8 integer + 8 fractional bits) formats. The network architecture is kept consistent, with 784 input nodes, 2048 hidden nodes, and 10 output nodes, and reported results represent average accuracies over training and test datasets. To evaluate robustness under non-ideal conditions, we additionally use MNIST augmented with Gaussian noise, which emulates real-world cases, and a skewed (class-imbalanced) F-MNIST dataset. The skewed dataset follows a long-tailed class distribution, where the sample count gradually decreases from 400 in the majority class (Class-0) to 200 in the minority class (Class-9), a 2× imbalance ratio across classes. An additional experiment using a real-world ten-gesture photonic sensor dataset [19] is included to validate the algorithm's applicability, featuring 28×28 images with blurrier boundaries than MNIST and F-MNIST. Results on the skewed F-MNIST dataset show a slight accuracy degradation but remain close to those of the balanced scenario.

Although the latest STDP method (R-STDP) [18], which uses a similar ELM model architecture but a different weight-update method, reports higher accuracy in its original paper, its performance degrades in our evaluation due to the use of real-valued pixel inputs, as opposed to the encoded spike trains presented in [18]. SPLR is an error-driven alternative to



Fig. 2. Overview of the proposed SPLR-ELM hardware architecture.

spike-based R-STDP. While R-STDP adjusts synaptic weights based on spike timing and reward-gated triplet interactions, SPLR relies on the discrepancy between predicted and target outputs, making it computationally and hardware efficient. SPLR omits spike-timing traces but preserves the reward-modulated weight adaptation principle. R-STDP's weight update requires four learning-rate hyperparameters to capture short-term and long-term spike-timing effects, making training more stable, whereas SPLR uses only a single learning rate. From a hardware perspective, each hidden node in SPLR only has one comparator and two adders, achieving better scalability than R-STDP, which needs two multipliers. SPLR achieves accuracy comparable to ELM and OS-ELM, while significantly reducing computational complexity. OS-ELM presents slightly lower training accuracy, possibly because the number of samples in the initial training session is a hyperparameter: the model may not generalise well if the initial batch is small or imbalanced. Although fine-tuning this hyperparameter and diversifying the dataset could improve accuracy, these aspects are beyond the scope of this work. In contrast, SPLR avoids this issue entirely, as its per-sample, prediction error-driven updates require no initial training phase. Even in the FXP16 implementation, the drop in accuracy is slight, further supporting its suitability for hardware-efficient implementations.
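To make the FXP16 format concrete, the sketch below quantises real values to a signed 16-bit word with 8 fractional bits. It is an assumption that the sign bit is counted among the 8 integer bits and that out-of-range values saturate; the helper names are hypothetical, and Python's `round` (round-half-to-even) stands in for whatever rounding the hardware applies.

```python
def to_q8_8(value):
    """Quantise a real value to signed 16-bit fixed point, 8 fractional bits."""
    raw = int(round(value * 256))            # scale by 2^8
    return max(-32768, min(32767, raw))      # saturate to the int16 range

def from_q8_8(raw):
    """Convert a raw Q8.8 integer back to a real value."""
    return raw / 256.0

step = from_q8_8(1)            # resolution of the format: 2^-8
largest = from_q8_8(32767)     # largest representable value, just below 128
half = to_q8_8(0.5)            # 0.5 * 256 = 128
saturated = to_q8_8(1000.0)    # out of range, clamps to 32767
```

Under these assumptions the representable range is roughly $[-128, 128)$ with a step of $2^{-8}$, and the integer weight updates of Eq. 7 map onto plain additions of the raw words.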

## IV. HARDWARE IMPLEMENTATION

SPLR-ELM is implemented in Verilog using Vivado 2018.3. The ZCU104 UltraScale+ MPSoC, which offers moderate hardware resources, is selected as the target device to demonstrate feasibility on programmable edge devices. Each HL neuron on the FPGA includes a pseudo-random number generator (PRNG) to generate its input-to-hidden weights at runtime, eliminating the need to preload weights into BRAM. A fixed seed is used during training and inference, and the PRNGs are identically reseeded at the arrival of each new sample, ensuring deterministic and consistent weight regeneration across training and inference. This approach significantly reduces BRAM usage by avoiding storage of static weight matrices. Each HL neuron independently generates a unique pseudo-random weight vector of length  $D$  (input dimension) for each training or inference sample.
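The reseeding idea can be modelled in software with a small linear feedback shift register. The sketch below is only an illustration: the 16-bit width, the tap positions (16, 14, 13, 11, a well-known maximal-length choice), the mapping of states to signed weights, and the per-neuron seeds are assumptions, not details of the implemented PRNG.

```python
def lfsr16_stream(seed, count):
    """Yield `count` successive states of a 16-bit Fibonacci LFSR.

    Taps 16, 14, 13, 11 (x^16 + x^14 + x^13 + x^11 + 1).
    """
    state = seed & 0xFFFF
    if state == 0:
        raise ValueError("an all-zero seed locks the LFSR")
    for _ in range(count):
        bit = (state ^ (state >> 2) ^ (state >> 3) ^ (state >> 5)) & 1
        state = (state >> 1) | (bit << 15)
        yield state

def neuron_weights(seed, d):
    """Regenerate one hidden neuron's d input weights as values in [-1, 1)."""
    return [(s - 0x8000) / 0x8000 for s in lfsr16_stream(seed, d)]

# Reseeding with the same value reproduces the same weight vector, so nothing
# needs to be stored between samples; distinct (hypothetical) seeds give
# distinct vectors for different neurons.
w_first_sample = neuron_weights(seed=0xACE1, d=784)
w_next_sample = neuron_weights(seed=0xACE1, d=784)
w_other_neuron = neuron_weights(seed=0xACE2, d=784)
```

The trade-off is that every sample pays for regenerating $D$ weights per neuron, which fits a design that already streams the $D$ inputs serially.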

The top-level hardware architecture, shown in Fig. 2, consists of modules for multiplication and accumulation (MAC), binary



Fig. 3. Detailed diagrams of (a) PRNG, MAC, COMP, PISO in a single HN and (b) WU in a single ON.



Fig. 4. Timing diagram of key modules, including PRNG, MAC, COMP, ITP, and WU.

activation comparison (COMP) as defined in Eq. 2, a PRNG, in-training prediction (ITP) for computing  $\mathbf{o} = \mathbf{h}^\top \mathbf{W}$ , and weight update (WU) logic implementing Eq. 7. The hidden (HL) and output (OL) layers instantiate multiple neurons for parallel computation. Once  $\mathbf{h}$  and  $\mathbf{o}$  are generated, parallel-in-serial-out (PISO) modules aggregate their outputs from the HL and OL, respectively. Multiplexers in the OL toggle between training and inference modes through the ITP module, either updating weights during training or outputting  $\mathbf{o}$  during inference. Fig. 3(a) presents the detailed structure of a hidden neuron (HN). The PRNG employs a linear feedback shift register (LFSR) with XOR-based combinational logic to generate pseudo-random weights for the multiplier, incorporating runtime underflow and overflow control. A multiplexer selects between accumulated products or bias summation, followed by threshold comparison to produce binary outputs stored in PISO registers corresponding to neuron indices. The WU module, illustrated in Fig. 3(b), updates weights stored in BRAM. If the ITP output differs from the ground-truth (GT) label, the two output neurons (ONs) indexed by  $y$  and  $\hat{y}$  respectively add and subtract the LR from their stored weights over  $M$  clock cycles (CCs), as defined in Eq. 7.

Fig. 4 presents the timing diagram of the operation of a single HN and a single ON, including PRNG and MAC. The HN is instantiated  $M$  times to enable parallel processing of serial input data. After  $D$  CCs (784 for the datasets used),  $M$  binary

TABLE II  
ACCURACY AND FPGA RESOURCE UTILISATION FOR DIFFERENT MODEL SIZES

| #HN               | Acc. (M) <sup>a</sup> | Acc. (F) <sup>a</sup> | LUT (230,400) | DFF (460,800) | DSP (1,728) | BRAM (312) | Freq. (MHz) | Power (W) |
|-------------------|-----------------------|-----------------------|---------------|---------------|-------------|------------|-------------|-----------|
| 512               | 81.3%                 | 71.2%                 | 62,864        | 48,543        | 512         | 5          | 230.7       | 1.24      |
| 1024              | 84.1%                 | 77.2%                 | 124,690       | 96,162        | 1,024       | 5          | 225.4       | 2.51      |
| 1700 <sup>b</sup> | 86.4%                 | 79.3%                 | 205,258       | 158,049       | 1,700       | 5          | 224.0       | 3.12      |

<sup>a</sup> M = MNIST, F = Fashion-MNIST.

<sup>b</sup> Resource-constrained to fit within ZCU104 FPGA fabric.

activations  $\mathbf{h}$  are obtained. An additional  $M$  CCs are needed to compute the linear output via ITP. If the predicted class  $\hat{y}$  differs from the target  $y$ , the ONs at the indices  $\hat{y}$  and  $y$  are updated over a further  $M$  CCs according to Eq. 7, using the BRAM access and arithmetic units. Thus, the worst-case total number of CCs required to process one training sample is  $T_{cc_t} = D + M + M + P$ , and for one inference sample is  $T_{cc_i} = D + M + P$ , where  $P$  is the number of cycles for internal data pipelining (3 in our case). The second  $M$  in  $T_{cc_t}$  accounts for the weight update, which is incurred only on a misclassified sample, so the worst case corresponds to every prediction being incorrect. The frames-per-second (FPS) rate of the pipelined architecture can therefore be calculated as  $\text{FPS} = \frac{f_{\max}}{T_{cc}}$ , where  $f_{\max}$  is the maximum clock frequency obtained from timing closure.
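The cycle-count expressions above translate directly into a small calculator. The function names and the numbers in the usage lines (a 200 MHz clock and $M = 1000$) are purely illustrative assumptions and do not correspond to a configuration in Table II.

```python
def cycles_per_sample(d, m, p=3, training=True, misclassified=True):
    """Clock cycles for one sample: T = D + M (+ M on a weight update) + P."""
    cycles = d + m + p          # stream D inputs, ITP readout over M, pipeline
    if training and misclassified:
        cycles += m             # weight update over M cycles, Eq. (7)
    return cycles

def frames_per_second(f_max_hz, cycles):
    """FPS = f_max / T_cc."""
    return f_max_hz / cycles

# Illustrative values only (assumed, not taken from Table II).
t_train_worst = cycles_per_sample(d=784, m=1000)                       # 2787
t_train_hit = cycles_per_sample(d=784, m=1000, misclassified=False)    # 1787
t_infer = cycles_per_sample(d=784, m=1000, training=False)             # 1787
fps_train_worst = frames_per_second(200e6, t_train_worst)
```

A correctly classified training sample costs the same as an inference sample, so the average training throughput lies between the worst-case and inference figures and improves as the error rate falls.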

The training frame rates (FPS) for the three model sizes in Table II are 199.7k, 195.1k, and 63.5k, respectively. Compared with prior works summarised in Table III, the proposed design achieves substantially higher training and inference throughput, albeit with increased hardware utilisation. The SPLR implementation achieves high computational efficiency through massive neuron-level parallelism and a weight-update path free of multipliers and nonlinear operations, thereby reducing hardware usage and toggle rate and enabling higher throughput per watt. Unlike previous implementations that process encoded spike trains, SPLR-ELM directly handles real-valued inputs, fully leveraging the simplicity of its learning rule to achieve high-speed operation. Table II also summarises the hardware performance and classification accuracy of SPLR-ELM-FXP16. Power estimates are obtained using the Xilinx Power Estimator, and the maximum operating frequency is extracted from post-implementation timing analysis. Power consumption can be further optimised through time-division multiplexing, lower-bit fixed-point representations for less accuracy-sensitive applications, and clock gating or dynamic adjustment for varying workloads. Notably, SPLR's accuracy on MNIST is lower than that of existing works, primarily because training uses a single-timestamp discretised spike rather than time-resolved synaptic state variables, causing some accuracy loss. Moreover, ELM typically achieves 92–95% accuracy on MNIST with 2,000–10,000 neurons, while our model reaches 86.4% with 1,700 neurons, only about 5.6 percentage points lower than the 2,000-neuron case, which is acceptable given the trade-off.
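As a consistency check, the power-efficiency entry for this work in Table III agrees with dividing the inference throughput of the 1,700-HN configuration by its estimated power from Table II:

$$\frac{122{,}336\ \text{fps}}{3.12\ \text{W}} \approx 39{,}210\ \text{fps/W}.$$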

## V. CONCLUSION

The proposed simplified predictive local rule integrated with ELM provides a compact training strategy that supports on-device learning and inference for image classification. The

TABLE III  
PERFORMANCE COMPARISON OF HARDWARE IMPLEMENTATIONS BASED ON SYNAPTIC PLASTICITY

| Work           | Platform       | Operating freq. (MHz) | Learn. speed (fps) | Infer. speed (fps) | On-chip learning | Power efficiency (fps/W) | Learning rule   | Dataset        | Network architecture | Accuracy (%)      |
|----------------|----------------|-----------------------|--------------------|--------------------|------------------|--------------------------|-----------------|----------------|----------------------|-------------------|
| 2015 [20]      | CPU            | N/A                   | N/A                | N/A                | N/A              | N/A                      | Pair-based STDP | MNIST          | 784-100              | 82.9              |
| 2021 [21]      | CPU            | N/A                   | N/A                | N/A                | N/A              | N/A                      | Tempotron       | MNIST, F-MNIST | 784-100-100          | 91.22, 77.34      |
| 2014 [22]      | Spartan-6 FPGA | 75                    | N/A                | 1.89               | No               | 1.26                     | Persistent CD   | MNIST          | 784-500-500-10       | 92.0              |
| 2017 [23]      | Virtex-6 FPGA  | 120                   | 0.06               | 0.12               | Yes              | N/A                      | Pair-based STDP | MNIST          | 784-800              | 89.1              |
| 2017 [24]      | Spartan-6 FPGA | 25                    | N/A                | 6.25               | No               | 52.08                    | Persistent CD   | MNIST          | 784-500-500-10       | 93.8              |
| 2018 [25]      | Spartan-6 FPGA | 100                   | N/A                | N/A                | Yes              | N/A                      | Stochastic-STDP | MNIST          | 784-6400-10          | 95.7              |
| 2021 [26]      | Virtex-7 FPGA  | 100                   | 61                 | 317                | Yes              | 196.89                   | Pair-based STDP | MNIST          | 784-200-100-10       | 92.93             |
| 2022 [18]      | ZC706          | 200                   | 22.5               | 30                 | Yes              | N/A                      | Triplet R-STDP  | MNIST, F-MNIST | 784-2,048-100        | 93.0, 78.5        |
| 2025 This work | ZCU104         | <b>224.0</b>          | <b>63,454</b>      | <b>122,336</b>     | <b>Yes</b>       | <b>39,210.26</b>         | <b>SPLR-ELM</b> | MNIST, F-MNIST | <b>784-1,700-10</b>  | 86.4, <b>79.3</b> |

proposed architecture is implemented on a Xilinx ZCU104 FPGA. Experimental results validate SPLR-ELM's low latency and competitive accuracy on the MNIST and Fashion-MNIST datasets. This makes it well-suited for low-complexity classification tasks, particularly on edge devices. Future improvements may focus on extending the algorithm to handle more challenging classification tasks, such as those with low signal-to-noise ratios or high-dimensional data. Additionally, a 2D convolutional SPLR-ELM variant could be explored for enhanced performance in image processing applications.

## REFERENCES

- [1] Y. LeCun, Y. Bengio, and G. Hinton, "Deep learning," *Nature*, vol. 521, no. 7553, pp. 436–444, 2015.
- [2] D. E. Rumelhart, G. E. Hinton, and R. J. Williams, "Learning representations by back-propagating errors," *Nature*, vol. 323, no. 6088, pp. 533–536, 1986.
- [3] K. Roy, A. Jaiswal, and P. Panda, "Towards spike-based machine intelligence with neuromorphic computing," *Nature*, vol. 575, no. 7784, pp. 607–617, 2019.
- [4] S. Yang, B. Deng, J. Wang, H. Li, M. Lu, Y. Che, X. Wei, and K. A. Loparo, "Scalable digital neuromorphic architecture for large-scale biophysically meaningful neural network with multi-compartment neurons," *IEEE Transactions on Neural Networks and Learning Systems*, vol. 31, no. 1, pp. 148–162, 2020.
- [5] S. Yang, J. Wang, X. Hao, H. Li, X. Wei, B. Deng, and K. A. Loparo, "BiCoSS: Toward large-scale cognition brain with multigranular neuromorphic architecture," *IEEE Transactions on Neural Networks and Learning Systems*, vol. 33, no. 7, pp. 2801–2815, 2022.
- [6] S. Yang, J. Wang, N. Zhang, B. Deng, Y. Pang, and M. R. Azghadi, "Cerebellumorphic: Large-scale neuromorphic model and architecture for supervised motor learning," *IEEE Transactions on Neural Networks and Learning Systems*, vol. 33, no. 9, pp. 4398–4412, 2022.
- [7] S. Yang, J. Wang, B. Deng, M. R. Azghadi, and B. Linares-Barranco, "Neuromorphic context-dependent learning framework with fault-tolerant spike routing," *IEEE Transactions on Neural Networks and Learning Systems*, vol. 33, no. 12, pp. 7126–7140, 2022.
- [8] N. Caporale and Y. Dan, "Spike timing-dependent plasticity: a Hebbian learning rule," *Annu. Rev. Neurosci.*, vol. 31, no. 1, pp. 25–46, 2008.
- [9] M. Saponati and M. Vinck, "Sequence anticipation and spike-timing-dependent plasticity emerge from a predictive learning rule," *Nature Communications*, vol. 14, no. 1, p. 4985, 2023.
- [10] G.-B. Huang, Q.-Y. Zhu, and C.-K. Siew, "Extreme learning machine: theory and applications," *Neurocomputing*, vol. 70, no. 1–3, pp. 489–501, 2006.
- [11] Y. Lan, Y. C. Soh, and G.-B. Huang, "Ensemble of online sequential extreme learning machine," *Neurocomputing*, vol. 72, no. 13–15, pp. 3391–3395, 2009.
- [12] A. K. Malik, R. Gao, M. Ganaie, M. Tanveer, and P. N. Suganthan, "Random vector functional link network: Recent developments, applications, and future directions," *Applied Soft Computing*, vol. 143, p. 110377, 2023.
- [13] J. Zhao, Z. Wang, and D. S. Park, "Online sequential extreme learning machine with forgetting mechanism," *Neurocomputing*, vol. 87, pp. 79–89, 2012.
- [14] M. Tsukada, M. Kondo, and H. Matsutani, "A neural network-based on-device learning anomaly detector for edge devices," *IEEE Transactions on Computers*, vol. 69, no. 7, pp. 1027–1044, 2020.
- [15] W. Liu, S. Xiao, Y. Liu, and Z. Yu, "SC-PLR: An approximate spiking neural network accelerator with on-chip predictive learning rule," *IEEE Transactions on Biomedical Circuits and Systems*, vol. 18, no. 5, pp. 1156–1165, 2024.
- [16] Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner, "Gradient-based learning applied to document recognition," *Proceedings of the IEEE*, vol. 86, no. 11, pp. 2278–2324, 1998.
- [17] H. Xiao, K. Rasul, and R. Vollgraf, "Fashion-MNIST: a novel image dataset for benchmarking machine learning algorithms," *arXiv preprint arXiv:1708.07747*, 2017.
- [18] Z. He, C. Shi, T. Wang, Y. Wang, M. Tian, X. Zhou, P. Li, L. Liu, N. Wu, and G. Luo, "A low-cost FPGA implementation of spiking extreme learning machine with on-chip reward-modulated STDP learning," *IEEE Transactions on Circuits and Systems II: Express Briefs*, vol. 69, no. 3, pp. 1657–1661, 2021.
- [19] Z. Zang, X. Li, and D. D. U. Li, "Spiking neural network enhanced hand gesture recognition using low-cost single-photon avalanche diode array," *arXiv preprint arXiv:2402.05441*, 2024.
- [20] P. U. Diehl and M. Cook, "Unsupervised learning of digit recognition using spike-timing-dependent plasticity," *Frontiers in Computational Neuroscience*, vol. 9, p. 99, 2015.
- [21] T. Wang, C. Shi, X. Zhou, Y. Lin, J. He, P. Gan, P. Li, Y. Wang, L. Liu, N. Wu, et al., "CompSNN: A lightweight spiking neural network based on spatiotemporally compressive spike features," *Neurocomputing*, vol. 425, pp. 96–106, 2021.
- [22] D. Neil and S.-C. Liu, "Minitaur, an event-driven FPGA-based spiking network accelerator," *IEEE Transactions on Very Large Scale Integration (VLSI) Systems*, vol. 22, no. 12, pp. 2621–2628, 2014.
- [23] Q. Wang, Y. Li, B. Shao, S. Dey, and P. Li, "Energy efficient parallel neuromorphic architectures with approximate arithmetic on FPGA," *Neurocomputing*, vol. 221, pp. 146–158, 2017.
- [24] D. Ma, J. Shen, Z. Gu, M. Zhang, X. Zhu, X. Xu, Q. Xu, Y. Shen, and G. Pan, "Darwin: A neuromorphic hardware co-processor based on spiking neural networks," *Journal of Systems Architecture*, vol. 77, pp. 43–51, 2017.
- [25] A. Yousefzadeh, E. Stamatias, M. Soto, T. Serrano-Gotarredona, and B. Linares-Barranco, "On practical issues for stochastic STDP hardware with 1-bit synaptic weights," *Frontiers in Neuroscience*, vol. 12, p. 665, 2018.
- [26] S. Li, Z. Zhang, R. Mao, J. Xiao, L. Chang, and J. Zhou, "A fast and energy-efficient SNN processor with adaptive clock/event-driven computation scheme and online learning," *IEEE Transactions on Circuits and Systems I: Regular Papers*, vol. 68, no. 4, pp. 1543–1552, 2021.