# Ultra-Fast and Energy-Efficient Channel Estimation for Massive MIMO-OFDM Systems with Memristor Crossbar Based In-Memory Computing

Yi-Hang Ren, Shaoshi Yang, Senior Member, IEEE, Zi-Hao Xiong, Yu-Xin Zhang, Jia-Hui Bi, and Sheng Chen, Life Fellow, IEEE

Abstract-Massive multi-input multi-output (MIMO) signal processing algorithms rely heavily on high-dimensional matrix operations, which impose excessively high computational complexity. Moreover, in the post-Moore era, the performance of the classical von Neumann computing architecture is facing severe limitations. The in-memory computing (IMC) technique holds the potential to break the memory wall and enhance the circuit's energy efficiency. In this paper, we present a memristor crossbar based IMC circuit design for performing the classical least squares (LS) channel estimation with high computation parallelism. Simulation results demonstrate that, even when the writing and reading errors are considered, the mean square error (MSE) of the proposed circuit with 7-bit memristors is almost the same as that achieved by a digital computer. Moreover, the proposed circuit achieves the same level of computing performance as the NVIDIA RTX 6000 Ada Generation, a benchmark commercial processor, while requiring only about 1/18 of its computation time and delivering about 25 times its energy efficiency.

Index Terms—Massive MIMO, OFDM, channel estimation, memristor crossbar, in-memory computing.

# I. INTRODUCTION

MASSIVE multi-input multi-output (MIMO) and orthogonal frequency division multiplexing (OFDM) are pivotal technologies enabling 5G systems. Their descendant, the extremely large-scale MIMO, is also expected to play an indispensable role in future 6G systems, owing to its remarkable capability of improving spectral and energy efficiency [1]. However, the large number of antennas imposes great challenges on processors in massive MIMO-OFDM systems [2], [3], especially for algorithms involving matrix operations.

Channel estimation is a key component in MIMO-OFDM systems which acquires channel state information (CSI) for signal detection [4]. In the past decades, various

This paper was supported by Beijing Municipal Natural Science Foundation under Grant L242013. (Corresponding author: Shaoshi Yang).

Y.-H. Ren, S. Yang, Z.-H. Xiong, Y.-X. Zhang and J.-H. Bi are with the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, and the Key Laboratory of Universal Wireless Communications, Ministry of Education, Beijing 100876, China (e-mails: {renyihang, shaoshi.yang, zihao.xiong, yuxin.zhang, bijiahui}@bupt.edu.cn).

S. Chen is with the School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, UK (e-mail: sqc@ecs.soton.ac.uk).

MIMO/OFDM channel estimation algorithms were proposed to strike a balance among performance, pilot overhead [5], [6], and computational complexity in engineering applications. In particular, compressed sensing based methods are widely used to exploit sparsity in channel estimation [7], [8], deep learning based methods are employed to deal with complicated channel distortion and interference [9], and angle-domain channel parameters are estimated by using various array signal processing techniques [10]–[13]. Linear channel estimators are commonly used because of their low complexity, and they are especially crucial for systems with a massive number of antennas [14] and for high-mobility wideband scenarios [15]. The least squares (LS) estimator is one of the fundamental linear channel estimation algorithms in MIMO-OFDM [4]. However, as the number of users simultaneously transmitting data increases in future 6G systems, the LS estimator, which involves intensive matrix inversion and multiplication operations, still faces a huge burden of computational complexity and energy consumption at a time when Moore's law is gradually approaching its limits.

Since Hewlett Packard Laboratories brought the memristor to life, it has become one of the most promising non-volatile memory devices owing to its attractive characteristics: its resistance is determined by the amount of charge that has flowed through it, and it remains unchanged unless the stimulation exceeds a threshold. Research interest in designing memristor crossbar based circuits or chips for accelerating neural network (NN) training is booming [16] because of their promising power efficiency.

In NNs, the conductance value of each memristor is adjusted continuously over multiple epochs of training, according to the fed-back verification function. Hence, the impact of the written error (the difference between the target conductance value and the actual conductance value) of memristors is reduced. However, communication systems need to process information in real time (within milliseconds in 5G) and with high accuracy, whereas the bit-precision (the number of stable conductance states available to represent a value) of memristors is limited.

In recent years, a few studies have addressed how to design memristor crossbar based in-memory computing (IMC) circuits for communication systems. A memristor crossbar based circuit design for the optimal maximum likelihood MIMO detection with high computation parallelism was presented in [17]. Other works include a MIMO linear detector circuit with high robustness to conductance errors [18], [19], a MIMO zero forcing precoder circuit [20], a MIMO minimum mean squared error detector circuit [21], a ridge regression circuit applied to MIMO detection [22], a successive interference cancellation decoder circuit [23], and a circuit for box-constrained massive MIMO detection based on the alternating direction method of multipliers (ADMM) [24], all leveraging the memristor crossbar based IMC technique.

However, no research in the open literature has focused on channel estimation, especially pilot based channel estimation. The main difference between the estimator and the precoder/detector lies in the prior information provided by the pilots. In this paper, we design a memristor crossbar based IMC circuit for massive MIMO-OFDM channel estimation. This constitutes the first-of-its-kind study in the research community, and our novel contributions are summarized as follows.

- 1) Our IMC circuit makes full use of the prior information to facilitate the calculation of the discrete Fourier transform (DFT) and the inverse DFT (IDFT), and estimates the CSI directly in memory.
- 2) Simulation results show that the mean square error (MSE) performance of our proposed circuit is almost the same as that achieved by the double-precision floating point (FP64) digital computer when the memristor bit-precision is 7-bit.
- 3) The computation time and energy efficiency of our circuit design are compared with those of commercial graphics processing units (GPUs). In MIMO-OFDM channel estimation tasks, the proposed circuit requires about 1/18 of the computation time and achieves about 25 times the energy efficiency of the commercial GPU *NVIDIA RTX 6000 Ada Generation*.

# II. SYSTEM MODEL

We consider the uplink of massive MIMO-OFDM systems comprising an  $N_r$ -antenna base station (BS),  $N_t$  single-antenna users, and K subcarriers for each user antenna. At time index n, given the transmitted OFDM symbol  $\mathbf{X}^t(n) \in \mathbb{C}^{K \times 1}$  and the unitary DFT matrix  $\mathcal{F} \in \mathbb{C}^{K \times K}$ , the transmit (TX) signal vector is given by  $\mathbf{x}^t(n) = \mathcal{F}^H \mathbf{X}^t(n)$ . Then, the c-length cyclic prefix (CP) is added before  $\mathbf{x}^t(n)$  to avoid inter-symbol interference, where  $c \geq L$  and L is the maximum length of channels. At the BS receiver, after removing the CP, the receive (RX) signal can be written as

$$\mathbf{y}^{r}(n) = \sum_{t=1}^{N_t} \mathbf{H}_{cir}^{r,t} \mathcal{F}^{H} \mathbf{X}^{t}(n) + \mathbf{z}^{r}(n), \tag{1}$$

where  $\mathbf{H}_{\mathrm{cir}}^{r,t} \in \mathbb{C}^{K \times K}$  denotes the circulant channel matrix whose first column is  $\left[ (\mathbf{h}^{r,t})^{\mathrm{T}}, \mathbf{0}_{1 \times (K-L)} \right]^{\mathrm{T}}$ ,  $\mathbf{h}^{r,t} \in \mathbb{C}^{L \times 1}$  denotes the channel impulse response (CIR) from the tth TX antenna to the rth RX antenna, and  $\mathbf{z}^{r}(n)$  represents the additive white Gaussian noise vector on the rth antenna. The DFT of the RX signal  $\mathbf{y}^r(n)$  can be obtained by

$$\begin{aligned} \mathbf{Y}^{r}(n) &= \sum_{t=1}^{N_{t}} \operatorname{diag} \left\{ \sqrt{K} \mathcal{F} \left[ (\mathbf{h}^{r,t})^{T}, \mathbf{0}_{1 \times (K-L)} \right]^{T} \right\} \mathbf{X}^{t}(n) + \mathbf{Z}^{r}(n) \\ &= \sum_{t=1}^{N_{t}} \operatorname{diag} \left\{ \mathbf{X}^{t}(n) \right\} \mathbf{F} \mathbf{h}^{r,t} + \mathbf{Z}^{r}(n), \end{aligned} \tag{2}$$

where  $\mathbf{Y}^r(n) = \mathcal{F}\mathbf{y}^r(n)$ , diag $\{\mathbf{X}\}$  is the diagonal matrix with the elements of  $\mathbf{X}$  on its diagonal,  $\mathbf{F} \in \mathbb{C}^{K \times L}$  denotes the first L columns of  $\mathcal{F}$  multiplied by  $\sqrt{K}$ , and  $\mathbf{Z}^r(n) = \mathcal{F}\mathbf{z}^r(n)$ .
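The two lines of (2) rest on the fact that a circulant matrix is diagonalized by the DFT. The following NumPy sketch checks this numerically for a single user and without noise. The sizes `K = 16` and `L = 4`, the random CIR, and all variable names are illustrative assumptions, not values from the paper.

```python
import numpy as np

rng = np.random.default_rng(0)
K, L = 16, 4                      # illustrative sizes, not the simulation values

# Unitary DFT matrix (the calligraphic F of the text).
F_unitary = np.fft.fft(np.eye(K)) / np.sqrt(K)

# Random CIR, zero-padded to length K, and its circulant channel matrix
# (the first column is the zero-padded CIR).
h = rng.standard_normal(L) + 1j * rng.standard_normal(L)
h_padded = np.concatenate([h, np.zeros(K - L)])
H_cir = np.column_stack([np.roll(h_padded, k) for k in range(K)])

# One QPSK frequency-domain symbol for a single user (noise omitted).
X = (rng.choice([-1.0, 1.0], K) + 1j * rng.choice([-1.0, 1.0], K)) / np.sqrt(2)

# Left-hand side of (2): DFT of the noise-free RX signal in (1).
Y = F_unitary @ (H_cir @ (F_unitary.conj().T @ X))

# First line of (2): diag{sqrt(K) * F * [h; 0]} X.
Y_line1 = (np.sqrt(K) * (F_unitary @ h_padded)) * X

# Second line of (2): diag{X} F h, with F = sqrt(K) * (first L columns).
F_trunc = np.sqrt(K) * F_unitary[:, :L]
Y_line2 = X * (F_trunc @ h)

err_line1 = np.max(np.abs(Y - Y_line1))
err_line2 = np.max(np.abs(Y - Y_line2))
print(err_line1, err_line2)       # both should be at round-off level
```

The second form is the one exploited later: the unknown CIR appears as a plain vector multiplied by a matrix built only from known pilots and DFT entries.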

Assume that the P pilot tones are distributed over g consecutive OFDM symbols and that the CSI is unchanged over f consecutive OFDM symbols, where g < f. Then (2) can be written as

$$\begin{bmatrix} \mathbf{Y}^{r}(0) \\ \vdots \\ \mathbf{Y}^{r}(g-1) \end{bmatrix} = \begin{bmatrix} \mathbf{X}_{\text{diag}}^{1}(0)\mathbf{F} & \cdots & \mathbf{X}_{\text{diag}}^{N_{t}}(0)\mathbf{F} \\ \vdots & \cdots & \vdots \\ \mathbf{X}_{\text{diag}}^{1}(g-1)\mathbf{F} & \cdots & \mathbf{X}_{\text{diag}}^{N_{t}}(g-1)\mathbf{F} \end{bmatrix} \times \begin{bmatrix} \mathbf{h}^{r,1} \\ \vdots \\ \mathbf{h}^{r,N_{t}} \end{bmatrix} + \begin{bmatrix} \mathbf{Z}^{r}(0) \\ \vdots \\ \mathbf{Z}^{r}(g-1) \end{bmatrix}, \quad (3)$$

where  $\mathbf{X}_{\text{diag}}^t(n) = \text{diag}\{\mathbf{X}^t(n)\}.$ 

According to [25], the pilot signals only consume P/g subcarriers of each OFDM symbol. Considering the pilot part of the signals, (3) can be simplified as

$$\widetilde{\mathbf{Y}}^r = \widetilde{\mathbf{A}}\mathbf{h}^r + \widetilde{\mathbf{Z}}^r,\tag{4}$$

where

$$\begin{split} \widetilde{\mathbf{Y}}^r &= \begin{bmatrix} \widetilde{\mathbf{Y}}^r(0) \\ \vdots \\ \widetilde{\mathbf{Y}}^r(g-1) \end{bmatrix}, \, \mathbf{h}^r = \begin{bmatrix} \mathbf{h}^{r,1} \\ \vdots \\ \mathbf{h}^{r,N_t} \end{bmatrix}, \\ \widetilde{\mathbf{A}} &= \begin{bmatrix} \widetilde{\mathbf{X}}^1_{\mathrm{diag}}(0)\widetilde{\mathbf{F}}(0) & \cdots & \widetilde{\mathbf{X}}^{N_t}_{\mathrm{diag}}(0)\widetilde{\mathbf{F}}(0) \\ \vdots & \cdots & \vdots \\ \widetilde{\mathbf{X}}^1_{\mathrm{diag}}(g-1)\widetilde{\mathbf{F}}(g-1) & \cdots & \widetilde{\mathbf{X}}^{N_t}_{\mathrm{diag}}(g-1)\widetilde{\mathbf{F}}(g-1) \end{bmatrix}, \\ \widetilde{\mathbf{Z}}^r &= \begin{bmatrix} \widetilde{\mathbf{Z}}^r(0) \\ \vdots \\ \widetilde{\mathbf{Z}}^r(g-1) \end{bmatrix}, \, \widetilde{\mathbf{X}}^t_{\mathrm{diag}}(n) \in \mathbb{C}^{P/g \times P/g}, \end{split}$$

while  $\widetilde{\mathbf{Y}}^r(n)$ ,  $\widetilde{\mathbf{Z}}^r(n)$  and  $\widetilde{\mathbf{F}}(n)$  correspond to the P/g rows of  $\mathbf{Y}^r(n)$ ,  $\mathbf{Z}^r(n)$ , and  $\mathbf{F}(n)^1$ , respectively.

Applying the LS algorithm to (4) to estimate the CIR, we obtain  $\hat{\mathbf{h}}^r = \widetilde{\mathbf{A}}^\dagger \widetilde{\mathbf{Y}}^r = \mathbf{h}^r + \widetilde{\mathbf{A}}^\dagger \widetilde{\mathbf{Z}}^r$ , where  $\widetilde{\mathbf{A}}^\dagger$  denotes the pseudo-inverse of  $\widetilde{\mathbf{A}}$ . To calculate  $\hat{\mathbf{h}}^r$ , a matrix inversion with time complexity of  $O(PL^2N_t^2)$  is inevitable, and the subsequent matrix-vector multiplication has  $O(PLN_t)$  time complexity. With the increasing demands on the performance of communication systems due to the number of transmit

 $<sup>^{1}</sup>$ The index n reflects that **F** varies with time depending on the position of the pilot tones in the different OFDM symbols



Fig. 1. Memristor crossbar based IMC circuit for MIMO-OFDM channel estimation and DFT/IDFT.

antennas, subcarriers, and receiver taps in MIMO-OFDM systems, the existing von Neumann based digital computational paradigms will face severe challenges.
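As a minimal digital reference for the LS estimate built from (4), the sketch below constructs the pilot matrix for one OFDM symbol (g = 1) and applies the pseudo-inverse. The sizes, the noise level, and in particular the phase-shift pilot design are assumptions chosen so that the pilot matrix has orthogonal columns; the paper itself uses QPSK pilots, and none of the names below come from it.

```python
import numpy as np

rng = np.random.default_rng(1)
K, L, N_t, P = 64, 4, 4, 32          # illustrative sizes (assumed)
pilot_idx = np.arange(0, K, K // P)  # comb pilots on one OFDM symbol (g = 1)

# F-tilde: pilot rows of sqrt(K) * (first L columns of the unitary DFT matrix).
k = pilot_idx[:, None]
l = np.arange(L)[None, :]
F_tilde = np.exp(-2j * np.pi * k * l / K)            # shape (P, L)

# Hypothetical phase-shift pilots: user t shifts its channel by t*L taps,
# which makes the columns of A-tilde mutually orthogonal.
p = np.arange(P)
pilots = [np.exp(-2j * np.pi * p * t * L / P) for t in range(N_t)]

# A-tilde = [diag(X^1) F-tilde, ..., diag(X^Nt) F-tilde], shape (P, L*N_t).
A_tilde = np.hstack([x[:, None] * F_tilde for x in pilots])

# Stacked CIRs of all users seen by one RX antenna, plus weak noise.
h_true = (rng.standard_normal(L * N_t) + 1j * rng.standard_normal(L * N_t)) / np.sqrt(2)
noise = 0.01 * (rng.standard_normal(P) + 1j * rng.standard_normal(P))
Y_tilde = A_tilde @ h_true + noise

# LS estimate via the pseudo-inverse.
h_hat = np.linalg.pinv(A_tilde) @ Y_tilde
mse = np.mean(np.abs(h_hat - h_true) ** 2)
print(f"MSE of the LS estimate: {mse:.2e}")
```

The pseudo-inverse is the step whose cost grows quickly with the number of users and taps, which is what motivates moving it into the analog domain.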

# III. THE PROPOSED IMC BASED CHANNEL ESTIMATOR

In this section, we present an IMC circuit based on memristor crossbars for the channel estimation and the OFDM signal preprocessing in receivers. The proposed circuit can effectively break the memory wall and significantly accelerate the process of estimation and DFT/IDFT.

The proposed IMC circuit shown in Fig. 1 comprises two sets of memristor crossbars with the one-transistor-one-memristor (1T1R) structure (i.e., cross-point arrays with a memristor connected to a transistor at each intersection). Each memristor crossbar contains two sub-crossbars of the same size. In Fig. 1, the blue, yellow, and green lines denote the source line (SL), the bit line (BL), and the word line (WL), respectively, which compose the structural foundation of the memory. The peripheral circuit, composed of operational amplifiers (OAs), inverters, and transistors, is connected by the black lines.

We utilize the changeable conductance of memristors to store information. The conductance matrices  $\mathbf{G}^+$  and  $\mathbf{G}^-$  are stored in the two sub-crossbars, respectively, and the same information is stored in both sets of memristor crossbars.

First, we set  $V_{dd}$  to a low voltage level and  $-V_{dd}$  to a high voltage level. A set of currents  $\mathbf{i_{in}}$  is applied to the left array, and the output voltage in Fig. 1 is  $\mathbf{v_{out}}$ . According to Kirchhoff's current law, the current at the inverting inputs of the upper set of OAs is

$$\mathbf{i_{OA}} = \mathbf{i_{in}} + (\mathbf{G}^+ - \mathbf{G}^-)\mathbf{v_{out}}, \tag{5}$$

and the outputs of the upper set of OAs are obtained by  $\mathbf{v_{OA}} = r\mathbf{i_{OA}}$ , where r denotes the feedback resistance of the OAs.  $\mathbf{v_{OA}}$  serves as the input of the right set of memristor crossbars, and the output current is

$$(\mathbf{G}^{+T} - \mathbf{G}^{-T})\mathbf{v_{OA}} = \mathbf{0},\tag{6}$$

because of the "virtual ground" enforced by the lower set of OAs. Defining  $\mathbf{G} = \mathbf{G}^+ - \mathbf{G}^-$ , the relationship between  $\mathbf{i_{in}}$  and  $\mathbf{v_{out}}$  can be obtained as

$$\mathbf{v_{out}} = -(\mathbf{G}^T \mathbf{G})^{-1} \mathbf{G}^T \mathbf{i_{in}}. \tag{7}$$

The proposed IMC circuit thus carries out the LS algorithm for MIMO-OFDM channel estimation via (7) in one step, by mapping  $\widetilde{\mathbf{A}}$  and  $\widetilde{\mathbf{Y}}^r$  onto  $\mathbf{G}$  and  $\mathbf{i_{in}}$ , respectively.
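For completeness, the step from (5) and (6) to (7) can be written out as follows. This is a sketch under the assumptions of ideal OAs, a nonzero feedback resistance $r$, and a full-column-rank $\mathbf{G}$, none of which is stated explicitly in the text:

$$\begin{aligned} \mathbf{G}^{T}\mathbf{v_{OA}} = r\,\mathbf{G}^{T}\left(\mathbf{i_{in}} + \mathbf{G}\mathbf{v_{out}}\right) = \mathbf{0} \;&\Longrightarrow\; \mathbf{G}^{T}\mathbf{G}\,\mathbf{v_{out}} = -\mathbf{G}^{T}\mathbf{i_{in}} \\ &\Longrightarrow\; \mathbf{v_{out}} = -(\mathbf{G}^{T}\mathbf{G})^{-1}\mathbf{G}^{T}\mathbf{i_{in}}. \end{aligned}$$

The middle expression is the set of normal equations of the problem $\min_{\mathbf{v}} \|\mathbf{G}\mathbf{v} + \mathbf{i_{in}}\|^2$, so the steady state of the feedback loop coincides with an LS solution.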

Limited by the conductance range of memristors (i.e., the range of conductance values to which memristors can be written), the elements of the matrix  $\widetilde{\mathbf{A}}$  cannot be written into the crossbar directly. Assume that the conductance range of memristors is  $(\mathbf{G_{\min}}, \mathbf{G_{\max}})$  with  $\mathbf{G_{\min}} > \mathbf{0}$ , that the real/imaginary part of the pilot signal  $\widetilde{\mathbf{X}}_{\mathrm{diag}}^t(n)$  lies in the range  $(\mathbf{x_{\min}}, \mathbf{x_{\max}})$  with  $\mathbf{x_{\min}} < \mathbf{0} < -\mathbf{x_{\min}} < \mathbf{x_{\max}}$ , and that the real/imaginary part of  $\widetilde{\mathbf{F}}(n)$  lies in the range [-1,1]. Then, the real/imaginary part of  $\widetilde{\mathbf{A}}$  lies in the range  $(-\mathbf{x_{\max}}, \mathbf{x_{\max}})$ . Using a pair of memristors to represent the real/imaginary part of each element of  $\widetilde{\mathbf{A}}$ , the representable conductance range becomes  $(\mathbf{G_{\min}} - \mathbf{G_{\max}}, \mathbf{G_{\max}} - \mathbf{G_{\min}})$ .  $\widetilde{\mathbf{X}}_{\mathrm{diag}}^t(n)$  is amplified by a factor of a to fully utilize this range, where  $a = (\mathbf{G_{\max}} - \mathbf{G_{\min}})/\mathbf{x_{\max}}$ .

Under the system model,  $\widetilde{\mathbf{X}}^t_{\mathrm{diag}}(n)$ ,  $\widetilde{\mathbf{F}}(n)$ , and the DFT size are known a priori, so  $\widetilde{\mathbf{X}}^t_{\mathrm{diag}}(n)\widetilde{\mathbf{F}}(n) = \widetilde{\mathbf{A}}$  can be stored in the memristor crossbar in advance. Since  $\widetilde{\mathbf{A}}$  is a complex matrix, it cannot be directly implemented in a memristor crossbar, which only supports non-negative real conductance values. The equivalent real-valued model is

$$\overline{\widetilde{\mathbf{A}}} = \begin{bmatrix} \mathfrak{R}(\widetilde{\mathbf{A}}) & -\mathfrak{I}(\widetilde{\mathbf{A}}) \\ \mathfrak{I}(\widetilde{\mathbf{A}}) & \mathfrak{R}(\widetilde{\mathbf{A}}) \end{bmatrix},$$

where  $\mathfrak{R}(\cdot)$  and  $\mathfrak{I}(\cdot)$  denote the real and imaginary parts, respectively.  $\overline{\widetilde{\mathbf{A}}}$  can be represented as the difference of two non-negative real matrices as follows:

$$\begin{aligned} \overline{\widetilde{\mathbf{A}}} &= \overline{\widetilde{\mathbf{A}}}^{+} - \overline{\widetilde{\mathbf{A}}}^{-} \\ &= \begin{bmatrix} \mathfrak{R}(\widetilde{\mathbf{A}}^{+}) & \mathfrak{I}(\widetilde{\mathbf{A}}^{-}) \\ \mathfrak{I}(\widetilde{\mathbf{A}}^{+}) & \mathfrak{R}(\widetilde{\mathbf{A}}^{+}) \end{bmatrix} - \begin{bmatrix} \mathfrak{R}(\widetilde{\mathbf{A}}^{-}) & \mathfrak{I}(\widetilde{\mathbf{A}}^{+}) \\ \mathfrak{I}(\widetilde{\mathbf{A}}^{-}) & \mathfrak{R}(\widetilde{\mathbf{A}}^{-}) \end{bmatrix}. \end{aligned} \tag{8}$$

We then map  $\overline{\widetilde{\mathbf{A}}}^+$  and  $\overline{\widetilde{\mathbf{A}}}^-$  onto the memristor crossbars such that

$$\mathbf{G}^{+} = a \begin{bmatrix} \mathfrak{R}(\widetilde{\mathbf{A}}^{+}) & \mathfrak{I}(\widetilde{\mathbf{A}}^{-}) \\ \mathfrak{I}(\widetilde{\mathbf{A}}^{+}) & \mathfrak{R}(\widetilde{\mathbf{A}}^{+}) \end{bmatrix}, \ \mathbf{G}^{-} = a \begin{bmatrix} \mathfrak{R}(\widetilde{\mathbf{A}}^{-}) & \mathfrak{I}(\widetilde{\mathbf{A}}^{+}) \\ \mathfrak{I}(\widetilde{\mathbf{A}}^{-}) & \mathfrak{R}(\widetilde{\mathbf{A}}^{-}) \end{bmatrix}.$$
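The mapping from a complex matrix to the two conductance matrices can be sketched in NumPy as below. It assumes that the superscripts + and − denote the entrywise positive and negative parts of the real and imaginary components, which is consistent with (8) but not spelled out in the text. The helper names, the random stand-in matrix, and the conductance window (`G_min`, `G_max` in siemens) are hypothetical.

```python
import numpy as np

def real_embedding(A):
    """Real-valued equivalent [[Re, -Im], [Im, Re]] of a complex matrix."""
    return np.block([[A.real, -A.imag], [A.imag, A.real]])

def split_nonnegative(M):
    """Split M into non-negative parts so that M = M_plus - M_minus."""
    return np.maximum(M, 0.0), np.maximum(-M, 0.0)

rng = np.random.default_rng(2)
# Random stand-in for the pilot matrix A-tilde (not the paper's pilots).
A = rng.standard_normal((6, 3)) + 1j * rng.standard_normal((6, 3))

A_bar = real_embedding(A)
A_bar_plus, A_bar_minus = split_nonnegative(A_bar)

# Assumed conductance window in siemens (hypothetical device values).
G_min, G_max = 1e-6, 1e-4
x_max = np.max(np.abs(A_bar))
a = (G_max - G_min) / x_max          # scaling factor a of the text

G_plus, G_minus = a * A_bar_plus, a * A_bar_minus

# The embedding reproduces complex matrix-vector products on stacked vectors.
x = rng.standard_normal(3) + 1j * rng.standard_normal(3)
y = A @ x
y_bar = A_bar @ np.concatenate([x.real, x.imag])
print(np.allclose(y_bar, np.concatenate([y.real, y.imag])))
```

Only the difference of the two matrices enters the circuit equations, so every signed entry is carried by one memristor of a pair while its partner stays at the low end of the range.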

When the BS receives signals, the real and imaginary parts of the received pilot signal  $\widetilde{\mathbf{Y}}^r$  are stacked and scaled to form the input current vector

$$\mathbf{i_{in}} = b\,\overline{\widetilde{\mathbf{Y}}}^r = b \begin{bmatrix} \mathfrak{R}(\widetilde{\mathbf{Y}}^r) \\ \mathfrak{I}(\widetilde{\mathbf{Y}}^r) \end{bmatrix}, \tag{9}$$

where b is a scaling factor. Based on (7), the output signal is obtained as

$$\begin{aligned} \mathbf{v_{out}} &= -(\mathbf{G}^T \mathbf{G})^{-1} \mathbf{G}^T \mathbf{i_{in}} = -\frac{b}{a} (\overline{\widetilde{\mathbf{A}}}^T \overline{\widetilde{\mathbf{A}}})^{-1} \overline{\widetilde{\mathbf{A}}}^T \overline{\widetilde{\mathbf{Y}}}^r \\ &= -\frac{b}{a} [\Re(\hat{\mathbf{h}}^r) \ \Im(\hat{\mathbf{h}}^r)]^T, \end{aligned} \tag{10}$$

which, up to the known scaling factor  $-b/a$ , is the LS estimate of the CIR for the rth RX antenna.
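The chain from the stored conductances to the CIR estimate can be emulated with ideal arithmetic as a sanity check. The sketch below evaluates the closed-form steady state of (7), not the circuit dynamics, and compares the rescaled output with a pseudo-inverse reference. The matrix sizes, the scaling factors `a` and `b`, and the random data are assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(3)
P, n = 8, 4                      # pilot rows and unknown taps (illustrative)

# Random stand-ins for the pilot matrix and the received pilot vector.
A = rng.standard_normal((P, n)) + 1j * rng.standard_normal((P, n))
Y = rng.standard_normal(P) + 1j * rng.standard_normal(P)

a, b = 2.0e-5, 1.0e-4            # assumed scaling factors

# Real-valued embedding and differential conductance matrix G = G+ - G-.
A_bar = np.block([[A.real, -A.imag], [A.imag, A.real]])
G = a * A_bar

# Input currents: stacked and scaled received pilots.
i_in = b * np.concatenate([Y.real, Y.imag])

# Ideal steady state of the circuit: v_out = -(G^T G)^{-1} G^T i_in.
v_out = -np.linalg.solve(G.T @ G, G.T @ i_in)

# Undo the scaling -b/a and reassemble the complex CIR estimate.
h_hat_circuit = -(a / b) * (v_out[:n] + 1j * v_out[n:])

# Digital reference: complex pseudo-inverse.
h_hat_ref = np.linalg.pinv(A) @ Y
print(np.allclose(h_hat_circuit, h_hat_ref))
```

The agreement reflects that the real-valued LS problem on the stacked vectors has the same minimizer as the complex one.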

Furthermore, the DFT and IDFT are basic operations in OFDM systems. By setting  $V_{dd}$  to a high voltage level, the OAs between the two sets of memristor crossbars are disconnected, and the two sets become independent. The memristor crossbar then stores the linearly scaled  $\mathcal{F}$ , namely,

$$\mathbf{G}^{+} = c \begin{bmatrix} \mathfrak{R}(\mathcal{F}^{+}) & \mathfrak{I}(\mathcal{F}^{-}) \\ \mathfrak{I}(\mathcal{F}^{+}) & \mathfrak{R}(\mathcal{F}^{+}) \end{bmatrix}, \ \mathbf{G}^{-} = c \begin{bmatrix} \mathfrak{R}(\mathcal{F}^{-}) & \mathfrak{I}(\mathcal{F}^{+}) \\ \mathfrak{I}(\mathcal{F}^{-}) & \mathfrak{R}(\mathcal{F}^{-}) \end{bmatrix}$$

where c is an appropriate scaling factor. As shown in Fig. 1, when the input is the time-domain signal  $[\Re(\mathbf{x}) \Im(\mathbf{x})]^T$ , the output current is the frequency-domain signal  $\mathbf{i}_{OA} = c[\Re(\mathbf{X}) \Im(\mathbf{X})]^T$ .
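In this mode the crossbar acts as a plain matrix-vector multiplier. A small NumPy sketch of the idea, with an assumed DFT size `K = 8`, an assumed scaling factor `c`, and a random test signal, is:

```python
import numpy as np

rng = np.random.default_rng(5)
K = 8                            # illustrative DFT size
c = 1e-4                         # assumed scaling factor

# Unitary DFT matrix and its real-valued embedding.
F_unitary = np.fft.fft(np.eye(K)) / np.sqrt(K)
F_bar = np.block([[F_unitary.real, -F_unitary.imag],
                  [F_unitary.imag,  F_unitary.real]])

# Differential conductance matrix G = G+ - G- that stores the scaled DFT.
G = c * F_bar

# Time-domain input, stacked as [Re(x); Im(x)].
x = rng.standard_normal(K) + 1j * rng.standard_normal(K)
x_bar = np.concatenate([x.real, x.imag])

# Output currents of the crossbar: c * [Re(X); Im(X)].
i_OA = G @ x_bar
X_from_crossbar = (i_OA[:K] + 1j * i_OA[K:]) / c

# Reference: unitary FFT.
X_ref = np.fft.fft(x, norm="ortho")
print(np.allclose(X_from_crossbar, X_ref))
```

The whole transform is obtained in a single read of the array, which is the sense in which the DFT is computed in one step.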

The proposed memristor crossbar based IMC circuit makes full use of the storage information in the non-volatile memristor crossbar and can switch state according to the storage information, solving channel estimation or DFT/IDFT in one step.

# IV. SIMULATION RESULTS

## A. Estimation Performance of the Proposed Circuit

To employ memristors as memory units in communication systems, each memristor needs to be written to an appropriate conductance value. Both the accuracy and the bit density of memristor crossbar based IMC increase with the bit-precision. However, the difference between the target conductance value and the written conductance value degrades the circuit accuracy.

One of the common methods to reduce written error and increase bit-precision is verification after writing. If the



Fig. 2. MSE comparison of the proposed circuit (with bit-precision of 3-bit, 5-bit and 7-bit) against digital computer (FP64 bit-precision) and 7-bit precision DFT in OFDM with digital channel estimation against 5-bit precision estimation.

current conductance does not satisfy the target conductance, a subsequent writing pulse is applied to adjust the conductance. Writing and verification increase the writing time of memristor crossbars, which becomes more problematic in time-sensitive scenarios such as precoding and detection in communication systems. In these scenarios, it is essential to strike a balance between writing time and bit-precision, whereas pilot-based channel estimation is not "time-sensitive" in this sense. This is because the pilots, which are the information agreed upon between the transmitter and the receiver, have already been stored in memory. Consequently, the writing operation finishes before the signals for channel estimation are received, and the writing time is hardly a limiting factor for bit-precision. It is worth mentioning that the bit-precision nonetheless has an upper limit imposed by the characteristics of the memristor, e.g., it can reach 11-bit [26].

We investigate the influence of bit-precision on memristor crossbar by the mean square error (MSE) of channel estimation. Furthermore, we consider calculating the DFT in the memristor crossbar first and then estimating the CIR. In addition, the conductance value shows slight fluctuations over time and reading operation has little impact on conductance. These characteristics are all considered in the simulation. In the simulation, the BS employs  $N_r=32$  antennas, the number of users is  $N_t=32$ , the number of subcarriers is K=256, the number of pilot tones is P=64, and the modulation scheme is quadrature phase-shift keying (QPSK) for all the pilot symbols and general data. We assume that the system estimates the channel once in the f consecutive OFDM symbols with all the pilot tones on the first OFDM symbol.
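To give a feel for how finite bit-precision enters the estimate, the sketch below quantizes the stored matrix to a uniform grid and measures the resulting noise-free LS error. This is a simplified stand-in, not the simulation model of the paper: uniform quantization of the differential conductance is an assumption, write noise, conductance drift, and read disturbance are ignored, and the random matrix replaces the actual pilot matrix.

```python
import numpy as np

def quantize(M, bits, full_scale):
    """Map M to the nearest of 2**bits uniform levels in [-full_scale, full_scale]."""
    n_levels = 2 ** bits
    step = 2.0 * full_scale / (n_levels - 1)
    idx = np.clip(np.round((M + full_scale) / step), 0, n_levels - 1)
    return -full_scale + idx * step

rng = np.random.default_rng(4)
P, n = 32, 16                    # illustrative sizes

# Random stand-in for the pilot matrix and a noise-free observation.
A = (rng.standard_normal((P, n)) + 1j * rng.standard_normal((P, n))) / np.sqrt(2)
h = (rng.standard_normal(n) + 1j * rng.standard_normal(n)) / np.sqrt(2)
Y = A @ h

A_bar = np.block([[A.real, -A.imag], [A.imag, A.real]])
Y_bar = np.concatenate([Y.real, Y.imag])
full_scale = np.max(np.abs(A_bar))

mse = {}
for bits in (3, 5, 7):
    A_q = quantize(A_bar, bits, full_scale)          # what the crossbar stores
    h_bar, *_ = np.linalg.lstsq(A_q, Y_bar, rcond=None)
    h_hat = h_bar[:n] + 1j * h_bar[n:]
    mse[bits] = float(np.mean(np.abs(h_hat - h) ** 2))

for bits, value in mse.items():
    print(f"{bits}-bit: MSE = {value:.3e}")
```

Because the observation is noise-free here, the entire error comes from the mismatch between the stored and the true matrix, which isolates the effect of the level resolution.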

The simulation results in Fig. 2 show the MSEs versus the signal-to-noise ratio (SNR) of the LS channel estimates using digital computer (FP64) as well as 3-bit, 5-bit and

TABLE I. Comparison between the proposed circuit and commercial GPUs.

| Performance metric | Proposed circuit  | NVIDIA RTX 6000 Ada Generation | NVIDIA RTX 4000 Ada Generation |
|--------------------|-------------------|--------------------------------|--------------------------------|
| Computation time   | 100 ns            | 1.85 μs                        | 6.30 μs                        |
| Energy consumption | 21.76 μJ          | 554.21 μJ                      | 819.42 μJ                      |
| Energy efficiency  | 1.934 teraFLOPs/J | 0.075 teraFLOPs/J              | 0.051 teraFLOPs/J              |



Fig. 3. Computation time for achieving convergence with different samples.

7-bit memristor crossbars. Additionally, we evaluate two hybrid approaches: 1) performing the DFT in 7-bit memristor crossbars followed by LS estimation using digital processor, and 2) executing DFT within 7-bit memristor crossbars and LS estimation within 5-bit memristor crossbars.

It can be seen from Fig. 2 that the MSE curves achieved by the proposed IMC circuit with 7-bit precision and by the digital computer almost overlap. The simulation results also show an SNR loss of 2.5 dB when the precision is reduced from 7-bit to 5-bit. Comparing the digital-computer results with those of the 7-bit DFT followed by digital estimation, and the 5-bit estimation results with those of the 7-bit DFT followed by 5-bit estimation, we conclude that the 7-bit memristor crossbar meets the accuracy requirement of the DFT well, and that the LS estimation is more susceptible to the bit-precision of the memristor crossbar than the DFT is.

## B. Comparison with Digital Processors

The results of Fig. 2 suggest that the memristor crossbar with 7-bit precision is capable of achieving channel estimation accuracy similar to that of the digital processor. Therefore, we use the memristor crossbar with 7-bit precision in this set of simulations. Note that the writing time differs for memristor crossbars of different bit-precision. The size of the memristor crossbar is  $(128 \times 256) \times 2$  according to the simulation parameters of P=64 and  $LN_t=64$ .

We first evaluate the convergence time of the proposed circuit with OAs having gain-bandwidth product of 500 MHz



Fig. 4. Throughput and energy efficiency of the proposed circuit with different scales in comparison to those of NVIDIA RTX 6000 Ada Generation.

and amplifier gain of  $80\,\mathrm{dB}$ , i.e., the time required by the circuit to reach a steady state for processing h on a single antenna, with the aid of LTspice<sup>®</sup>. In Fig. 3, different curves correspond to different samples of h. It can be seen that the convergence time is typically less than  $80\,\mathrm{ns}$ . If the CIRs were computed serially, the circuit would need  $N_r \times 80\,\mathrm{ns}$  to obtain all the CIRs in one estimation. However, the calculation of the CIR for each antenna is independent of the others, so all the  $N_r$  CIRs can be computed in parallel, and the total time then equals the convergence time of a single CIR.

The number of floating point operations (FLOPs) required by the LS estimator is  $N_r((LN_t)^3+4(LN_t)^2P+PLN_t)$ , which becomes 42074112 under the simulation parameters. The proposed circuit can complete channel estimation within  $3.2\,\mu\mathrm{s}$  when operating serially, and within  $0.1\,\mu\mathrm{s}$  when operating in parallel (i.e., estimating multiple sub-channels simultaneously). The computation delay of the proposed circuit is thus about 1/18 of that of the commercial GPU NVIDIA RTX 6000 Ada Generation [27], which requires  $1.85\,\mu\mathrm{s}$  (considering the communication latency between processor and memory).
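The FLOP count and the timing ratio quoted above can be reproduced with a few lines of arithmetic. The per-antenna time of 100 ns is inferred from the serial figure of 3.2 μs divided by the 32 antennas and from Table I; the variable names are illustrative.

```python
N_r = 32        # BS antennas
LN_t = 64       # product L * N_t given in the text
P = 64          # pilot tones

# FLOPs of the LS estimator: N_r * ((L N_t)^3 + 4 (L N_t)^2 P + P L N_t).
flops = N_r * (LN_t**3 + 4 * LN_t**2 * P + P * LN_t)
print(flops)    # 42074112

t_single = 100e-9               # time per antenna (s), inferred from Table I
t_serial = N_r * t_single       # antennas processed one after another
t_parallel = t_single           # all antennas processed simultaneously
t_gpu = 1.85e-6                 # RTX 6000 Ada Generation time from Table I (s)

speedup = t_gpu / t_parallel
print(f"serial: {t_serial * 1e6:.1f} us, parallel: {t_parallel * 1e6:.1f} us")
print(f"GPU time / parallel circuit time: {speedup:.1f}")
```

The ratio of 18.5 is the origin of the "about 1/18" figure.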

The power consumption of a single memristor is estimated to be  $0.4\,\mu\mathrm{W}$  according to the programming rule. A single OA designed in [20] consumes about  $0.516\,\mathrm{mW}$  of power, and the power consumption of a single ADC is about  $52.224\,\mathrm{mW}$  [28]. By adding up the power consumption of all the essential components of the circuit, the power consumption of the proposed circuit is estimated to be  $6.84\,\mathrm{W}$  without considering low-power design, i.e., without utilizing advanced ADCs or optimizing the workflow to reduce the working time. Defining

$$\text{Energy efficiency} = \frac{\text{FLOPs}}{\text{Energy consumption}},$$

we also infer that the energy efficiency of the proposed circuit is about 25 times as high as that of the commercial GPU *NVIDIA RTX 6000 Ada Generation* [27] and about 38 times as high as that of the *NVIDIA RTX 4000 Ada Generation* [29] (see Table I).
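The energy-efficiency row of Table I follows directly from this definition. The short sketch below recomputes it from the FLOP count and the energy figures of the table; the dictionary keys are only labels chosen for this example.

```python
flops = 42074112                 # FLOPs of the LS estimator under the simulation parameters

# Energy consumption in joules, taken from Table I.
energy_J = {
    "proposed": 21.76e-6,
    "RTX 6000 Ada": 554.21e-6,
    "RTX 4000 Ada": 819.42e-6,
}

# Energy efficiency = FLOPs / energy consumption, in FLOPs per joule.
efficiency = {name: flops / e for name, e in energy_J.items()}

ratio_6000 = efficiency["proposed"] / efficiency["RTX 6000 Ada"]
ratio_4000 = efficiency["proposed"] / efficiency["RTX 4000 Ada"]

for name, eff in efficiency.items():
    print(f"{name}: {eff / 1e12:.3f} teraFLOPs/J")
print(f"gain over RTX 6000 Ada: {ratio_6000:.1f}x")
print(f"gain over RTX 4000 Ada: {ratio_4000:.1f}x")
```

Since the FLOP count is common to all three devices, the gains reduce to ratios of the energy figures.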

Finally, the throughput of the proposed IMC circuit increases with the scale M of the memristor crossbar, as shown in Fig. 4, and it outperforms the commercial GPU when M exceeds about 360. Note that the x-axis is on a logarithmic scale. The energy efficiency of the proposed circuit also increases with the crossbar scale, and it always outperforms that of the commercial GPU.

# V. CONCLUSION

We have designed a memristor crossbar based IMC circuit for the LS channel estimation in massive MIMO-OFDM systems. Our scheme breaks the memory wall in communication systems and reduces the latency between processors and memory. The simulation results have shown that the proposed circuit with 7-bit memristors achieves accuracy comparable to that of the digital computer. Our simulations have also verified that the proposed circuit is approximately 18 times as fast and 25 times as energy-efficient as the commercial GPU NVIDIA RTX 6000 Ada Generation. Therefore, the proposed circuit has shown great potential, with gains exceeding an order of magnitude in both speed and energy efficiency, in channel estimation tasks for extremely large-scale MIMO-OFDM systems.

# REFERENCES

- [1] C.-X. Wang, X. You, X. Gao, X. Zhu *et al.*, "On the road to 6G: Visions, requirements, key technologies, and testbeds," *IEEE Commun. Surveys Tuts.*, vol. 25, no. 2, pp. 905–974, 2nd Quarter 2023.
- [2] L. Lu, G. Y. Li, A. L. Swindlehurst, A. Ashikhmin, and R. Zhang, "An overview of massive MIMO: Benefits and challenges," *IEEE J. Sel. Topics Signal Process.*, vol. 8, no. 5, pp. 742–758, Oct. 2014.
- [3] S. Yang and L. Hanzo, "Fifty years of MIMO detection: The road to large-scale MIMOs," *IEEE Commun. Surveys Tuts.*, vol. 17, no. 4, pp. 1941–1988, 4th Quarter 2015.
- [4] Y. Liu, Z. Tan, H. Hu, L. J. Cimini, and G. Y. Li, "Channel estimation for OFDM," *IEEE Commun. Surveys Tuts.*, vol. 16, no. 4, pp. 1891–1908, 4th Quarter 2014.
- [5] T. Lv, S. Yang, and H. Gao, "Semi-blind channel estimation relying on optimum pilots designed for multi-cell large-scale MIMO systems," *IEEE Access*, vol. 4, pp. 1190–1204, Apr. 2016.
- [6] H. Wang, G. Li, S. Zheng, S. Yang, and P. Pan, "An approach to reduce the overhead of training sequences in FDD massive MIMO downlink systems," *IEEE Wireless Commun. Lett.*, vol. 8, no. 4, pp. 1301–1305, Aug. 2019.
- [7] J. W. Choi, B. Shim, Y. Ding, B. Rao et al., "Compressed sensing for wireless communications: Useful tips and tricks," *IEEE Commun. Surveys Tuts.*, vol. 19, no. 3, pp. 1527–1550, 3rd Quarter 2017.
- [8] K. Meng, S. Yang, X.-Y. Wang, Y. Bu et al., "Joint sparsity pattern learning based channel estimation for massive MIMO-OTFS systems," *IEEE Trans. Veh. Technol.*, vol. 73, no. 8, pp. 12189–12194, Aug. 2024.
- [9] H. Ye, G. Y. Li, and B.-H. Juang, "Power of deep learning for channel estimation and signal detection in OFDM systems," *IEEE Wireless Commun. Lett.*, vol. 7, no. 1, pp. 114–117, Feb. 2018.

- [10] A. Hu, T. Lv, H. Gao, Z. Zhang, and S. Yang, "An ESPRIT-based approach for 2-D localization of incoherently distributed sources in massive MIMO systems," *IEEE J. Sel. Areas Commun.*, vol. 8, no. 5, pp. 996–1011, 2014.
- [11] T. Lv, F. Tan, H. Gao, and S. Yang, "A beamspace approach for 2-D localization of incoherently distributed sources in massive MIMO systems," *Signal Process.*, vol. 121, pp. 30–45, Apr. 2016.
- [12] Y. Zhou, Z. Fei, S. Yang, J. Kuang, S. Chen, and L. Hanzo, "Joint angle estimation and signal reconstruction for coherently distributed sources in massive MIMO systems based on 2-D unitary ESPRIT," *IEEE Access*, vol. 5, pp. 9632–9646, Jun. 2017.
- [13] X.-Y. Wang, S. Yang, J. Zhang, C. Masouros, and P. Zhang, "Clutter suppression, time-frequency synchronization, and sensing parameter association in asynchronous perceptive vehicular networks," *IEEE J. Sel. Areas Commun.*, vol. 42, no. 10, pp. 2719–2736, Oct. 2024.
- [14] B. Li, Z. Wei, S. Yang, Y. Zhang et al., "Beyond MMSE: Rank-1 subspace channel estimator for massive MIMO systems," *IEEE Trans. Commun.*, vol. 73, no. 8, pp. 12189–12194, Aug. 2024.
- [15] H. Sarieddeen, M.-S. Alouini, and T. Y. Al-Naffouri, "An overview of signal processing techniques for terahertz communications," *Proc. IEEE*, vol. 109, no. 10, pp. 1628–1665, Oct. 2021.
- [16] W. Zhang, P. Yao, B. Gao, Q. Liu et al., "Edge learning using a fully integrated neuro-inspired memristor chip," *Science*, vol. 381, no. 6663, pp. 1205–1211, Sep. 2023.
- [17] Y.-H. Ren, S. Yang, J.-H. Bi, and Y.-X. Zhang, "Accelerating maximum-likelihood detection in massive MIMO: A new paradigm with memristor crossbar based in-memory computing circuit," *IEEE Trans. Veh. Technol.*, vol. 73, no. 12, pp. 19745–19750, Dec. 2024.
- [18] J.-H. Bi, S. Yang, P. Zhang, and S. Chen, "Amplifier-enhanced memristive massive MIMO linear detector circuit: An ultra-energy-efficient and robust-to-conductance-error design," in *Proc. IEEE Global Communications Conference (GLOBECOM 2024)*, Cape Town, South Africa, Dec. 8-12, 2024, pp. 3968–3973.
- [19] —, "In-memory massive MIMO linear detector circuit with extremely high energy efficiency and strong memristive conductance deviation robustness," in *Proc. IEEE Global Communications Conference (GLOBECOM 2024)*, Cape Town, South Africa, Dec. 8-12, 2024, pp. 728–733
- [20] P. Zuo, Z. Sun, and R. Huang, "Extremely-fast, energy-efficient massive MIMO precoding with analog RRAM matrix computing," *IEEE Trans. Circuits Syst. II*, vol. 70, no. 7, pp. 2335–2339, Jul. 2023.
- [21] Y. Fang, L. Chen, C. You, and H. Yin, "Rethinking massive MIMO detection: A memristor approach," *IEEE Commun. Lett.*, vol. 27, no. 12, pp. 3350–3354, Dec. 2023.
- [22] P. Mannocci, E. Melacarne, and D. Ielmini, "An analogue in-memory ridge regression circuit with application to massive MIMO acceleration," *IEEE Trans. Emerg. Sel. Topics Circuits Syst.*, vol. 12, no. 4, pp. 952–962, Nov. 2022.
- [23] J.-H. Bi, S. Yang, S. Chen, and P. Zhang, "High-speed ultra-energy-efficient memristor-based massive MIMO SIC detector circuit with hybrid analog-digital computing architecture," *IEEE Trans. Veh. Technol.*, vol. 74, no. 7, pp. 11495–11500, Jul. 2025.
- [24] J.-H. Bi, S. Yang, and P. Zhang, "High-speed and ultra-energy-efficient in-memory computing circuit for ADMM-based box-constrained massive MIMO signal detection," *IEEE Wireless Communications Letters*, pp. 1–6, Aug. 2025, early access, DOI: 10.1109/LWC.2025.3597308.
- [25] I. Barhumi, G. Leus, and M. Moonen, "Optimal training design for MIMO OFDM systems in mobile wireless channels," *IEEE Trans. Signal Process.*, vol. 51, no. 6, pp. 1615–1624, Jun. 2003.
- [26] M. Rao, H. Tang, J. Wu, W. Song et al., "Thousands of conductance levels in memristors integrated on CMOS," *Nature*, vol. 615, pp. 823–829, Mar. 2023.
- [27] "Data sheet: RTX 6000 Ada Generation," NVIDIA, 2023. [Online]. Available: https://resources.nvidia.com/en-us-briefcase-for-datasheets/proviz-print-rtx6000-1
- [28] L. Kull, T. Toifl, M. Schmatz, P. A. Francese et al., "A 3.1 mW 8b 1.2 GS/s single-channel asynchronous SAR ADC with alternate comparators for enhanced speed in 32 nm digital SOI CMOS," *IEEE J. Solid-State Circuits*, vol. 48, no. 12, pp. 3049–3058, Sep. 2013.
- [29] "Data sheet: RTX 4000 Ada Generation," NVIDIA, 2023. [Online]. Available: https://resources.nvidia.com/en-us-briefcase-for-datasheets/rtx-4000-ada-datashe-1