# In-Memory Massive MIMO Linear Detector Circuit with Extremely High Energy Efficiency and Strong Memristive Conductance Deviation Robustness

Jia-Hui Bi, Shaoshi Yang, Senior Member, IEEE, Ping Zhang, Fellow, IEEE, Sheng Chen, Life Fellow, IEEE

Abstract—The memristive crossbar array (MCA) has been successfully applied to accelerate matrix computations of signal detection in massive multiple-input multiple-output (MIMO) systems. However, a unique property of the massive MIMO channel matrix makes the detection performance of existing MCA-based detectors sensitive to conductance deviations of memristive devices, and these conductance deviations are difficult to avoid. In this paper, we propose an MCA-based detector circuit, which is robust to conductance deviations, to compute massive MIMO zero forcing and minimum mean-square error algorithms. The proposed detector circuit comprises an MCA-based matrix computing module, utilized for processing the small-scale fading coefficient matrix, and amplifier circuits based on operational amplifiers (OAs), utilized for processing the large-scale fading coefficient matrix. We investigate the impacts of the open-loop gain of OAs, conductance mapping scheme, and conductance deviation level on detection performance and demonstrate the performance superiority of the proposed detector circuit over the conventional MCA-based detector circuit. The energy efficiency of the proposed detector circuit surpasses that of a traditional digital processor by several tens to several hundreds of times.

*Index Terms*—Massive MIMO, signal detection, linear detector, analog matrix computing, in-memory computing, memristive crossbar array.

## I. INTRODUCTION

Modern and next-generation wireless communication systems employ massive multiple-input multiple-output (MIMO) technology to increase transmission speed and improve user experience [1]. However, the extremely large number of antennas, while beneficial, also leads to extremely high complexity of signal detection algorithms. With the goal of reducing detection latency in massive MIMO systems, a variety of low-complexity detection algorithms have been proposed in the past decades [2]. However, algorithms with low complexity usually suffer from considerable performance loss, making it difficult to trade off between high performance and low latency. Another popular and effective approach is accelerating MIMO detection by hardware innovations. However, traditional processors based on the von Neumann architecture struggle significantly to perform large-dimensional matrix operations. As the number of users simultaneously transmitting data increases in next-generation wireless communication systems, the computational complexity of detection algorithms is bound to increase, and traditional von Neumann architecture-based processors can hardly meet the receiver's requirements for processing speed and energy efficiency.

Corresponding author: S. Yang.

As a form of in-memory computing, the analog matrix computing technology based on the memristive crossbar array (MCA) constitutes a revolutionary new matrix computational paradigm. The MCA can rapidly perform not only matrix-vector multiplication (MVM) [3] through an analog computing approach but also other matrix operations, such as the computation of the inverse matrix [4] and the pseudoinverse matrix [5], with the assistance of operational amplifiers (OAs). The MCA-based matrix computing circuit is not constrained by the so-called von Neumann bottleneck, and thereby presents notable benefits in computational speed and power consumption in contrast to traditional processors based on the von Neumann architecture. The MCA makes it possible to realize massive MIMO detectors with superior detection performance, speed, and energy efficiency, since matrix operations constitute the core task of massive MIMO detection.

While MCA has been successfully applied to accelerate deep neural network training in artificial intelligence, research on its application to massive MIMO detection is still in its infancy. In [6], MCA was applied to baseband processors to accelerate MIMO detection. However, that work only employed MCA to accelerate the MVM operations, relying on another processor to perform inverse matrix computations. An MCA-based zero forcing (ZF) precoder was proposed in [7]. This idea can be extended to the ZF or minimum mean-square error (MMSE) detector. In [8], an MCA-based ZF detector and a regularized ZF detector were proposed. In [9], an MCA-based detector circuit for ZF and MMSE detection was proposed, which has a structure similar to that of the circuit presented in [8]. However, in practical scenarios, the large-scale fading coefficients (LSFCs) of the user terminals (UTs) in a massive MIMO cell usually differ from each other, and thus the elements of the matrices computed in the MCA-based circuits presented in [7]-[9] often obey probability distributions with different variances, which makes the detection performance of the existing MCA-based detectors sensitive to conductance deviations.

To solve this problem, we propose an MCA-based detector circuit in this paper, which can be employed to compute massive MIMO ZF and MMSE algorithms and is robust to conductance deviations. The proposed detector circuit comprises an MCA-based matrix computing module, utilized for processing the small-scale fading coefficient (SSFC) matrix, and OA-based amplifier circuits, utilized for processing the LSFC matrix. We investigate the impacts of the open-loop gain (OLG) of OAs, conductance mapping scheme, and conductance deviation level on detection performance. We demonstrate the performance superiority of the proposed detector circuit over the conventional MCA-based detector circuit and the energy efficiency superiority of the proposed circuit over the traditional digital processor.

J.-H. Bi, S. Yang and P. Zhang are with the School of Information and Communication Engineering, Beijing University of Posts and Telecommunications, Beijing 100876, China (e-mails: bijiahui@bupt.edu.cn, shaoshi.yang@bupt.edu.cn, pzhang@bupt.edu.cn).

S. Chen is with the School of Electronics and Computer Science, University of Southampton, Southampton SO17 1BJ, U.K. (e-mail: sqc@ecs.soton.ac.uk).

The rest of this paper is organized as follows. Section II introduces the system model and basic algorithms considered in this paper. Section III presents the proposed MCA-based detector circuit. Section IV provides the conductance mapping schemes for the MCA-based detectors. Section V provides extensive simulation results. Section VI concludes this paper.

## II. SYSTEM MODEL AND BASIC ALGORITHMS

### A. System Model

We consider a massive MIMO cell, where a base station (BS) with R antennas serves K UTs, each equipped with a single antenna. The uplink signals can be described by:

$$\tilde{\mathbf{y}} = \tilde{\mathbf{H}}\tilde{\mathbf{s}} + \tilde{\mathbf{n}},\tag{1}$$

where  $\tilde{\mathbf{y}} \in \mathbb{C}^{R \times 1}$  denotes the received signals,  $\tilde{\mathbf{s}} \in \mathbb{C}^{K \times 1}$  denotes the transmitted signals,  $\tilde{\mathbf{H}} \in \mathbb{C}^{R \times K}$  denotes the channel matrix, and  $\tilde{\mathbf{n}} \in \mathbb{C}^{R \times 1}$  is a complex additive white Gaussian noise (AWGN) vector with variance  $\sigma_n^2$  per element.

Let $\lambda_1, \dots, \lambda_K$ be the LSFCs between the K UTs and the BS. Then $\tilde{\mathbf{H}}$ can be described by:

$$\tilde{\mathbf{H}} = \tilde{\mathbf{G}}\tilde{\mathbf{\Lambda}},\tag{2}$$

where $\tilde{\mathbf{\Lambda}} = \operatorname{diag} \left( \sqrt{\lambda_1}, \cdots, \sqrt{\lambda_K} \right)$ and $\tilde{\mathbf{G}} \in \mathbb{C}^{R \times K}$ are the LSFC matrix and the SSFC matrix, respectively. We consider the Rayleigh fading channel model in this paper, which means that the elements of $\tilde{\mathbf{G}}$ are zero-mean complex Gaussian random variables with variance $\sigma_g^2$ per dimension, i.e.,

$$\tilde{g}_{i,j} \sim \mathcal{CN} \left( 0, 2\sigma_g^2 \right), \ 1 \leq i \leq R, \ 1 \leq j \leq K. \tag{3}$$

The complex-valued system model of (1) can be alternatively described as an equivalent real-valued expression of

$$\mathbf{y} = \mathbf{H}\mathbf{s} + \mathbf{n},\tag{4}$$

where

$$\begin{split} \mathbf{y} &= \left[ \begin{array}{c} \Re(\tilde{\mathbf{y}}) \\ \Im(\tilde{\mathbf{y}}) \end{array} \right], \ \mathbf{s} = \left[ \begin{array}{c} \Re(\tilde{\mathbf{s}}) \\ \Im(\tilde{\mathbf{s}}) \end{array} \right], \ \mathbf{n} = \left[ \begin{array}{c} \Re(\tilde{\mathbf{n}}) \\ \Im(\tilde{\mathbf{n}}) \end{array} \right], \\ \mathbf{H} &= \left[ \begin{array}{cc} \Re(\tilde{\mathbf{H}}) & -\Im(\tilde{\mathbf{H}}) \\ \Im(\tilde{\mathbf{H}}) & \Re(\tilde{\mathbf{H}}) \end{array} \right], \end{split}$$

in which  $\Re(\cdot)$  and  $\Im(\cdot)$  respectively denote the real and imaginary parts of the corresponding vector or matrix. Obviously,  $\mathbf{H} \in \mathbb{R}^{2R \times 2K}$  can be alternatively described by:

$$\mathbf{H} = \mathbf{G}\boldsymbol{\Lambda},\tag{5}$$

where $\mathbf{\Lambda} = \operatorname{diag}(\sqrt{\lambda_1}, \cdots, \sqrt{\lambda_K}, \sqrt{\lambda_1}, \cdots, \sqrt{\lambda_K})$ and

$$\mathbf{G} = \left[ \begin{array}{cc} \Re(\tilde{\mathbf{G}}) & -\Im(\tilde{\mathbf{G}}) \\ \Im(\tilde{\mathbf{G}}) & \Re(\tilde{\mathbf{G}}) \end{array} \right].$$

A massive MIMO detector needs to estimate $\mathbf{s}$ from $\mathbf{y}$ given $\mathbf{H}$.
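The real-valued reformulation in (4) and (5) can be checked numerically. The following is a minimal NumPy sketch, not part of the original work: the dimensions, the LSFC values in `lam`, the noise level, and the helper name `to_real_model` are all illustrative assumptions.

```python
import numpy as np

def to_real_model(M_c, v_c):
    """Stack real and imaginary parts as in the real-valued model (4)."""
    M = np.block([[M_c.real, -M_c.imag],
                  [M_c.imag,  M_c.real]])
    v = np.concatenate([v_c.real, v_c.imag])
    return M, v

rng = np.random.default_rng(0)
R, K = 64, 4                              # illustrative antenna / UT counts
sigma_g = 1.0 / np.sqrt(2.0)              # std per real dimension (assumed value)
lam = np.array([1.0, 0.3, 0.1, 0.03])     # hypothetical LSFCs

# Complex-valued model (1) with H = G * Lambda as in (2)
G_c = sigma_g * (rng.standard_normal((R, K)) + 1j * rng.standard_normal((R, K)))
H_c = G_c @ np.diag(np.sqrt(lam))
s_c = rng.choice([-1.0, 1.0], K) + 1j * rng.choice([-1.0, 1.0], K)
n_c = 0.1 * (rng.standard_normal(R) + 1j * rng.standard_normal(R))
y_c = H_c @ s_c + n_c

# Real-valued equivalents used in (4) and (5)
H, y = to_real_model(H_c, y_c)
G, _ = to_real_model(G_c, y_c)
s = np.concatenate([s_c.real, s_c.imag])
n = np.concatenate([n_c.real, n_c.imag])
Lam = np.diag(np.sqrt(np.concatenate([lam, lam])))   # LSFCs repeated twice
```

Here `H` has shape `(2R, 2K)`, and both `y = H s + n` and `H = G Lam` hold up to floating-point rounding.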

### B. Basic Detection Algorithms

1) ZF Algorithm: The ZF algorithm can be expressed as:

$$\hat{\mathbf{s}}_{\mathrm{ZF}} = \left(\mathbf{H}^{\mathrm{T}}\mathbf{H}\right)^{-1}\mathbf{H}^{\mathrm{T}}\mathbf{y},\tag{6}$$

where  $(\cdot)^T$  and  $(\cdot)^{-1}$  denote the transpose matrix and inverse matrix, respectively.

2) MMSE Algorithm: The MMSE algorithm can be expressed as:

$$\hat{\mathbf{s}}_{\text{MMSE}} = \left(\mathbf{H}^{\text{T}}\mathbf{H} + \rho \mathbf{I}\right)^{-1} \mathbf{H}^{\text{T}}\mathbf{y},\tag{7}$$

where the parameter $\rho = \frac{\sigma_n^2}{p_s}$, $p_s$ is the average symbol energy of $\mathbf{s}$, and $\mathbf{I}$ denotes the identity matrix of appropriate dimension.
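As a digital reference for (6) and (7), the two linear detectors can be sketched in NumPy as follows. This is an illustrative software baseline only; the function names, the random test channel, and the noiseless sanity check are assumptions made for this example.

```python
import numpy as np

def zf_detect(H, y):
    """ZF estimate of eq. (6): solve (H^T H) s = H^T y."""
    return np.linalg.solve(H.T @ H, H.T @ y)

def mmse_detect(H, y, rho):
    """MMSE estimate of eq. (7), with rho = sigma_n^2 / p_s."""
    n_cols = H.shape[1]
    return np.linalg.solve(H.T @ H + rho * np.eye(n_cols), H.T @ y)

rng = np.random.default_rng(1)
H = rng.standard_normal((128, 8))        # hypothetical real-valued channel
s = rng.choice([-1.0, 1.0], size=8)      # hypothetical real-valued symbols
y = H @ s                                # noiseless, for a sanity check

s_zf = zf_detect(H, y)
s_mmse = mmse_detect(H, y, rho=0.0)      # rho = 0 reduces MMSE to ZF
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse, which is the usual choice in software; the analog circuit discussed later obtains the same quantity through feedback rather than through an explicit solve.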

## III. PROPOSED MCA-BASED CIRCUIT DESIGN

### A. Transformations of Computational Expressions

Upon substituting (5) into (6) and (7) we obtain:

$$\hat{\mathbf{s}}_{\mathrm{ZF}} = \mathbf{\Lambda}^{-1} (\mathbf{G}^{\mathrm{T}} \mathbf{G})^{-1} \mathbf{G}^{\mathrm{T}} \mathbf{y}, \tag{8}$$

and

$$\hat{\mathbf{s}}_{\text{MMSE}} = \mathbf{\Lambda}^{-1} (\mathbf{G}^{T} \mathbf{G} + \mathbf{P})^{-1} \mathbf{G}^{T} \mathbf{y}, \tag{9}$$

where $\mathbf{P} = \operatorname{diag}\left(\frac{\rho}{\lambda_1}, \frac{\rho}{\lambda_2}, \cdots, \frac{\rho}{\lambda_K}, \frac{\rho}{\lambda_1}, \frac{\rho}{\lambda_2}, \cdots, \frac{\rho}{\lambda_K}\right)$.

For expression convenience, we define $\mathbf{W} = \mathbf{G}^{\mathrm{T}}\mathbf{G}$ and $\tilde{\mathbf{W}} = \tilde{\mathbf{G}}^{\mathrm{H}}\tilde{\mathbf{G}}$, where $(\cdot)^{\mathrm{H}}$ denotes the Hermitian transpose. Since $\tilde{\mathbf{W}}$ is Hermitian, $\Re(\tilde{\mathbf{W}})$ is a real symmetric matrix and $\Im(\tilde{\mathbf{W}})$ is a real skew-symmetric matrix, and $\mathbf{W}$ can be expressed as:

$$\mathbf{W} = \begin{bmatrix} \Re(\tilde{\mathbf{W}}) & -\Im(\tilde{\mathbf{W}}) \\ \Im(\tilde{\mathbf{W}}) & \Re(\tilde{\mathbf{W}}) \end{bmatrix}. \tag{10}$$

The real and imaginary parts of the elements $\tilde{g}_{i,j}$ of $\tilde{\mathbf{G}}$ are independent and identically distributed Gaussian random variables, namely, $\Re(\tilde{g}_{i,j}) \sim \mathcal{N}(0,\sigma_g^2)$ and $\Im(\tilde{g}_{i,j}) \sim \mathcal{N}(0,\sigma_g^2)$. Thus the mean values of the nondiagonal elements of both $\Re(\tilde{\mathbf{W}})$ and $\Im(\tilde{\mathbf{W}})$ are zero, and the diagonal elements of $\Im(\tilde{\mathbf{W}})$ are zero, while the diagonal elements of $\Re(\tilde{\mathbf{W}})$ obey a chi-square distribution:

$$\frac{\Re(\tilde{w}_{i,i})}{\sigma_g^2} \sim \chi^2(2R),\tag{11}$$

which means that the mean value of the diagonal elements of $\Re(\tilde{\mathbf{W}})$ is $2R\sigma_g^2$.
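For completeness, the mean of the diagonal elements can be verified directly from the definition of $\tilde{\mathbf{W}}$, using only the per-dimension variance $\sigma_g^2$ from (3); the row index $r$ is introduced here for illustration:

$$\begin{aligned} \Re(\tilde{w}_{i,i}) &= \sum_{r=1}^{R} |\tilde{g}_{r,i}|^2 = \sum_{r=1}^{R} \left[ \Re(\tilde{g}_{r,i})^2 + \Im(\tilde{g}_{r,i})^2 \right], \\ \mathbb{E}\left[ \Re(\tilde{w}_{i,i}) \right] &= \sum_{r=1}^{R} \left( \sigma_g^2 + \sigma_g^2 \right) = 2R\sigma_g^2. \end{aligned}$$

The sum contains $2R$ independent squared $\mathcal{N}(0,\sigma_g^2)$ terms, which is why the normalized quantity in (11) has $2R$ degrees of freedom.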

By defining  $\mathbf{Q}_{\mathrm{ZF}} = 2R\sigma_g^2\mathbf{I}$ ,  $\mathbf{Q}_{\mathrm{MMSE}} = 2R\sigma_g^2\mathbf{I} + \mathbf{P}$  and  $\mathbf{X} = \mathbf{W} - 2R\sigma_g^2\mathbf{I}$ , we obtain:

$$\hat{\mathbf{s}}_{\mathrm{ZF}} = \mathbf{\Lambda}^{-1} (\mathbf{X} + \mathbf{Q}_{\mathrm{ZF}})^{-1} \mathbf{G}^{\mathrm{T}} \mathbf{y}, \tag{12}$$

and

$$\hat{\mathbf{s}}_{\text{MMSE}} = \mathbf{\Lambda}^{-1} (\mathbf{X} + \mathbf{Q}_{\text{MMSE}})^{-1} \mathbf{G}^{\text{T}} \mathbf{y}. \tag{13}$$
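The equivalence between the direct forms (6), (7) and the transformed forms (12), (13) can be confirmed numerically. The sketch below is an illustration under assumed values: the LSFCs in `lam`, the regularization `rho`, the variance `sigma_g2`, and the random received vector are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2)
R, K = 64, 4
sigma_g2 = 0.5                                 # variance per real dimension (assumed)
lam = np.array([1.0, 0.2, 0.05, 0.01])         # hypothetical LSFCs
rho = 0.1                                      # hypothetical sigma_n^2 / p_s

# Real-valued G with the block structure used for (5)
Gr = np.sqrt(sigma_g2) * rng.standard_normal((R, K))
Gi = np.sqrt(sigma_g2) * rng.standard_normal((R, K))
G = np.block([[Gr, -Gi], [Gi, Gr]])
lam2 = np.concatenate([lam, lam])
H = G @ np.diag(np.sqrt(lam2))
y = rng.standard_normal(2 * R)                 # arbitrary received vector

# Direct forms (6) and (7)
s_zf_direct = np.linalg.solve(H.T @ H, H.T @ y)
s_mmse_direct = np.linalg.solve(H.T @ H + rho * np.eye(2 * K), H.T @ y)

# Transformed forms (12) and (13)
W = G.T @ G
Q_zf = 2 * R * sigma_g2 * np.eye(2 * K)
X = W - Q_zf
Q_mmse = Q_zf + np.diag(rho / lam2)            # Q_ZF + P
Lam_inv = np.diag(1.0 / np.sqrt(lam2))
s_zf = Lam_inv @ np.linalg.solve(X + Q_zf, G.T @ y)
s_mmse = Lam_inv @ np.linalg.solve(X + Q_mmse, G.T @ y)
```

The point of the transformation is visible in the code: only `G`, `X`, and the diagonal `Q` enter the matrix inversion, while the widely spread LSFCs appear solely in the final diagonal scaling `Lam_inv`.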



Fig. 1. The proposed MCA-based detector circuit.

### B. Proposed MCA-Based Circuit

The proposed detector circuit is illustrated in Fig. 1, which comprises an MCA-based computing module and 2K amplifier circuits. The MCA-based computing module consists of five MCAs, two sets of analog inverters, a set of voltage followers and a set of OAs.

Owing to the virtual ground property of OA networks, the voltages at the inverting-input nodes of the set of OAs are approximately zero. Let $\mathbf{A}$, $\mathbf{B}$, $\mathbf{C}$, $\mathbf{D}$ and $\mathbf{E}$ be the conductance matrices of the five MCAs. According to Ohm's law and Kirchhoff's current law, the input voltages $\mathbf{v}_{\mathrm{in}}$ and the currents $\mathbf{i}_1$ in Fig. 1 satisfy:

$$\mathbf{i}_1 = (\mathbf{A} - \mathbf{B})\mathbf{v}_{\text{in}}.\tag{14}$$

A voltage follower has unity gain. Therefore, letting $\mathbf{v}_1$ be the output voltages of the set of OAs, we have:

$$(\mathbf{C} + \mathbf{D} - \mathbf{E})\mathbf{v}_1 + \mathbf{i}_1 = \mathbf{i}^-, \tag{15}$$

where $\mathbf{i}^-$ denotes the currents flowing into the inverting-input nodes of the OAs. Since $\mathbf{i}^-$ is approximately zero owing to the inherent characteristic of OAs, we have

$$\mathbf{v}_1 = -(\mathbf{C} + \mathbf{D} - \mathbf{E})^{-1} \mathbf{i}_1. \tag{16}$$

The stability of the output voltages requires that the diagonal elements of $\mathbf{C}^{-1}$ are all positive [4], which always holds, since $\mathbf{C}$ is a diagonal matrix with positive diagonal elements.

In the amplifier circuits, the conductance values of the memristive devices connected to the output nodes of the MCA-based computing module are all  $\theta_0$ . Let  $\theta_1, \theta_2, \dots, \theta_{2K}$  be the conductance values of the feedback memristive devices, respectively. The output voltages of the amplifier circuits are:

$$\mathbf{v}_{\text{out}} = -\mathbf{\Theta}^{-1}\mathbf{v}_{1},\tag{17}$$

where $\mathbf{\Theta} = \operatorname{diag}\left(\frac{\theta_1}{\theta_0}, \frac{\theta_2}{\theta_0}, \cdots, \frac{\theta_{2K}}{\theta_0}\right)$. Upon substituting (14) and (16) into (17), we obtain:

$$\mathbf{v}_{\text{out}} = \mathbf{\Theta}^{-1} (\mathbf{C} + \mathbf{D} - \mathbf{E})^{-1} (\mathbf{A} - \mathbf{B}) \mathbf{v}_{\text{in}}. \tag{18}$$

The conductance value of a memristive device can be changed by charge or flux through it. Therefore, the conductance value of a memristive device can be set to any desired value within a specified range by a dedicated program [10], [11]. By mapping  $\mathbf{y}$  onto  $\mathbf{v}_{\rm in}$ , mapping  $\mathbf{G}^{\rm T}$  onto  $\mathbf{A} - \mathbf{B}$ , mapping  $\mathbf{Q}_{\rm ZF}$  or  $\mathbf{Q}_{\rm MMSE}$  onto  $\mathbf{C}$ , mapping  $\mathbf{X}$  onto  $\mathbf{D} - \mathbf{E}$  and mapping  $\mathbf{\Lambda}$  onto  $\mathbf{\Theta}$ , the result of (12) or (13), i.e.,  $\hat{\mathbf{s}}_{\rm ZF}$  or  $\hat{\mathbf{s}}_{\rm MMSE}$ , can be obtained by measuring  $\mathbf{v}_{\rm out}$ .
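The steady-state behaviour described by (14), (16), (17) and (18) can be modelled in a few lines. The sketch below is an idealized illustration only: it assumes infinite-gain OAs, ignores the physical conductance range and units, and splits each signed matrix into its positive and negative parts instead of using the mapping scheme of Section IV. The function name `detector_output` and all numerical values are hypothetical.

```python
import numpy as np

def detector_output(A, B, C, D, E, theta_ratio, v_in):
    """Ideal steady-state output of eq. (18).

    theta_ratio holds the diagonal of Theta, i.e. theta_i / theta_0.
    """
    i1 = (A - B) @ v_in                      # eq. (14)
    v1 = -np.linalg.solve(C + D - E, i1)     # eq. (16)
    return -v1 / theta_ratio                 # eq. (17)

rng = np.random.default_rng(4)
R2, K2 = 128, 8                              # 2R and 2K, illustrative
sigma_g2 = 0.5                               # variance per real dimension (assumed)
G = np.sqrt(sigma_g2) * rng.standard_normal((R2, K2))  # stand-in for G
lam2 = np.tile([1.0, 0.3, 0.1, 0.03], 2)     # hypothetical LSFCs, repeated twice
y = rng.standard_normal(R2)

Q_zf = R2 * sigma_g2 * np.eye(K2)            # 2R * sigma_g^2 * I
X = G.T @ G - Q_zf

# Unit-scale "conductances": positive and negative parts of each signed matrix
A, B = np.maximum(G.T, 0.0), np.maximum(-G.T, 0.0)
D, E = np.maximum(X, 0.0), np.maximum(-X, 0.0)
v_out = detector_output(A, B, Q_zf, D, E, np.sqrt(lam2), y)

# Digital ZF reference computed from H = G * Lambda
H = G * np.sqrt(lam2)
s_zf = np.linalg.solve(H.T @ H, H.T @ y)
```

With these idealizations `v_out` coincides with the digital ZF estimate `s_zf`. In a physical circuit the mapping factors would introduce a known scale between voltages and symbol estimates, which this sketch sets to one.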

## IV. CONDUCTANCE MAPPING SCHEMES

We map a matrix that contains both negative and positive elements onto the difference between two positive conductance matrices, rather than onto a single one, because a physical conductance cannot be negative. Let the conductance range of memristive devices be $[\omega_{\min}, \ \omega_{\max}]$. Let $\mathbf{U}$ be the matrix to be mapped and let $\mathbf{A}$ and $\mathbf{B}$ be the two conductance matrices. The scheme for mapping $\mathbf{U}$ onto $\mathbf{A} - \mathbf{B}$ is:

$$a_{i,j} = \begin{cases} \omega_{\max}, & u_{i,j} > 0, \\ \omega_{\min}, & u_{i,j} \le 0, \end{cases}\tag{19}$$

and

$$b_{i,j} = a_{i,j} - \alpha u_{i,j}, \tag{20}$$

where  $\alpha$  is called the mapping factor. The conductance values exceeding the permissible range will be truncated to the limits.
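The mapping rule of (19) and (20), including the truncation to the permissible range, can be sketched as follows. The function name `map_to_conductances` and the small example matrix are illustrative assumptions; the conductance range is the one quoted in Section V.

```python
import numpy as np

def map_to_conductances(U, alpha, w_min, w_max):
    """Map U onto A - B following (19) and (20), then clip B to [w_min, w_max]."""
    A = np.where(U > 0, w_max, w_min)        # eq. (19)
    B = A - alpha * U                        # eq. (20)
    return A, np.clip(B, w_min, w_max)       # truncation to the limits

w_min, w_max = 0.1e-6, 30e-6                 # 0.1 to 30 microsiemens
U = np.array([[0.5, -1.0],
              [0.0,  2.0]])

# Mapping factor chosen so that the largest |u| just fits the range
alpha = (w_max - w_min) / 2.0
A, B = map_to_conductances(U, alpha, w_min, w_max)

# Doubling the mapping factor forces the element u = 2.0 to be truncated
A_t, B_t = map_to_conductances(U, 2.0 * alpha, w_min, w_max)
```

In the first case `A - B` reproduces `alpha * U` exactly. In the second case the entry for `u = 2.0` saturates at a difference of `w_max - w_min`, which is the truncation effect discussed for the FMF scheme.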

The actual conductance value of a memristive device typically deviates from the ideal value, with these deviations being modeled as zero-mean Gaussian random variables with a variance of  $\sigma_m^2$  [12]. In this section, we give two conductance mapping schemes, namely, the fixed mapping factor (FMF) scheme and the adjustable mapping factor (AMF) scheme.

### A. FMF Scheme

The core concept of the FMF scheme is to select a fixed mapping factor based on the probability distribution of elements of the mapped matrix. To map a matrix $\mathbf{U}$ onto conductance matrices, the FMF scheme calculates the mapping factor by:

$$\alpha = \frac{\omega}{\beta \sigma_u},\tag{21}$$

where $\omega = \omega_{\text{max}} - \omega_{\text{min}}$, $\beta$ is a parameter of the FMF scheme and $\sigma_u$ is the standard deviation of the elements of $\mathbf{U}$.
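To make the role of $\beta$ concrete: with the factor in (21), an element is truncated exactly when $|u_{i,j}| > \beta\sigma_u$. The sketch below estimates $\sigma_u$ from the samples and, under the additional assumption that the elements are zero-mean Gaussian, compares the observed truncated fraction with $\operatorname{erfc}(\beta/\sqrt{2})$. The function names and the test matrix are hypothetical.

```python
import math
import numpy as np

def fmf_alpha(U, beta, w_min, w_max):
    """Fixed mapping factor of eq. (21), with sigma_u estimated from the samples."""
    return (w_max - w_min) / (beta * np.std(U))

def gaussian_truncated_fraction(beta):
    """P(|u| > beta * sigma_u) for zero-mean Gaussian u (assumption of this sketch)."""
    return math.erfc(beta / math.sqrt(2.0))

rng = np.random.default_rng(5)
w_min, w_max = 0.1e-6, 30e-6
U = rng.standard_normal((200, 200))          # hypothetical Gaussian matrix
beta = 2.0

alpha = fmf_alpha(U, beta, w_min, w_max)
# An element is truncated when alpha * |u| exceeds omega = w_max - w_min
empirical = np.mean(alpha * np.abs(U) > (w_max - w_min))
expected = gaussian_truncated_fraction(beta)
```

For Gaussian elements, $\beta = 2$ truncates roughly 4.6% of the entries and $\beta = 3$ roughly 0.27%, so increasing $\beta$ trades fewer truncated elements for a smaller mapping factor.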

### B. AMF Scheme

The AMF scheme calculates the mapping factor by:

$$\alpha = \frac{\omega}{\max\{|u_{i,j}|\}},\tag{22}$$

to map $\mathbf{U}$ onto conductance matrices.
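A minimal sketch of the AMF factor in (22) is given below, together with a check that it never causes truncation because the largest element maps exactly onto the full range $\omega$. The function name `amf_alpha` and the example matrix are illustrative assumptions.

```python
import numpy as np

def amf_alpha(U, w_min, w_max):
    """Adjustable mapping factor of eq. (22)."""
    return (w_max - w_min) / np.max(np.abs(U))

w_min, w_max = 0.1e-6, 30e-6
U = np.array([[0.2, -1.5],
              [4.0,  0.0]])

alpha = amf_alpha(U, w_min, w_max)
A = np.where(U > 0, w_max, w_min)            # eq. (19)
B = A - alpha * U                            # eq. (20), no clipping needed
```

Because the factor is tied to the largest magnitude in the matrix, a single large element shrinks the conductance swing available to all other elements. An all-zero matrix would need special handling, which this sketch omits.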

## V. SIMULATIONS

We consider a $4\times64$ massive MIMO system employing 64-ary quadrature amplitude modulation (64-QAM). We consider memristive devices with a conductance range of $0.1\sim30\,\mu\text{S}$. The conventional MCA-based detection scheme computes $\hat{\mathbf{s}}_{\mathrm{ZF}}$ or $\hat{\mathbf{s}}_{\mathrm{MMSE}}$ based on (6) or (7). Therefore, the MCA-based computing module in Fig. 1 is employed as a conventional MCA-based detector circuit in our experiments. We perform SPICE simulations with the aid of LTspice®.

### A. Computation Time

We gauge the computation time of an MCA-based detector circuit by its convergence time, which is mainly influenced by the gain-bandwidth product (GBP) of OAs [13]. The output voltage waveforms of the proposed detector circuit and the conventional MCA-based detector circuit are shown in Fig. 2, where the OAs are assumed to have a GBP of 500 MHz. The convergence time of the proposed circuit is about 110 ns, exhibiting almost no difference compared with that of the conventional MCA-based detector circuit, and can be further reduced by increasing the GBP of OAs.

### B. Detection Performance

In this subsection, we first investigate the impacts of the OLG of OAs, conductance mapping scheme and conductance deviations on detection performance of the proposed detector circuit, and then demonstrate the performance superiority of the proposed detector circuit over the conventional MCA-based detector circuit. We do not distinguish between the ZF and MMSE algorithms in the figures due to the absence of observable disparity in their detection performances in the considered scenario.

The computational accuracy of an MCA-based detector circuit is significantly constrained by the OLG of OAs. Fig. 3 shows the bit error rate (BER) results as functions of the signal-to-noise ratio (SNR) for the proposed detector circuit, given various values of the OLG of OAs with $\sigma_m=0$ and adopting the AMF scheme, in comparison with the BER of the digital benchmark. When the OLG of OAs is too low, the detection performance is poor. The OLG of OAs needs to be at least $80\,\mathrm{dB}$ for the proposed detector circuit to ensure satisfactory performance, i.e., achieving the performance of the digital benchmark. In the rest of this subsection, we assume that the OLG of OAs is sufficiently large.

Fig. 2. Waveforms of output voltages: (a) the proposed detector circuit, and (b) the conventional MCA-based detector circuit.

Fig. 3. BERs of the proposed detector circuit under different values of the OLG of OAs when $\sigma_m=0$ and adopting the AMF scheme.

Fig. 4. BERs of the proposed detector circuit under different values of $\beta$ when $\sigma_m=0$ and adopting the FMF scheme.

Fig. 4 shows the BER results of the proposed detector circuit adopting the FMF scheme under different $\beta$ values with $\sigma_m=0$, again using the digital approach as the benchmark. Obviously, when $\sigma_m=0$, a higher value of $\beta$ results in fewer elements being truncated, and thus results in a lower BER and closer performance to the digital approach for the proposed detector circuit. Although the truncated elements degrade the detection performance, such an impact is mainly noticeable at high SNR. At low SNR, however, the primary constraint on detection performance remains the AWGN. Even without AWGN, detection errors still occur due to the truncated elements, causing the BER to gradually converge to a fixed value as the SNR increases.

Fig. 5 shows the BER results of the proposed detector circuit, with  $\sigma_m=1\%\omega$ . Obviously, the BER decreases and then increases as  $\beta$  increases, because the primary factor constraining detection performance shifts from the truncated elements to the perturbations caused by conductance deviations as  $\beta$  increases.

Fig. 5. BERs of the proposed detector circuit with $\sigma_m = 1\%\omega$.

To facilitate the observation of the performance differences among the detector circuits with similar BERs, we use the normalized mean squared error (NMSE) of the computational results relative to the transmitted signals to measure the detection performance of a detector circuit. In order to visualize the impact of conductance mapping scheme and conductance deviations on the detection performance of the proposed detector circuit, Fig. 6 depicts the NMSEs of the computational results of the proposed detector circuit and the conventional detector circuit as functions of both conductance deviation level and $\beta$, given an SNR of 20 dB and using the digital approach as the benchmark. For the FMF scheme, in the absence of conductance deviation, the larger the parameter $\beta$, the smaller the NMSE, while in the presence of conductance deviations, the NMSE first decreases and then increases as $\beta$ increases, which confirms the trend observed in Fig. 5. Besides, the FMF scheme outperforms the AMF scheme only when the conductance deviation level is high and an appropriate $\beta$ is selected; otherwise, it performs worse than the AMF scheme.
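The text does not spell out the NMSE formula, so the sketch below uses one common definition, $\|\hat{\mathbf{s}} - \mathbf{s}\|^2 / \|\mathbf{s}\|^2$, as an assumption. The function names are hypothetical.

```python
import numpy as np

def nmse(s_hat, s):
    """||s_hat - s||^2 / ||s||^2, one common NMSE definition (assumed here)."""
    s_hat = np.asarray(s_hat, dtype=float)
    s = np.asarray(s, dtype=float)
    return np.sum((s_hat - s) ** 2) / np.sum(s ** 2)

def nmse_db(s_hat, s):
    """NMSE expressed in decibels."""
    return 10.0 * np.log10(nmse(s_hat, s))

# A 10% amplitude error on every symbol gives an NMSE of 0.01, i.e. -20 dB
example = nmse([1.1, -0.9], [1.0, -1.0])
example_db = nmse_db([1.1, -0.9], [1.0, -1.0])
```

Unlike the BER, this metric keeps changing even when all symbols are already decided correctly, which is why it helps separate circuits with similar BERs.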

Fig. 6. NMSEs of the computational results relative to the transmitted signals of the proposed MCA-based detector circuit, adopting the FMF scheme and the AMF scheme, given an SNR of 20 dB and using the digital approach as the benchmark.

Fig. 7. BERs of the proposed detector circuit and the conventional detector circuit, varying with $\beta$ value in a massive MIMO cell, with $\sigma_m=0.5\%\omega$.

In practical scenarios, UTs in a cell are usually located at different positions. The disparity in the LSFCs associated with the UTs results in great differences in the variances of the elements of the matrices computed in the conventional MCA-based detector circuit, and the perturbations caused by conductance deviations are particularly severe for the elements with smaller variance. For the proposed detector circuit, the elements of $\mathbf{G}$ obey the same distribution, and although the diagonal and nondiagonal elements of $\mathbf{X}$ follow different distributions, their variance disparity is not significant. Consequently, the impact of conductance deviations on the performance of the proposed detector circuit is relatively minor compared to that on the conventional MCA-based detector circuit.
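The variance-disparity argument can be illustrated with a small numerical experiment. The sketch below is a simplified model, not the paper's SPICE setup: it maps a matrix with the AMF factor, adds i.i.d. Gaussian deviations of standard deviation $\sigma_m = 1\%\omega$ to both conductance matrices, ignores clipping, and reads the matrix back. The LSFC values and function names are hypothetical, and only the channel-like matrix itself is mapped rather than every matrix used by the conventional circuit.

```python
import numpy as np

def perturbed_readback(U, sigma_m_rel, rng):
    """AMF-map U onto A - B, perturb both conductances, and read U back.

    Conductances are expressed in units of omega = w_max - w_min, so the
    AMF factor is alpha = 1 / max|u| and sigma_m_rel = sigma_m / omega.
    """
    alpha = 1.0 / np.max(np.abs(U))
    dev_a = sigma_m_rel * rng.standard_normal(U.shape)   # deviation of A
    dev_b = sigma_m_rel * rng.standard_normal(U.shape)   # deviation of B
    return U + (dev_a - dev_b) / alpha

def column_rel_error(U_noisy, U):
    """Relative error of each column in the Euclidean norm."""
    return np.linalg.norm(U_noisy - U, axis=0) / np.linalg.norm(U, axis=0)

rng = np.random.default_rng(3)
R2, K2 = 128, 8                                  # 2R x 2K, illustrative
G = rng.standard_normal((R2, K2))                # SSFC-like matrix
lam2 = np.tile([1.0, 1e-1, 1e-2, 1e-4], 2)       # hypothetical, widely spread LSFCs
H = G * np.sqrt(lam2)                            # H = G * Lambda (column scaling)

err_H = column_rel_error(perturbed_readback(H, 0.01, rng), H)   # map H directly
err_G = column_rel_error(perturbed_readback(G, 0.01, rng), G)   # map G only
```

In this model the columns of `H` belonging to weak UTs are swamped by the deviations, because the mapping factor is set by the strongest UT, whereas every column of `G` sees a comparable and small relative error.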

To showcase the performance superiority of our proposed circuit over the conventional MCA-based detector circuit, we consider a multi-user massive MIMO cell whose radius is 150 m. UTs randomly appear within the cell, and each UT has a transmit power of 20 dBm. The carrier frequency of uplink signals is 2 GHz with a bandwidth of 25 MHz. Fig. 7 shows the BER results of the proposed detector circuit and the conventional MCA-based detector circuit varying with $\beta$ value, with $\sigma_m=0.5\%\omega$. Regardless of whether the AMF or FMF mapping scheme is employed, the proposed detector circuit consistently achieves a notably lower BER than the conventional MCA-based detector circuit.

### C. Power Consumption and Energy Efficiency

In this subsection, we consider the OA of [14]. Digital-to-analog converters (DACs) of [15] and analog-to-digital converters (ADCs) of [16] are employed in our experiments to supply input voltages to the circuits and measure output voltages of the circuits, respectively.

Fig. 8 shows the power consumption results of the proposed circuit and the conventional MCA-based detector circuit varying with K. Fig. 8 also shows the relative additional power consumption of the proposed circuit, i.e., the ratio of the power consumption of the introduced amplifier circuits to that of the conventional MCA-based detector circuit, which is less than 1%. Evidently, the introduced amplifier circuits do not lead to a noticeable increase in power consumption.

Fig. 8. Power consumption results of the proposed circuit and the conventional MCA-based detector circuit, as well as the relative additional power consumption of the proposed circuit, varying with the number of UTs, K.

Fig. 9. Energy efficiency results of the proposed detector circuit, the conventional MCA-based detector circuit and the commercial GPU *NVIDIA QUADRO GV100*, varying with the number of UTs, *K*.

The energy efficiency of an MCA-based detector circuit can be gauged by the ratio of its equivalent floating-point operation (FLOP) number to the energy consumed during its computation time, which is measured in tera-FLOPs per second per watt (TOPS/W) in this paper. It is noted that either a real multiplication or a real summation is considered as a FLOP. Fig. 9 shows the energy efficiency results of the proposed detector circuit, the conventional MCA-based detector circuit and the commercial graphics processing unit (GPU) NVIDIA QUADRO GV100 [17]. The energy efficiency of the proposed circuit is almost identical to that of the conventional MCA-based detector circuit. As the number of UTs increases, the energy efficiency of the MCA-based detector circuits also increases, and it is several orders of magnitude higher than that of the GPU NVIDIA QUADRO GV100.
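Written as a formula, the metric described above reads as follows, where the symbols $\eta$, $N_{\mathrm{FLOP}}$, $P$ and $t_{\mathrm{c}}$ are notation introduced here for illustration and do not appear in the original text:

$$\eta = \frac{N_{\mathrm{FLOP}}}{P \, t_{\mathrm{c}}},$$

with $N_{\mathrm{FLOP}}$ the equivalent FLOP number of one detection, $P$ the average power drawn by the circuit, and $t_{\mathrm{c}}$ the computation time. Since one FLOP per joule equals one FLOP per second per watt, dividing $\eta$ by $10^{12}$ expresses it in the tera-scale unit used in Fig. 9.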

## VI. CONCLUSIONS

In this paper, we have proposed an MCA-based detector circuit, which can be employed to compute massive MIMO ZF and MMSE algorithms. In contrast to all existing MCA-based detector circuits, our proposed detector circuit comprises an MCA-based matrix computing module, utilized for processing the SSFC matrix, and OA-based amplifier circuits, utilized for processing the LSFC matrix, and thereby achieves high robustness against conductance deviations of the memristive devices. We have investigated the impacts of the OLG of OAs, conductance mapping scheme, and conductance deviation level on detection performance of the proposed detector circuit. The proposed detector circuit exhibits significant performance superiority compared to the conventional MCA-based detector circuit, only incurring a negligible additional cost of power consumption. Moreover, the energy efficiency of our proposed circuit is several orders of magnitude higher than that of the commercial GPU NVIDIA QUADRO GV100.

## REFERENCES

- [1] C.-X. Wang, et al., "On the road to 6G: Visions, requirements, key technologies, and testbeds," *IEEE Commun. Surveys Tuts.*, vol. 25, no. 2, pp. 905–974, 2nd Quart. 2023.
- [2] S. Yang and L. Hanzo, "Fifty years of MIMO detection: The road to large-scale MIMOs," *IEEE Commun. Surveys Tuts.*, vol. 17, no. 4, pp. 1941–1988, 4th Quart. 2015.
- [3] E. P.-B. Quesada, et al., "Experimental assessment of multilevel RRAM-based vector-matrix multiplication operations for in-memory computing," *IEEE Trans. Electron Devices*, vol. 70, no. 4, pp. 2009–2014, Apr. 2023.
- [4] Z. Sun, et al., "Solving matrix equations in one step with cross-point resistive arrays," *Proc. Nat. Acad. Sci.*, vol. 116, no. 10, pp. 4123–4128, Mar. 2019.
- [5] Z. Sun, G. Pedretti, A. Bricalli, and D. Ielmini, "One-step regression and classification with cross-point resistive memory arrays," *Sci. Adv.*, vol. 6, no. 5, Jan. 2020, Art. no. eaay2378.
- [6] G. Yuan, et al., "Memristor crossbar-based ultra-efficient next-generation baseband processors," in *Proc. IEEE 60th International Midwest Symposium on Circuits and Systems (MWSCAS)*, Boston, MA, USA, Aug. 6-9, 2017, pp. 1121–1124.
- [7] P. Zuo, Z. Sun, and R. Huang, "Extremely-fast, energy-efficient massive MIMO precoding with analog RRAM matrix computing," *IEEE Trans. Circuits Syst. II, Exp. Briefs*, vol. 70, no. 7, pp. 2335–2339, Jul. 2023.
- [8] P. Mannocci, E. Melacarne, and D. Ielmini, "An analogue in-memory ridge regression circuit with application to massive MIMO acceleration," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 12, no. 4, pp. 952–962, Dec. 2022.
- [9] Q. Zeng, et al., "Realizing in-memory baseband processing for ultrafast and energy-efficient 6G," *IEEE Internet Things J.*, vol. 11, no. 3, pp. 5169–5183, Feb. 2024.
- [10] C. Li, et al., "Analogue signal and image processing with large memristor crossbars," *Nature Electron.*, vol. 1, no. 1, pp. 52–59, Jan. 2018.
- [11] L. Gao, et al., "Fully parallel write/read in resistive synaptic array for accelerating on-chip learning," *Nanotechnology*, vol. 26, no. 45, Oct. 2015, Art. no. 455204.
- [12] T. P. Xiao, et al., "On the accuracy of analog neural network inference accelerators," *IEEE Circuits Syst. Mag.*, vol. 22, no. 4, pp. 26–48, 4th Quart. 2022.
- [13] P. Mannocci, et al., "A universal, analog, in-memory computing primitive for linear algebra using memristors," *IEEE Trans. Circuits Syst. I, Reg. Papers*, vol. 68, no. 12, pp. 4889–4899, Dec. 2021.
- [14] B. Feinberg, et al., "An analog preconditioner for solving linear systems," in *Proc. IEEE International Symposium on High-Performance Computer Architecture (HPCA)*, Seoul, Korea, Feb. 27-Mar. 3, 2021, pp. 761–774.
- [15] D. Przyborowski and M. Idzik, "A 10-bit low-power small-area high-swing CMOS DAC," *IEEE Trans. Nuclear Science*, vol. 57, no. 1, pp. 292–299, Feb. 2010.
- [16] M. J. Marinella, et al., "Multiscale co-design analysis of energy, latency, area, and accuracy of a ReRAM analog neural training accelerator," *IEEE J. Emerg. Sel. Topics Circuits Syst.*, vol. 8, no. 1, pp. 86–101, Mar. 2018.
- [17] "Data sheet: Quadro GV100," NVIDIA, 2022. [Online]. Available: https://www.nvidia.com/en-us/design-visualization/quadro/