

TOPICAL REVIEW

## Nanomagnet logic: progress toward system-level integration

To cite this article: M T Niemier *et al* 2011 *J. Phys.: Condens. Matter* **23** 493202



M T Niemier<sup>1</sup>, G H Bernstein<sup>2</sup>, G Csaba<sup>2</sup>, A Dingler<sup>1</sup>, X S Hu<sup>1</sup>, S Kurtz<sup>1</sup>, S Liu<sup>1</sup>, J Nahas<sup>1</sup>, W Porod<sup>2</sup>, M Siddiq<sup>2</sup> and E Varga<sup>2</sup>

<sup>1</sup> Computer Science and Engineering, University of Notre Dame, Notre Dame, IN 46556, USA

<sup>2</sup> Electrical Engineering, University of Notre Dame, Notre Dame, IN 46556, USA

E-mail: [mniemier@nd.edu](mailto:mniemier@nd.edu)

Received 14 February 2011, in final form 18 September 2011

Published 25 November 2011

Online at [stacks.iop.org/JPhysCM/23/493202](http://stacks.iop.org/JPhysCM/23/493202)

## Abstract

Quoting the International Technology Roadmap for Semiconductors (ITRS) 2009 Emerging Research Devices section, ‘Nanomagnetic logic (NML) has potential advantages relative to CMOS of being non-volatile, dense, low-power, and radiation-hard. Such magnetic elements are compatible with MRAM technology, which can provide input–output interfaces. Compatibility with MRAM also promises a natural integration of memory and logic. Nanomagnetic logic also appears to be scalable to the ultimate limit of using individual atomic spins.’ This article reviews progress toward complete and reliable NML systems. More specifically, we (i) review experimental progress toward fundamental characteristics a device must possess if it is to be used in a digital system, (ii) consider how the NML design space may impact the system-level energy (especially when considering the clock needed to drive a computation), (iii) explain, using both the NML design space and a discussion of clocking as context, how reliable circuit operation may be achieved, (iv) highlight experimental efforts regarding CMOS friendly clock structures for NML systems, (v) explain how electrical I/O could be achieved, and (vi) conclude with a brief discussion of suitable architectures for this technology. Throughout the article, we attempt to identify important areas for future work.

(Some figures may appear in colour only in the online journal)

---

## Contents

| Section | Title |
|---------|-------|
| 1 | Introduction |
| 2 | Background |
| 2.1 | Functionally complete logic sets |
| 2.2 | Non-linear response characteristics |
| 2.3 | Concatenability |
| 2.4 | Unidirectional dataflow |
| 2.5 | Gain |
| 3 | Related work |
| 4 | The NML design space |
| 4.1 | Design space for reducing clock overhead |
| 4.2 | Shape for logic |
| 4.3 | Local interconnect |
| 5 | Defects, errors, and robust operation |
| 5.1 | Thermal noise |
| 5.2 | Field misalignment |
| 5.3 | Misshapeness |
| 5.4 | Scalability |
| 6 | Clocking |
| 6.1 | Line clocking: concept |
| 6.2 | Line clocking: experiments |
| 6.3 | Line clocking: extensibility |
| 6.4 | Line clocking: clock wire layouts |
| 6.5 | Clock driver and timing |
| 6.6 | Alternative approaches |
| 7 | Output |
| 7.1 | MEI-1 and MEI-2 |
| 7.2 | Alternative MEI designs |
| 8 | Input |
| 8.1 | MTJ or spin valve structure |
| 8.2 | Biasing line |
| 9 | Uses: architectures and intangibles |
| 10 | Wrap up |
|  | References |

## 1. Introduction

The focus of this work is on computation beyond the CMOS field effect transistor, which may ultimately be limited by physics, cost, and manufacturing-related issues. More specifically, we discuss computational systems where magnetic elements with nanometer feature sizes are used to both process and store binary information. Why is this good? In conventional electronics, most computation is charge-based. A power supply maintains state, and thousands of electrons (each at $\sim 40$ kT) are needed to perform a single function. Alternatively, nanomagnet logic (NML) devices can process information in a cellular-automata-like architecture, could dissipate $<40$ kT per switching event for a logical gate operation<sup>3</sup> [1], will retain state without power, and are intrinsically radiation-hard. Thus, NML has the potential to mitigate increasing chip-level power densities that are currently exacerbated by device scaling, could help to improve battery life in mobile information processing systems, and may operate in environments where transistor-based logic and memory cannot.
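To put the $kT$ figures quoted above on an absolute scale, the following minimal sketch converts multiples of $kT$ into joules and electronvolts. It assumes room temperature (300 K), and the count of 1000 electrons is only an illustrative stand-in for the "thousands" mentioned in the text; the function names are hypothetical.

```python
# Physical constants (exact SI values).
BOLTZMANN_J_PER_K = 1.380649e-23
ELEMENTARY_CHARGE_C = 1.602176634e-19


def kt_multiples_to_joules(n_kt, temperature_k=300.0):
    """Energy in joules of n_kt multiples of kT (300 K is an assumed default)."""
    return n_kt * BOLTZMANN_J_PER_K * temperature_k


def joules_to_ev(energy_j):
    """Convert an energy in joules to electronvolts."""
    return energy_j / ELEMENTARY_CHARGE_C


# One NML switching event at the 40 kT bound quoted in the text.
nml_event_j = kt_multiples_to_joules(40)

# Illustrative charge-based operation: 1000 electrons at ~40 kT each
# (the electron count is an assumption, not a figure from the article).
charge_based_j = 1000 * kt_multiples_to_joules(40)

print(f"40 kT at 300 K: {nml_event_j:.3e} J = {joules_to_ev(nml_event_j):.3f} eV")
print(f"1000 electrons at 40 kT: {charge_based_j:.3e} J")
```

At 300 K, 40 $kT$ is roughly $1.66 \times 10^{-19}$ J, or about 1.03 eV. As the footnote stresses, this covers the magnets only and excludes any clock or drive circuitry.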

<sup>3</sup> Note that the 40 kT number referenced here only accounts for the switching energy from the magnets themselves, and does not account for the energy associated with any drive circuitry (i.e. clock) that will be required to control the re-evaluation of an NML ensemble. When considering system-level benchmarking, the energy associated with any peripheral circuitry must be accounted for as well to properly compare NML to any other information processing technology. This is especially important as energy associated with the clock is expected to be greater than that associated with the collective magnet switching events. This will be considered further later in the paper.

In this paper, we discuss the state of the art, outstanding challenges (at both the device and architectural levels), and progress toward complete and reliable NML systems. The paper is organized as follows:

- In section 2, we review relevant background related to NML circuits and systems. More specifically, we consider NML in the context of the five fundamental ‘tenets’ [2] that must be satisfied if an emerging device technology is to form the basis of a digital system. We report, for the first time, that all five tenets have been experimentally demonstrated. Notably, we discuss how NML can enable a functionally complete logic set, and define the role of a system-level clock.
- In section 3, we review related work, specifically approaches where single-domain magnetic islands and/or fringing field interactions are used to implement hardware that processes information.
- In section 4, we discuss the NML design space itself. More specifically—and largely independent of any specific clock implementation—we consider how device characteristics (size, shape, materials, etc) can impact circuit energy (section 4.1) and latency (section 4.2). We also consider possible mechanisms for connecting one logic block to another (section 4.3). Efficient local interconnect is especially important given that computation occurs via fringing field coupling between adjacent devices. Interconnect could increase the number of switching events in a given circuit’s critical path and adversely affect latency.
- Section 5 considers how resilient the structures discussed in sections 4.1–4.3 may be when considering device fabrication variation, thermal noise, etc. Scalability prospects are considered in section 5.4.
- Specific implementations of the clock needed to control an NML ensemble (for purposes of circuit re-evaluation) are presented in section 6. The primary focus of our work is the use of clad conducting lines (like word and bit lines in field MRAM circuits) to generate local magnetic fields, on-chip, that control a specific subset of devices in an NML ensemble. The concept is considered in section 6.1. Recent experimental results are discussed in section 6.2, and the extensibility of this approach is presented in sections 6.3–6.5. Alternative approaches to clocking NML circuits are also discussed in section 6.6.
- Mechanisms for reading the output of an NML ensemble—i.e. to transduce a signal to the electrical domain—are discussed in section 7. To date, the primary emphasis has been to use fringing fields from a nanomagnet to set the state of a free layer of an MTJ or spin valve. A read could then proceed as in a conventional MRAM memory array.
- In section 8, we discuss ways to set the state of the NML devices that serve as inputs to an ensemble. As for output, we discuss how a layered structure could be used for this purpose. The use of a current driven biasing line is also considered.
- In section 9, suitable architectures for NML are briefly presented.
- Section 10 concludes.

## 2. Background

Before commencing a discussion on the structures necessary for deployed, CMOS compatible, NML systems (i.e. CMOS compatible clocks, etc) we will first summarize work targeting the experimental demonstration of the five fundamental ‘tenets’ [2] that a device must satisfy if it is to be used to implement a digital system. In summary: (i) a device should enable a functionally complete logic set, (ii) a device should have non-linear response characteristics, (iii) the output of one device must drive another (i.e. with no state variable change), (iv) power amplification (or gain) is needed, and (v) dataflow directionality must be well defined. In 2008, only tenets (i), (ii), and (iii) had been realized for NML. At present, all five tenets have been experimentally demonstrated. More recently, the Notre Dame research group has designed and demonstrated structures that explicitly address tenets (iv) and (v), and has also designed and demonstrated new structures to address tenet (i) that could be used to improve a given NML circuit's area and latency. Appropriate demonstrations for all five tenets are highlighted below.

**Figure 1.** (a) SEM image of AF-ordered line; (b) MFM image of AF-line in logically correct state; (c) SEM image of majority voting gate.

**Figure 2.** (a) Hysteresis loops for different aspect ratio magnets; (b) majority gate design where top and bottom inputs are larger than the middle input; (c) when subjected to two fields of opposite sign and different magnitude, a strong field sets the magnetization state of all inputs, while a weaker field only reverses the magnetization state of the shorter input magnet. Note that the gate is in a logically correct state. Also, more fine-grained quantitative data is presented in section 6.

## 2.1. Functionally complete logic sets

An NML ‘wire’ (figure 1(a)) can be formed from a line of magnets that couple with one another. Initial experiments with NML considered lines of circular magnetic disks and demonstrated that a magnetic soliton could propagate from one end of the line to the other [3]. This work was extended in [1, 4] and precipitated experimental results like that illustrated in figures 1(a) and (b)—a line of magnets with perfect, antiferromagnetic (AF) ordering. (Per [5], ferromagnetically ordered lines of magnets also represent a logically correct ground state.) Boolean functions can be realized with combinations of majority voting gates (figure 1(c))—where a gate output is the logical value associated with the majority of three input magnets (logic function:  $o = i_1 i_2 + i_2 i_3 + i_1 i_3$ ).<sup>4</sup> By setting one input to a logic ‘0’ or ‘1,’ the gate will execute an AND or OR function respectively. AF-ordered lines of magnets and magnetic majority gates have all been experimentally demonstrated at room temperature [1, 4]. Finally, as signal inversion can be accomplished by obtaining a copy of a signal from an AF-ordered line with an even number of magnets (like that illustrated in figure 1(a)), this technology can support a functionally complete logic set.
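The logic described above can be checked with a short truth-table sketch. It models only the Boolean behavior, not the magnetics. The convention that the first magnet in an ideal AF-ordered line holds the input bit and each subsequent magnet is anti-parallel to its predecessor is an assumption made for illustration, and the function names are hypothetical.

```python
from itertools import product


def majority(i1, i2, i3):
    """Majority vote of three bits: o = i1*i2 + i2*i3 + i1*i3."""
    return (i1 & i2) | (i2 & i3) | (i1 & i3)


def af_wire_output(bit, n_magnets):
    """Bit held by the last magnet of an ideal AF-ordered line.

    Assumes the first magnet holds `bit` and every later magnet is
    anti-parallel to its left neighbor, so an even-length line inverts.
    """
    return bit if n_magnets % 2 == 1 else 1 - bit


def nand(a, b):
    """NAND built from a majority gate (third input fixed at 0) and an even-length line."""
    return af_wire_output(majority(a, b, 0), n_magnets=2)


for a, b in product((0, 1), repeat=2):
    assert majority(a, b, 0) == (a & b)   # fixed '0' input gives AND
    assert majority(a, b, 1) == (a | b)   # fixed '1' input gives OR
    assert nand(a, b) == 1 - (a & b)      # AND plus inversion gives NAND
```

Because NAND alone is functionally complete, the combination of a majority gate with one fixed input and an inverting line is sufficient, in this idealized picture, to build any Boolean function.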

## 2.2. Non-linear response characteristics

With nanomagnet logic, the magnetization state of a block of magnetic material is used to represent a logic ‘1’ or logic ‘0.’ The energy difference between magnetization (binary) states in an NML device can be hundreds of $kT$ at room temperature, and is a function of magnet size, shape, and material [1]. Magnetization state transitions are highly non-linear as demonstrated both by the simulation results in figure 2(a) (hysteresis loops for $60 \times 240 \times 30 \text{ nm}^3$ and $60 \times 130 \times 30 \text{ nm}^3$ supermalloy magnets), as well as experimental evidence. Following an approach suggested in [6], we have fabricated majority gates where two inputs are longer than a third (figure 2(b)). When subjected to two magnetic fields (of opposite sign and different magnitude), a strong field sets the magnetization state of all inputs, while a weaker field only reverses the magnetization state of the shorter magnet (figure 2(c)).

## 2.3. Concatenability

Experiments to be discussed within this document routinely suggest that fringing fields from individual NML devices can bias other devices into new, logically correct states. (Thus, information is transferred via the same state variable, and two devices can be ‘concatenated.’) However, as was mentioned above, the energy difference between magnetization states can be quite large, and some mechanism (a clock) is needed to modulate the energy barrier between magnetization states. (See the ‘No Clock’ curve for a $60 \times 90 \times 30 \text{ nm}^3$ supermalloy in figure 3(a).) One mechanism for modulation is to employ an auxiliary magnetic field that is directed along the hard (or short) axes of the devices in an NML ensemble.<sup>5</sup> This will lower the initial energy barrier between ground states (hard-axis-directed flux from the magnets themselves also helps). As magnets are hard-axis-biased against a preferred shape anisotropy (figure 3(a)-(i)), fringing fields from a neighboring magnet can then bias a device into a new binary state (figure 3(a)-(ii)). As the hard axis field is removed, a device's $y$-component of magnetization increases.

<sup>4</sup> In section 4.2, more efficient gate designs will also be discussed.

<sup>5</sup> Other approaches to clocking, including the use of multiferroic materials, will be explained in section 6.6.

**Figure 3.** (a) (i) Energy landscape (generated via micromagnetic simulation) when a $60 \times 90 \times 30 \text{ nm}^3$ supermalloy device is subjected to a magnetic field that is representative of the clock field and fringing fields from two hard-axis-biased, neighbor devices; (ii) energy minimum if one neighbor provides a small $y$-bias (to facilitate logical state transition); (iii) a magnetic island whose easy axis is parallel to that of the applied clock field can mimic a hard-axis-biased neighbor and helps to ensure that select magnets in a circuit ensemble remain in a low energy state until the desired neighbor switches it; this helps to prevent an ensemble (e.g. an AF-ordered line) from being driven in multiple directions; (iv) given the influence of just one neighboring device (and no block), a magnet at the end of a line like that in figure 2(a)-(i) is in a high energy state when hard-axis-biased (and fabrication variation, thermal noise, etc could instigate unwanted switching). (b) With insufficient $y$-bias, device will not change state.

The magnitude of the fringing fields required to facilitate a state transition can be captured quantitatively by generating an  $M$ - $H$  curve for a given device—but with the addition of a hard-axis-biasing field (similar to Stoner–Wohlfarth [7] MRAM switching). For example, referring to figure 3(b), if we assume a hard-axis-biasing field from one adjacent magnet and the clock, a  $y$ -bias of approximately 23 mT is required to induce an  $M_y$  sign (binary state) transition. If the neighbor magnet cannot produce this bias, the last magnet in the clocked group of magnets will retain its old state (see light curve and inset in figure 3(b)), drive the ensemble from the opposite direction, and a metastable state (or ‘stuck at’ fault) could ensue [8].
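The qualitative effect of a hard-axis clock field plus a small $y$-bias can be sketched with a textbook single-domain (Stoner–Wohlfarth-style) energy expression. This is an illustrative toy, not the micromagnetic simulation used for figure 3, and it does not reproduce the 23 mT value. The normalized energy $e(\theta) = \tfrac{1}{2}\cos^2\theta - h_x\cos\theta - h_y\sin\theta$, the angle convention ($\theta$ measured from the hard axis, so $0^\circ$ is the nulled state), and the fields $h_x$, $h_y$ expressed in units of the anisotropy field are all assumptions made here.

```python
import math


def energy(theta_deg, hx=0.0, hy=0.0):
    """Normalized energy of a single-domain magnet.

    theta_deg is measured from the hard (x) axis; the easy axis is y
    (theta = +/-90 degrees). hx is the clock (hard-axis) field and hy the
    neighbor's y-bias, both in units of the anisotropy field (assumed model).
    """
    t = math.radians(theta_deg)
    return 0.5 * math.cos(t) ** 2 - hx * math.cos(t) - hy * math.sin(t)


def energy_minima(hx=0.0, hy=0.0, step=0.5):
    """Angles (degrees) of local energy minima on a periodic grid."""
    n = int(round(360 / step))
    angles = [-180.0 + i * step for i in range(n)]
    e = [energy(a, hx, hy) for a in angles]
    # A grid point is a local minimum if it lies below both periodic neighbors.
    return [angles[i] for i in range(n)
            if e[i] < e[i - 1] and e[i] < e[(i + 1) % n]]


no_clock = energy_minima()                   # two stable states along the easy axis
clocked = energy_minima(hx=1.2)              # single minimum at the nulled state
biased = energy_minima(hx=1.2, hy=0.05)      # minimum tilts toward +y
```

In this model, the two easy-axis minima merge into a single minimum at $0^\circ$ once $h_x$ exceeds the anisotropy field, and a small positive $h_y$ tilts that minimum toward $+y$. That tilt is the toy analogue of a neighbor selecting the new binary state as the clock is removed.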

Proper energy barrier modulation is also important to satisfy other ‘tenets.’ Clock structures—that are physically realizable—can provide the required functionality. Both items will be discussed in more detail in sections 2.4 and 6.

## 2.4. Unidirectional dataflow

By properly controlling the energy landscape of certain devices in an ensemble, we can ensure that data flows unidirectionally through the ensemble. Using an AF-ordered line as an example, we want to ensure that a device that is switching to a new (binary) state sees easy-axis-directed fringing fields from only one driver magnet. A neighboring device should be in a nulled/neutral state. To illustrate, the energy minimum of magnet ‘\*’ in figure 3(a) should remain at $0^\circ$ (due to the influence of the clock and neighboring, hard-axis-biased magnets) until $y$-directed flux from one magnet induces a state transition (figure 3(a)-(i), (ii)). In experiments to be discussed, blocks of magnetic material with easy axes parallel to the clock field are used to ensure that the lowest energy state for select devices in an NML ensemble remains at $0^\circ$ until influenced by the proper device. For example, blocks might be placed adjacent to the last magnet in an AF-ordered line. (Note the similarities between figure 3(a)-(i) and (a)-(iii); in both instances, the lowest energy state is at $0^\circ$.)

As a specific example, in figure 4(a), we consider five-magnet, AF-ordered lines (with blocks at the end and input drivers that map to logic ‘1’ and logic ‘0’ states) that settle into logically correct states after being subjected to a unidirectional clocking field. Micromagnetic simulations performed with the object oriented micromagnetic framework (OOMMF) suite developed by NIST [9] (a Landau–Lifshitz ODE solver) support block utility. Note that an OOMMF extension developed by the University of Hamburg (ThetaEvolve) [10] was used to mimic temperature-induced stochastic effects.

Figure 4(b) illustrates how the magnetization state of each device in the five-magnet line (without a terminating block) changes in time. While the final state of the magnets in the line suggests that the structure is in a logically correct state after a unidirectional field was applied and removed (see inset), in actuality, the last magnet switches before its neighbor (see figure 4(b)). Because the boundary magnet (e.g. magnet ‘#’ in figure 3(a)-(iv)) is not coupled to another device, it is in a higher energy state when hard-axis-biased (it only sees $x$-directed flux from the clock and *one* neighbor). As such, fabrication variations, thermal noise, etc can cause the last device to switch before being ‘set’ by fringing fields from the neighboring device. Thus, the line is driven from two directions<sup>6</sup>.
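The consequence of a line being driven from two directions can be illustrated with a purely discrete toy model. It is not a micromagnetic simulation: magnets are either nulled (0) or hold $\pm 1$, each switching magnet simply takes the opposite sign of the neighbor that drives it, and the two switching fronts advance in lockstep. These rules, and the function names, are assumptions made for illustration.

```python
def relax_af_line(n_magnets, driver_state, early_end_state=None):
    """Fill a line of nulled magnets (0) with +1/-1 states.

    The first magnet is AF-coupled to an input driver. If early_end_state is
    given, the last magnet has already switched on its own (unmanaged
    boundary) and launches a second front from the far end.
    """
    if n_magnets < 1:
        raise ValueError("n_magnets must be at least 1")
    states = [0] * n_magnets
    states[0] = -driver_state
    left, right = 0, n_magnets          # right == n_magnets means "no right front"
    if early_end_state is not None and n_magnets > 1:
        states[-1] = early_end_state
        right = n_magnets - 1
    while left + 1 < right:
        left += 1
        states[left] = -states[left - 1]        # front driven from the input
        if left + 1 < right and right < n_magnets:
            right -= 1
            states[right] = -states[right + 1]  # front driven from the far end
    return states


def frustrated_pairs(states):
    """Indices i where magnets i and i+1 are parallel (AF order is broken)."""
    return [i for i in range(len(states) - 1) if states[i] == states[i + 1]]


managed = relax_af_line(5, driver_state=+1)                       # single front
lucky = relax_af_line(5, driver_state=+1, early_end_state=-1)     # ends up correct
unlucky = relax_af_line(5, driver_state=+1, early_end_state=+1)   # fronts collide
```

In this toy, an early-switching end magnet that happens to pick the right sign yields the correct final state even though the switching order was wrong, whereas the opposite sign leaves a parallel pair where the fronts meet. This mirrors the observation that a correct final image does not guarantee correct time evolution.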

Simulations of the same ensemble with a block suggest that the time evolution of the devices in the line is correct (figure 4(c)). This is especially important as magnetic force microscopy (MFM) provides only a ‘before and after’ snapshot. Magnet shape engineering and multi-phase clocking can both be employed to ensure that this tenet is satisfied when assuming the CMOS compatible clock structures to be discussed in section 6 [11].

<sup>6</sup> Again, while the final magnetization state of the line is correct, the last magnet in the line could just as easily relax such that the sign of the $y$-component of magnetization is positive. Per [38], thermal fluctuations could help to ‘fix’ the metastable state described by this situation, but computation would presumably be both slower and unreliable.

**Figure 4.** (a) (i) SEM of AF-ordered line segments terminated with a block (as advocated by figure 3(a)-(iii)), ((ii), (iii)) MFM images which show the line segment in a logically correct state after being subjected to an externally applied clocking field; (b) micromagnetic simulations representative of the structure in figure 4(a) indicate that if the boundary is unmanaged, while the final magnetization state may appear correct (see inset), the time evolution is incorrect (the last, uncoupled magnet in the line switches before its neighbor drives it) and the line itself is driven from two directions; (c) micromagnetic simulations support block utility; devices switch in the proper order, and to a logically correct magnetization state.

**Figure 5.** (a) Experimental demonstration of an NML ensemble that serves as a fanout structure; the ensemble is evaluated with a logic ‘0’ input; (b), (c) how the rotating field modulates the energy barrier: oscillation between minima (b), and settling around minimum (c).

## 2.5. Gain

Reference [12] reports the demonstration of 1:3 fanout (figure 5(a)) with gain provided by a magnetic clocking field.

Note that a rotating field clock was used for this experiment (see [12, 13]), which is less sensitive to field alignment issues (see section 5.2) as clocking is not done on-chip. More specifically, in all structures, magnets are supermalloy, approximately  $60 \times 90 \times 30 \text{ nm}^3$  in size, spaced approximately 20 nm apart, and were fabricated with electron-beam pattern definition, e-beam evaporation, and liftoff.

To apply a rotating field to an NML ensemble, a sample can simply rotate between two poles of an electromagnet. Designs for generating (or mimicking) a rotating field on-chip have been suggested [14, 15]. If the amplitude of the applied field is sufficiently high, as the sample rotates, devices will not settle into specific energy minima (even with a y-bias from another magnet)—i.e. to help enforce directionality. (Figure 5(b) illustrates the local minima associated with four different field directions.) As the applied field decays, and given a suitable amount of y-directed flux from a neighbor magnet, devices will settle into specific magnetization states (see figure 5(c) for snapshots assuming a positive y-bias and a lower amplitude applied field).

As seen in figure 5(a)-(i), a signal is driven through an antiferromagnetically ordered line, to a ferromagnetically ordered line, and then to three different antiferromagnetically ordered lines. The clock provides the energy needed for switching; flux from the magnets themselves should determine the final state of the ensemble. An MFM image in figure 5(a)-(ii) shows the design in figure 5(a)-(i) in a logically correct state after a binary 0 input was applied. Other fanout structures (with output lines of varying lengths) have also been fabricated and function correctly [12, 13].

Note that a similar NML structure can also be used for fanout if a unidirectional clock is used. Micromagnetic OOMMF simulations (at 300 K) show that  $60 \times 90 \times 30 \text{ nm}^3$  supermalloy magnets switch in the proper order as a unidirectional field is applied and removed [13].

## 3. Related work

The ITRS defines NML as ‘logic based on physically coupled, single-domain magnets’ (i.e. logic via fringing field interactions). As such, the primary focus of this paper is on devices that move/process information via in-plane coupling—essentially the only approach where structures for (locally clocked) interconnect [16], (locally clocked) logic gates [16], fanout [12], etc have been experimentally demonstrated. However, before continuing, we briefly highlight related approaches where single-domain magnetic islands and/or fringing field interactions are used to implement hardware that processes information (i.e. per this definition, MTJ-based logic discussed in [17]—where MTJs serve as tunable resistors to implement dynamic current mode logic [18], domain wall logic [19], etc are not discussed).

Perhaps the most similar approach to that discussed above is the work in [20], where every NML device is an MTJ structure, and where information would be moved and processed via fringing field interactions between MTJ free layers. This work was originally motivated by the need for an electrical output. The authors note that the extraordinary Hall effect had been leveraged as an electrical read scheme [21], but that its extensibility into the sub-100 nm regime would prove difficult. An alternative read scheme would be to use fringing fields of an NML device to set the state of an MTJ’s free layer. This approach was first proposed in [22] and is discussed further in section 7. In [20], every NML device is an MTJ, but no specific NML interconnect or logic structures were experimentally demonstrated. While a line of MTJs (similar to that in figure 1(a)) was fabricated, external fields were applied along the easy axes of the devices. Rather, a main goal of this experiment was to consider at what device-to-device spacing hard axis, ferromagnetic ordering of the line occurred (i.e. where shape anisotropy was overcome and obtaining a suitable magnetoresistance would become difficult).

Single-domain magnets can also be realized from cobalt–platinum (Co/Pt) multilayers, which exhibit perpendicular (i.e. out-of-plane) magnetization states. Co/Pt based magnets can be defined by focused ion-beam (FIB) lithography. A locally applied  $\text{Ga}^+$  ion dose changes the in-plane distribution of crystalline anisotropy. Sufficiently high doses render the material locally paramagnetic and can define dots with a single FIB lithography step. The perpendicular magnetization has architectural benefits as coupled NML devices can be arranged freely in plane, and neighboring dots are always AF-coupled. Local manipulation of magnetic properties by FIB, and changing the multilayer composition, enables a large design space. Co/Pt devices can be clocked in a similar fashion to permalloy-based NML devices and have similar performance characteristics [23, 24].

Magnetic interaction between single-domain Co nanomagnets is strong; patterned and demagnetized films show large AF-ordered regions [25]. FIB irradiation defines nucleation centers on the Co/Pt dots (i.e. it sensitizes them at well-defined locations). This can be used to achieve non-reciprocal dots. For example, a sensitized dot will couple more strongly to its left neighbor than its right [26]. Micromagnetic simulations indicate a several-fold difference. Thus, gate inputs and outputs are made distinct, with the input influencing the gate in only one direction, toward the output. This eliminates the need for field gradients and helper dots, and reduces the density of required clocking wires. The power dissipation of any clock structure and control electronics might then be amortized over a larger number of NML devices. Micromagnet Co/Pt dots could be placed above/below a permalloy domain wall conductor, and its strong, local magnetic field can penetrate the entire volume of the dots.

Recently, [27] has proposed reconfigurable arrays of magnetic automata (RAMA). With the RAMA approach, a matrix of magnetic nanopillars is coupled with a nanowire crossbar array (with one array of parallel wires situated below the pillar array, and an orthogonal group of wires above the pillar array). If the ferromagnetic pillars are embedded in a ferroelectric or multiferroic matrix, an electric field from the nanowire crossbar structure can rotate the magnetization state of the pillar from out-of-plane to in-plane (and back). A group of four pillars (arranged such that they are the corners of a square) is similar to an electrostatic QCA cell, i.e. as proposed in [28, 29]. Information would be moved/processed via interactions between ferromagnetic pillars. The nanowire crossbar array would be used for configuration, input, output, and array operation (i.e. no magnetic fields would be used for clocking). While [27] notes that CoFe<sub>2</sub>O<sub>4</sub> pillars embedded in the multiferroic matrix BiFeO<sub>3</sub> can experience magnetization state rotation after being subjected to an electric field, no RAMA devices (per the definition above) have been fabricated.

In 2010, Behin-Aein *et al* proposed the all spin logic (ASL) approach to information processing [30]. Nanomagnets serve as inputs and outputs to an ASL circuit, and all communication occurs via spin coherent channels. Information processing occurs via the superposition of spin currents from various inputs into a channel. Computational state is ‘latched’ by an output nanomagnet, where magnetization state is set via STT from the aforementioned spin current. An additional spin polarized current (one that places the output magnet into a high energy/metastable state) can facilitate this switching process. (The additional current would be generated via a magnetic fixed layer orthogonal to that of the output magnet, i.e. a ‘free layer’.) To date, no ASL devices have been experimentally demonstrated. Per [30], challenges to the ASL approach include the fabrication of channels with reasonable spin coherence lengths, complex layered structures to define dataflow directionality, etc.

Finally, nanomagnets may also be used to ‘latch’ the output of a computation in the spin wave bus approach described in [31]. The use of spin waves for computation could allow for information processing systems where data is both moved and processed without any charge transfer. Moreover, multi-bit operations could be possible if different wave frequencies are used. The operation of a spin wave circuit is essentially comprised of three different steps: (i) input data is converted into spin waves, (ii) data is processed/computation occurs via spin wave propagation in wave guides and phase manipulation, and (iii) computational state is detected by converting wave information into an output voltage. A magnetoelectric (ME) cell [32] has been proposed to control spin wave signals with electric fields. More specifically, an ME cell might be realized with a piezoelectric–piezomagnetic material ‘sandwich.’ For example, [32] describes a ferroelectric (e.g. PZT) coupled to a ferromagnetic film (e.g. CoFe or NiFe). The ferroelectric is contacted by a metal gate, while the ferromagnetic film could sit atop a silicon substrate. If a voltage is applied to the gate, the stress produced by the piezoelectric can impact (i.e. rotate) the easy axis of the piezomagnetic material. Additional simulations suggest that the magnetization direction of the cell can be controlled by the phase of an incoming spin wave signal. Thus, the ME cell can preserve magnetization (e.g. state) provided the gate voltage is applied. Reference [33] describes recent experimental progress.

## 4. The NML design space

In section 2, framed by a discussion of five ‘generic’ characteristics a device must have for use in digital systems, relevant background is presented regarding the experimental state of the art for NML. Notably, we explain how this technology can support a functionally complete logic set, and how a clock is used for circuit re-evaluation. Here, we consider the NML design space in more detail, other practical structures and design components needed for more complex systems, and ways to improve system energy and latency in NML ensembles.

### 4.1. Design space for reducing clock overhead

Low energy operation is a primary design target for NML systems, and the clock is expected to be the most significant component in any system-level energy budget [34]. Thus, one approach toward lower energy systems is to improve the clock circuitry itself. For example, if a current-carrying wire is used to generate a magnetic field for clocking, manipulating clock wire geometries and using smart wire layouts could (respectively) reduce Ohmic losses and voltage drops. Alternatively, using multiferroic materials for clocking [35] could eliminate current driven clocks altogether. These approaches will be considered in section 6. Here, we consider how the magnet ensemble itself could help reduce clock energy, e.g. by designing structures that require lower fields (and hence current) to facilitate an ensemble’s re-evaluation. (Note that much of this discussion assumes field driven clocking.)

A magnet’s aspect ratio (i.e. its height versus width), its thickness, the spacing between magnets, the material a magnet is made from, the material that surrounds NML devices (we will later consider the use of magnetic nanoparticles to increase local permeability as in [36]), and the shape of an NML device (e.g. a rounded rectangle or trapezoid) can all influence the magnitude of the external field required to facilitate switching. (See figure 6(a) for an illustration of some of the aforementioned design parameters.)

In the search for optimal circuit design parameters, two major considerations govern how a magnet changes state. The first is the energy barrier separating states described above. Magnets with lower energy barriers should generally require lower external clock fields to switch. The second is the source of the magnetic fields that facilitate a magnetization state change. In an NML system, magnets experience external fields originating from a clock structure, as well as fields from neighboring magnets. This suggests that if we increase the field contribution from the magnets themselves, the required contribution from the clock wires can be reduced.

In this section, we show tradeoffs between different parts of this design space. Results are simulation based and, as before, were generated with OOMMF [37]. All of the potential design parameters noted above (and considered more quantitatively below in sections 4.1.1–4.1.3) require extensive study.

#### 4.1.1. Changing the magnet ensemble.

Below, we consider a set of simulations that illustrates how magnet spacing can have a significant impact on clock field requirements. More specifically, we consider lines of $40 \times 60 \times 10 \text{ nm}^3$ and $40 \times 60 \times 20 \text{ nm}^3$ supermalloy magnets with different spacings between devices<sup>7</sup> and use OOMMF to determine the external field magnitude needed to hard-axis-bias all of the magnets in the line (i.e. $M_x = M_s$, where $M_s$ is the saturation magnetization, and $M_y = 0$ per the coordinate system in figure 6(a)). This is an intermediate state required for re-evaluation as suggested by figure 3.

**Figure 6.** (a) Numerous design parameters can be leveraged to reduce clock dependence or compensate for manufacturing limitations; (b) magnet thickness and magnet spacing can impact the magnitude of the external clocking field required to hard-axis-bias an NML ensemble. Depending on magnet spacing, thinner or thicker devices may be hard-axis-biased with lower magnitude clock fields.

Figure 6(b) shows a representative result from our simulations, where an external field that continually increases in magnitude is applied to lines of the aforementioned magnets (that would have a natural AF-ordering). By examining the switching behavior of a magnet in the middle of the AF-ordered line, several potential consequences of scaling are observed. For example, increasing the spacing of the magnets from 8 to 16 nm more than doubles the required magnetic fields from the clock. This suggests that closer spacing can increase neighbor-to-neighbor interactions/magnetic fields and reduce clocking field requirements. However, changing magnet thickness can also impact clock field requirements. As seen in figure 6(b), if spacing between magnets is 8 nm, a line of thicker magnets can be hard-axis-biased with a lower magnitude clocking field. If the spacing is increased to 16 nm, lines of thinner magnets can be hard-axis-biased with lower magnitude clocking fields. While a thicker magnet has a larger energy barrier, it also produces stronger biasing fields on its neighbors. Thus at close spacing, where neighbor-to-neighbor interactions are stronger, the thicker magnets can be re-evaluated with a lower magnitude clocking field. However, as spacing increases, inter-device coupling is reduced and lines of thinner magnets (with smaller energy barriers) require lower external fields to re-evaluate<sup>8</sup>.
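The hard-axis-bias criterion used in these sweeps ($M_x = M_s$, $M_y = 0$ for the magnet of interest) can be turned into a small post-processing step. The sketch below is illustrative only: the function name, the tolerance `tol`, and the sample numbers are assumptions, not values or code from the simulations reported here. It scans a field sweep and returns the first applied field at which the normalized magnetization satisfies the criterion.

```python
def min_hard_axis_field(fields_mT, mx_over_ms, my_over_ms, tol=0.02):
    """Return the first applied field at which a magnet is hard-axis-biased.

    Criterion from the text: M_x = M_s and M_y = 0, relaxed here by `tol`
    (an assumed numerical tolerance). Returns None if the sweep never
    reaches that state.
    """
    for h, mx, my in zip(fields_mT, mx_over_ms, my_over_ms):
        if mx >= 1.0 - tol and abs(my) <= tol:
            return h
    return None


# Synthetic, illustrative sweep (NOT data from the paper).
fields = [0, 5, 10, 15, 20, 25]
mx = [0.00, 0.20, 0.55, 0.90, 0.99, 1.00]
my = [1.00, 0.98, 0.83, 0.43, 0.01, 0.00]
print(min_hard_axis_field(fields, mx, my))
```

Running the same function on sweeps for different spacings and thicknesses would give the kind of comparison summarized in figure 6(b), provided the magnetization of the middle magnet is exported from the micromagnetic solver.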

<sup>7</sup> Devices are otherwise similar, i.e. the same values for $M_s$, etc are assumed.

<sup>8</sup> A detailed discussion is beyond the scope of this paper (a separate manuscript describing these trends in the face of scaling is being prepared), but the trends generally hold.

**4.1.2. Magnet shape.** In [38, 39], it has been suggested that changing magnet shape (e.g. making a device with a trapezoid shape instead of a rounded-rectangle shape) might allow for circuit re-evaluation with lower magnitude clock fields. To determine whether or not improvements could in fact be realized, we also considered AF-ordered lines where devices were trapezoid shaped instead of rounded-rectangles as described above. In our simulations:

- The trapezoid devices had the same area footprint as a rounded-rectangle equivalent.
- Device aspect ratios were varied from 1.5 to 2.
- Device thickness was varied between 10 and 20 nm.
- The spacing between devices varied from 8 to 16 nm.
- The ratio of the mid-width to the end-width of the trapezoid was 2:3.

In all cases, we found that lines comprised of trapezoid devices required 25% lower clocking fields when compared to rounded-rectangle equivalents. While fabricating more complex shapes will present additional challenges, the aforementioned simulation results do indeed suggest that magnet shape could represent an important design lever for lowering clock energy.

A detailed study as to why one magnet shape may be more beneficial than another is well beyond the scope of this paper. Certainly, magnet shape has been recognized as an important design parameter in other application spaces. Notably (as representative examples), shape-dependent switching of magnetic tunnel junctions (MTJs) has been studied in [40], superparamagnetic islands were studied in [41], and asymmetric rings were considered in [42]. When considering the simulation results discussed above, one could argue that a lower aspect ratio device (with a lower energy barrier) has been created. Additionally, simulations presented in [38] suggest that a trapezoid shaped magnet may produce a higher magnitude y-bias on the device it is meant to drive. In this case, a lower magnitude, hard-axis-biasing field might be required [7]. Additional study is required in future work.

**Figure 7.** Time evolution of seven magnets (labeled A–G from left-to-right) in an AF-ordered line that is initially in a logically correct state. In all cases, the input to the line is changed, and a hard-axis-directed clocking field is applied to the line. Magnets are assumed to be $60 \times 90 \times 5 \text{ nm}^3$ and all simulations are performed assuming a temperature of 0 K. (a) When the line of cobalt dots with biaxial anisotropy ($M_s = 1\,000\,000 \text{ A m}^{-1}$, $\alpha = 0.1$, exchange energy = $1.3 \times 10^{-11} \text{ J m}^{-1}$, $K_1 = 30\,000$ per [43]) switches to a logically correct state, devices are hard-axis-biased sequentially and relax to a new, logically correct state in a sequential fashion as well. A field of at least 13 mT is required to hard-axis-bias each device in the line such that the line can be ‘re-evaluated’ given the new input magnetization state; (b) when the line of permalloy dots ($M_s = 800\,000 \text{ A m}^{-1}$, $\alpha = 0.1$, exchange energy = $1.3 \times 10^{-11} \text{ J m}^{-1}$) switches to a logically correct state, all devices become hard-axis-biased and switch simultaneously. A field magnitude of $\sim 20$ mT is needed to facilitate re-evaluation; (c) the field applied to lines of cobalt–iron dots ($M_s = 1\,700\,000 \text{ A m}^{-1}$, $\alpha = 0.007$, exchange energy = $2.8 \times 10^{-11} \text{ J m}^{-1}$) is never of sufficient magnitude to facilitate re-evaluation, and the line remains in its initial state. (A curve never crosses the x-axis where $M_y = 0$, which would indicate a state transition.)

**4.1.3. Changing the material system.** The materials from which magnets are made, as well as their crystalline structure, can have a profound effect on their switching behavior, as well as the magnitude of the magnetic field required to induce and manage switching. To illustrate, we studied a 7-magnet, AF-ordered line via micromagnetic simulation. The line was initialized to a logically correct state. The state of the first (input) magnet in the line was then changed, and a magnetic field was applied along the hard axes of the devices in the line. This field increased from 0 mT to a peak value (either 15 or 20 mT) and was then ramped back down to 0 mT over the span of 16.4 ns. Of interest was the magnitude of the magnetic field required to re-evaluate the line. Three simulations were performed and are summarized in figure 7. In each simulation, we assumed that the magnets in the line were made of a different magnetic material.
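For readers who want to reproduce a comparable clock waveform, the ramp described above can be written as a simple function of time. This is a sketch under stated assumptions: the text gives only the peak values (15 or 20 mT) and the 16.4 ns span, so the linear, symmetric (triangular) shape and the reading of 16.4 ns as the total up-and-down duration are assumptions, and `clock_field_mT` is a hypothetical helper name.

```python
def clock_field_mT(t_ns, peak_mT=20.0, duration_ns=16.4):
    """Hard-axis clock field at time t_ns for an assumed triangular ramp.

    The field rises linearly from 0 to peak_mT during the first half of
    duration_ns and falls linearly back to 0 during the second half.
    Outside the interval [0, duration_ns] the field is zero.
    """
    if t_ns <= 0.0 or t_ns >= duration_ns:
        return 0.0
    half = duration_ns / 2.0
    if t_ns <= half:
        return peak_mT * t_ns / half
    return peak_mT * (duration_ns - t_ns) / half


# Sample the waveform at a few times (ns) for the 15 mT case.
for t in (0.0, 4.1, 8.2, 12.3, 16.4):
    print(t, clock_field_mT(t, peak_mT=15.0))
```

A waveform of this kind would typically be supplied to the micromagnetic solver as a time-dependent applied field; how that is done depends on the solver and is not specified here.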

More specifically, in all cases, magnets were $60 \times 90 \times 5 \text{ nm}^3$, and all simulations were performed for temperatures of 0 and 300 K. Lines of cobalt dots with a biaxial anisotropy<sup>9</sup> ($M_s = 1\,000\,000 \text{ A m}^{-1}$, $\alpha = 0.1$, exchange energy = $1.3 \times 10^{-11} \text{ J m}^{-1}$, $K_1 = 30\,000$ per [43]), permalloy dots ($M_s = 800\,000 \text{ A m}^{-1}$, $\alpha = 0.1$, exchange energy = $1.3 \times 10^{-11} \text{ J m}^{-1}$ [9]), and cobalt–iron dots ($M_s = 1\,700\,000 \text{ A m}^{-1}$, $\alpha = 0.007$, exchange energy = $2.8 \times 10^{-11} \text{ J m}^{-1}$) were considered. While the results of the 0 and 300 K simulations were often similar regarding the final magnetization states of the AF-ordered lines, the 0 K results appear in figure 7 as there is less noise during switching and the graphs are generally more readable.

<sup>9</sup> See expanded discussion of devices with biaxial anisotropy in section 5.

<sup>10</sup> Changing the temperature can result in (noisier) switching with a lower magnitude field. A detailed study of reliability should be the subject of future work.

**Figure 8.** Potential barriers for (a) symmetric and (b) asymmetric magnets; assumes external magnetic field is applied from left-to-right as indicated by arrow; (c) shape-based gate with slanted edge inputs; (d) SEM images of (i) shape-based AND gate and (ii) shape-based OR gate; MFM images of (iii) shape-based AND gate and (iv) shape-based OR gate.

When the line of permalloy magnets switches to a logically correct state, all devices essentially become hard-axis-biased and switch simultaneously. A field magnitude of 20 mT is needed to hard-axis-bias all of the devices in the line. This is determined by (a) examining when the plot of magnetization state versus time for the last magnet in the AF-ordered line changes sign, and (b) determining the magnitude of the applied field at that point (see curve plotted on second y-axes of the graphs in figure 7).<sup>10</sup> Alternatively, when the line of cobalt dots with biaxial anisotropy switches to a logically correct state, devices are hard-axis-biased, and relax to a new, logically correct state in a sequential fashion. A field of $\sim 13$ mT is required to hard-axis-bias each device in the line such that the line can be ‘re-evaluated’ given the new input magnetization state. The field applied to lines of cobalt–iron dots is never of sufficient magnitude to facilitate re-evaluation, and the line remains in its initial state.
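The two-step procedure used to read off the re-evaluation field (find where $M_y$ of the last magnet changes sign, then read the applied field at that instant) is easy to automate. The following is a minimal sketch, assuming the magnetization trace and the applied field are sampled at the same time points; the function name and the sample arrays are hypothetical and do not reproduce the data in figure 7.

```python
def reevaluation_field(my_last, field_mT):
    """Applied field at the first sign change of M_y for the last magnet.

    my_last and field_mT must be sampled at the same time points.
    The field is linearly interpolated between the two samples that
    bracket the zero crossing. Returns None if M_y never changes sign,
    i.e. the line was not re-evaluated.
    """
    for i in range(1, len(my_last)):
        a, b = my_last[i - 1], my_last[i]
        if a * b < 0 or (a != 0 and b == 0):
            frac = a / (a - b)  # fraction of the interval where M_y = 0
            return field_mT[i - 1] + frac * (field_mT[i] - field_mT[i - 1])
    return None


# Synthetic traces for illustration only.
my_switching = [1.0, 0.8, 0.2, -0.2, -0.9]
my_stuck = [1.0, 0.9, 0.95, 0.9, 1.0]
field = [0.0, 5.0, 10.0, 14.0, 18.0]
print(reevaluation_field(my_switching, field))
print(reevaluation_field(my_stuck, field))
```

A `None` result corresponds to the cobalt–iron case described in the text, where the curve never crosses $M_y = 0$.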

**4.1.4. Summary.** A detailed discussion of the NML design space is beyond the scope of this paper. Rather, the examples presented in sections 4.1.1–4.1.3 are simply meant to (i) highlight the complex tradeoffs in the NML design space, and (ii) illustrate how some design parameters may compensate for others if one is more difficult from the standpoint of fabrication. For example, materials with a higher $M_s$ value can be stronger drivers of a neighboring device, but may also result in structures that require higher field magnitudes (and hence energy) to clock. Alternatively, magnet shape (or thickness) may provide other design levers such that a given device could have a stronger influence on the neighbor that it is meant to drive (i.e. in lieu of a material with a higher $M_s$ value). Determining the ideal properties of the devices that might comprise an NML circuit ensemble must be the subject of much future work.

#### 4.2. Shape for logic

Above, we illustrated how magnet shape could play a role in reducing the magnitude of the magnetic clocking field required to re-evaluate an AF-ordered line given a change in magnetization state at the line’s input. Here, we illustrate how magnet shape could also lead to reduced-footprint, lower-latency logic.

**4.2.1. Energy barrier shifts.** For a magnet with a symmetric, rounded-rectangle shape, a given device will be in the highest energy state if magnetized along its hard axis (see figure 8(a)). However, lower energy states are equivalent and occur at $\pm 90^\circ$. (With no external stimulus to keep such a device in a hard-axis-biased state, thermal noise, minor fabrication variations, etc should determine whether it relaxes such that its $y$-component of magnetization is positive ($\uparrow$) or negative ($\downarrow$).) By changing a magnet’s geometry, we also change its energy landscape. More specifically, when considering a magnet with one slanted edge, the highest energy state does not occur when a device is magnetized along its (geometrically) hard axis. Rather, if biased along its geometrically hard axis, a device is already on one side of the potential barrier and should always relax such that the sign of the $y$-component of magnetization is always the same (i.e. the same binary state) assuming no significant fabrication variation or thermal noise. This is captured qualitatively in figure 8(b) where, if the magnet with a slanted edge is magnetized along a geometrically hard axis, it should relax such that its $y$-component of magnetization is negative<sup>11</sup>.
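The energy-landscape argument above can be illustrated with a toy single-angle model. The sketch below is an assumption made for illustration, not the micromagnetic model used in this work: it takes $E(\theta) = K \cos^2(\theta - \delta)$, where $\theta$ is measured from the geometrically hard axis toward $+y$ and the hypothetical parameter `tilt_deg` ($\delta$) stands in for the barrier shift produced by a slanted edge. The sign convention linking a particular slant to a particular tilt is likewise assumed.

```python
import math


def relax_direction(tilt_deg, k=1.0):
    """Sign of the y-magnetization a hard-axis-biased magnet relaxes toward.

    Toy energy: E(theta) = k * cos^2(theta - tilt). The barrier peak sits
    at theta = tilt and the minima at tilt +/- 90 degrees. Starting from
    theta = 0 (the geometric hard axis), the magnet moves downhill.

    Returns +1 or -1, or 0 for the symmetric case in which the outcome is
    left to noise or fabrication variation.
    """
    tilt = math.radians(tilt_deg)
    slope_at_hard_axis = k * math.sin(2.0 * tilt)  # dE/dtheta at theta = 0
    if abs(slope_at_hard_axis) < 1e-12:
        return 0
    # A positive slope means energy decreases toward negative theta (-y).
    return -1 if slope_at_hard_axis > 0 else 1


print(relax_direction(0.0))    # symmetric magnet: undetermined
print(relax_direction(10.0))   # barrier peak shifted toward +y
print(relax_direction(-10.0))  # barrier peak shifted toward -y
```

In this picture a symmetric magnet starts exactly on the barrier peak, while any nonzero tilt places the hard-axis-biased state on one side of the peak, which is the qualitative behavior sketched in figures 8(a) and (b).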

When considering these slant-edged magnets, the ultimate sign of a device’s  $y$ -component of magnetization (and its binary/low energy state) is a function of both the position of its slanted edge, as well as which direction it is biased along its geometrically hard axis. For example, if the sign of the hard axis magnetic field applied to the magnet pictured in figure 8(b) were reversed, or if the slope of the slant were positive instead of negative, the preferred sign of the  $y$ -component of magnetization for the magnet with a slanted edge would change from negative to positive. Work presented in [38] describes a proof of concept experiment where arrays of individual magnets with slanted edges were designed and fabricated.

<sup>11</sup> Note that the energy barriers pictured in figures 8(a) and (b) are artistically created to demonstrate the general effect that a slanted edge has on a magnet’s energy barrier. In reality the slanted magnet’s energy barrier will appear slightly asymmetric as suggested in [20].



**Figure 9.** (a) Unavoidable crossing; (b) logical crossing based on XOR gates; (c) planar implementation of XOR based on NAND gates; (d) schematic of NML shape-based crossing.

**4.2.2. Shape-based gate design.** We can exploit the aforementioned switching behavior/energy barrier shift to realize non-majority-gate-based Boolean logic in clocked NML systems. As explained in section 2.3, the traditional majority gate design implements AND/OR logic by fixing one of the inputs to an up or down state. The fixed input magnet provides an energy barrier shift to the center (or ‘compute’) magnet, thus providing a preferred low energy state. In shape-based gates, using a magnet with a slanted edge for the compute magnet produces this same effect. AND or OR gate functionality is provided via the position of the slanted edge and the direction of the external clocking field, rather than the state of a fixed neighbor magnet. (This could even allow for the gate’s functionality to be toggled in circuits where bi-directional clocks can be leveraged [44].)

A schematic of a shape-based magnetic logic gate appears in figure 8(c). The center (compute) magnet has a slanted edge in its lower-left corner. When a clock field ( $H_{\text{clock}}$ ) is applied from right to left (as pictured in figure 8(c)) the preferred sign of the compute magnet’s  $y$ -component of magnetization is positive. If there is no net  $y$ -bias on the compute magnet from the inputs (i.e. the signs of the respective  $y$ -components of magnetization are opposite) or both inputs have a positive  $y$ -component of magnetization, the compute magnet should relax such that its  $y$ -component of magnetization is also positive. Alternatively, a negative bias of sufficient magnitude (i.e. if both inputs have a negative  $y$ -component of magnetization as illustrated in figure 8(c)) can magnetize the compute magnet against the state suggested by the slant and the direction of the applied clock field. A logic AND ensues if we equate  $-M_y$  to logic ‘0’, and  $+M_y$  to logic ‘1’. If the direction of  $H_{\text{clock}}$  were reversed, the gate would function as an OR gate.
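The relaxation rule just described can be summarized as a small function. This is a behavioral sketch only, under the assumptions that each input contributes an equal-magnitude $y$-bias and that two agreeing inputs always overcome the shape-induced preference; the names `compute_magnet_sign` and `preferred` are hypothetical. Signs are encoded as $+1$ and $-1$ for the $y$-component of magnetization.

```python
def compute_magnet_sign(in_a, in_b, preferred):
    """Sign (+1 or -1) of the compute magnet's y-magnetization after relaxing.

    in_a, in_b : +1 or -1, signs of the two inputs' y-magnetization
    preferred  : +1 or -1, sign favored by the slanted edge together with
                 the direction of the applied clock field
    """
    net_bias = in_a + in_b
    if net_bias == 0:
        # Inputs cancel: the slant and clock direction decide the outcome.
        return preferred
    # Inputs agree: their combined bias sets the state, even against
    # the preferred sign.
    return 1 if net_bias > 0 else -1


# Tabulate all input combinations for both clock directions.
for preferred in (+1, -1):
    for a in (-1, +1):
        for b in (-1, +1):
            print(preferred, a, b, compute_magnet_sign(a, b, preferred))
```

With `preferred = +1` the output is negative only when both inputs are negative, and with `preferred = -1` it is positive only when both inputs are positive. Assigning logic values to the two signs then turns each table into a two-input gate, and flipping `preferred` (reversing the clock) exchanges one gate type for the other.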

**4.2.3. Experimental demonstration.** SEM and MFM images of successfully fabricated gate structures are illustrated in figure 8(d). Here, input magnets are not set by drivers with easy axes parallel to the direction of the applied external (clock) field (as was the case for the majority gate experiments discussed in section 2.3). Rather, the state of the inputs to the compute magnet is determined by leveraging a horizontally applied, external field to input magnets with appropriately defined slanted edges (see SEM images of AND and OR gates in figures 8(d)-(i) and (ii) respectively). Both samples were then magnetized from left-to-right with a 300 mT external field. MFM images of the final magnetization state of each gate appear in figure 8(d)-(iii) (AND gates) and figure 8(d)-(iv) (OR gates). Again, in all cases, the slanted-edge compute magnet relaxes to a state that is the logical AND/OR of its inputs.

OOMMF simulations (at 300 K) of similar structures demonstrated very strong correlation to the experimental results. In addition, the simulation results suggest that this gate design should function correctly even with almost 2° of clock field misalignment. Subsequent experimental testing (seen in [16]) has further demonstrated that slant-gate structures are compatible with proposed NML clock implementations (see section 6).

Finally, room temperature simulations have shown the viability of more complex shape-based gate structures, thus providing significantly reduced footprints for several logic functions [44]. This is possible, as only two inputs need to be routed to a compute magnet instead of three. Moreover, even if larger devices are used (i.e. to more easily allow for the definition of the slanted edge), the net footprint of a two-input gate can still be less than the net footprint of a three-input gate made from smaller, symmetric devices [38].

#### 4.3. Local interconnect

In CMOS systems, the ability to route signals in multiple layers of metal is essential for creating local interconnections between individual logic gates and/or functional units. For example, consider the simple graph shown in figure 9(a)—where nodes A and B might represent two inputs that are both needed by two logic gates (C and D). The required connections illustrated in figure 9(a) could be easily implemented in a CMOS circuit using two layers of metal (i.e. metal 1 might be used for the connections from A-to-C, A-to-D, and B-to-D, and metal 2 for the connection from B-to-C).

**Figure 10.** Micromagnetic simulations of a shape-based crossing. Input drivers switch from down–up to up–down, which requires every island in the structure ($i_1$, $i_2$, $c$, $o_1$, and $o_2$) to change state. The time evolution of the magnet ensemble (assuming a 10 mT clock field applied from left-to-right) is illustrated in panels (i) through (vii). The magnetic islands were assumed to be permalloy ($M_s = 800\,000 \text{ A m}^{-1}$, $\alpha = 0.1$, exchange energy = $1.3 \times 10^{-11} \text{ J m}^{-1}$).

Similar functionality will also be needed as more complex NML circuits are designed and fabricated. However, existing work [45] has suggested that, for QCA-like architectures (that require nearest neighbor interactions), wire crossings like those illustrated in figure 9(a), while theoretically possible, can be more difficult to physically implement than in a CMOS equivalent. Given the experimental state of the art for most electrostatic implementations of QCA, any such crossing would seemingly need to be done in-plane. While designs for such a structure have been proposed [28], fabrication is by no means trivial. For example, considering a proposed molecular implementation [46, 47], the wire crossing structure proposed in [28] would require one molecule ($\sim 1 \text{ nm} \times 1 \text{ nm}$) to be placed immediately next to another molecule ($\sim 1 \text{ nm} \times 1 \text{ nm}$) that is rotated by 45°. To circumvent this potentially significant design constraint, [45] proposed using selective logic block duplication and planar, logical crossings (see figure 9(b)) to facilitate a local connection like that illustrated in figure 9(c). While similar ideas could also be applied to NML circuits, other approaches are also possible and are discussed below.
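To make the idea of a planar, logical crossing concrete, the sketch below uses the well-known three-XOR swap, with each XOR built from four two-input NAND gates. Whether [45] and figures 9(b) and (c) use exactly this arrangement is an assumption; the code shows only that two signals can exchange positions using gates alone, with no physical wire crossing. All function names are illustrative.

```python
def nand(a, b):
    """Two-input NAND on bits 0 and 1."""
    return 1 - (a & b)


def xor_from_nand(a, b):
    """XOR built from four two-input NAND gates."""
    n1 = nand(a, b)
    return nand(nand(a, n1), nand(b, n1))


def logical_crossing(top_in, bottom_in):
    """Swap two signals with three XOR gates arranged in a plane.

    The middle XOR combines both inputs. XOR-ing that result with each
    original input recovers the other input on that side.
    """
    x = xor_from_nand(top_in, bottom_in)
    top_out = xor_from_nand(top_in, x)        # equals bottom_in
    bottom_out = xor_from_nand(x, bottom_in)  # equals top_in
    return top_out, bottom_out


for a in (0, 1):
    for b in (0, 1):
        print((a, b), "->", logical_crossing(a, b))
```

The cost of this approach is gate count: twelve NAND gates per crossing in this sketch, which is one reason alternative crossing structures are of interest for NML.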

**4.3.1. In-plane crossovers with magnetic structures.** As described in [48], magnet shape can be used such that a single magnetic island can represent two bits of information simultaneously. This idea can be employed to realize a co-planar wire crossing. A schematic of what such a ‘shape-based’ wire crossing might look like appears in figure 9(d)—where the middle nanomagnet has a component of both inputs (at left) and transfers the polarization of both inputs to both outputs (at right). As seen from the schematic, the inputs to the structure are effectively ‘crossed’.

Work in [48] noted that a structure like that illustrated in figure 9(d) was tested with all possible input combinations assuming all possible initial magnetization states. (Thus, 16 simulations were performed in all.) For this paper, simulations of similar designs (given all possible input combinations and initial magnetization states) were repeated (as the simulations in [48] assumed a relatively high damping coefficient of 0.5 and assumed a temperature of 0 K). Again, in all cases, the magnetization states associated with the inputs were successfully transferred to the proper outputs, and signals were crossed.

To better illustrate the switching process, the results of one simulation (assuming permalloy dots with $M_s = 800\,000 \text{ A m}^{-1}$, $\alpha = 0.1$, and an exchange energy of $1.3 \times 10^{-11} \text{ J m}^{-1}$) appear in figure 10. Here, the initial state of the structure (see figure 10-(i)) reflects one input with a negative $y$-component of magnetization ($i_1$) and one input with a positive $y$-component of magnetization ($i_2$). When the simulation begins, the $y$-component of each input changes sign (i.e. a logic ‘1’ becomes a logic ‘0’ and vice versa) and a clocking field is applied from left-to-right (i.e. the sign of $H_{\text{clock}}$ is positive). As the simulation progresses, input island $i_1$ changes in response to its driver (figure 10-(ii) through figure 10-(iv)). This in turn induces a rotation of the center island ($c$) and output $o_2$ (figure 10-(v)). Finally, $i_2$ switches, which in turn induces an additional change in the magnetization state of $c$ and $o_1$ (figures 10-(vi) and (vii)). Note that magnets $c$, $i_1$, and $i_2$ switch to a logically correct magnetization state, even if the sign of that desired state is opposite to the sign of the applied field<sup>12</sup>.

Alternatively, a co-planar crossover with ferromagnetically ordered lines has been proposed by Pulecio and Bhanja in [49], wherein an external magnetic field is applied to a structure like that shown (schematically) in figure 11(a). After the field was removed, the state of the magnets was observed with MFM, and correct ferromagnetic ordering in both horizontal and vertical lines was observed. While these results might suggest that the aforementioned arrangement of magnets would be a promising structure to cross signals in the plane, we believe that this structure is not extensible to chip-level clocking schemes—as the state of one of the inputs must be encoded by the sign of the external magnetic field. More specifically, if an external field is applied to a structure like that illustrated in figure 11(a), it will hard-axis-bias the devices in the vertical ferromagnetically ordered line. Presumably, a fixed driver could then set the state of one magnet at the end of this line. However, the easy axes of the devices in the horizontal ferromagnetically ordered line are parallel to the direction of the applied field (and presumably will always be magnetized according to the sign of the applied field if it is of sufficient magnitude).

With this context, it will be impossible to re-evaluate the horizontal line in the crossover structure given an applied clocking field and fringing fields from another magnet. If the horizontal line is magnetized such that $M_x$ is positive, and $H_{\text{clock}}$ is positive, the clocking field will only reinforce the state of the remaining devices in the line (even if the sign of the $x$-component of magnetization of the magnet at the input is changed). Presumably, the only way to change the magnetization state of the line where the easy axes of the devices in the line are parallel to the applied clocking field is to change the sign of the applied field (e.g. such that $H_{\text{clock}}$ is negative as is seen in figure 11(b)).

<sup>12</sup> In these simulations, the input magnets, output magnets, and middle magnet have all been sized differently. While ultimately this may not be required, sizing helps to ensure that the inputs will have more influence over the device that represents both states (magnet ‘c’), which will in turn have more influence over the output magnets. Also, while not reported here, simulation results indicate that the output magnets can in fact ‘tip’ a hard-axis-biased output magnet into a new, logically correct state.

**Figure 11.** (a) Schematic of the co-planar crossing structure proposed in [49]; (b) information must be encoded as the sign of the magnetic field to re-evaluate a horizontal line with a new input.

**Figure 12.** (a) Possible coupling mechanisms for electrostatic QCA devices; (b) cross-sectional schematic of hypothetical magnet layout where a signal in one plane (from ‘1’) could be routed to a device in the same plane (to ‘4’) and another plane (to ‘8’); (c) micromagnetic simulations of the layout in (b) suggest that the time evolution of this system is correct; (d) Maxwell finite element simulations of a clad line with a $500 \times 500 \text{ nm}^2$ cross section suggest that fields produced by the wire (with a 5 mA current) could extend up to 200 nm, which would be suitable for controlling the structure in (b).

**4.3.2. Using a third dimension for interconnect.** For any QCA implementation, in principle, data could also be routed in a third dimension, as devices could theoretically couple out-of-plane just as easily as in-plane as shown in figure 12(a). However, from the standpoint of fabrication, devices that exhibit vertical coupling would be difficult at best to achieve for any electrostatic implementation of QCA. Using molecular QCA as an example, devices must ultimately be attached to a given substrate, and to place devices in another layer, another substrate would be needed. This would certainly eliminate the possibility of neighbor-to-neighbor coupling.

However, given a magnetic implementation of the QCA device architecture (i.e. NML), routing data in a third dimension may be more feasible. One could conceivably place magnetic islands in different planes—where an oxide separates the planes. As the relative permeability of silicon dioxide is essentially the same as that of a vacuum, devices in multiple planes could still couple with each other.

As a proof of concept, we considered the NML layout shown (in cross section) in figure 12(b) via micromagnetic simulation (at 0 K). The goals of this simulation were to show that (i) the magnetization state of device 1 could properly influence the magnetization state of device 4 and device 8, and (ii) the time evolution of the system (essentially a fanout into a third dimension) was correct. All devices begin hard-axis-biased. A magnetic clocking field that slowly decreased in magnitude was applied to this system. Note that helper islands were employed to promote hard axis stability at the end of AF-ordered lines (as advocated in section 2.4), as well as the devices that comprised the vertical AF-ordered line. The time evolution of the system is shown in figure 12(c) and appears correct. Moreover, Maxwell simulations of clad lines suggest that the field magnitude more than 200 nm above the surface of a conducting clock line is representative of field magnitudes typically used to re-evaluate an NML ensemble in simulation.

**Figure 13.** The soliton-based switching process used in [43]: (a) devices in an ensemble are all hard-axis-biased (i.e. by a magnetic field); (b) the field is removed, the input to the ensemble is set, and devices begin switching; (c) eventually, all devices switch to a logically correct ground state associated with the input; (d) the state illustrated in (a) is somewhat unstable, and premature device switching may occur, resulting in erroneous behavior.

We anticipate that a practical process for multilayer NML (M-NML) can be achieved based on existing processes used for shallow trench isolation (STI). Since STI uses a chemical mechanical polishing (CMP) process to remove excess oxide deposited by chemical vapor deposition (CVD), a large body of knowledge has already been developed around this technique [50]. An M-NML process would begin with the deposition of a first layer of nanomagnets, as discussed above. A thin nitride layer would be deposited as a sacrificial etch stop, as is commonly done in STI processing, although in this case it would be only a few nm. Following this, either a conventional oxide CVD layer would be deposited, or an enhanced permeability dielectric (EPD) [36] that has nanometer-sized magnetic grains embedded in it might also be used. CMP of the excess material would be performed, but in the case of the EPD, a new process would have to be developed. Stopping on the nitride layer over the nanomagnets should be straightforward, leaving a planar surface on which to repeat this process as many times as is feasible given the magnetic field pattern of the underlying clocking layer (see section 6).

We believe that this approach to signal routing should be pursued—as it could significantly simplify local interconnect (and the ultimate complexity) of NML circuits—much like the introduction of an additional metal layer in CMOS circuits.

**4.3.3. Electrical interconnect.** As will be seen in sections 7 and 8, it should also be possible to transduce a signal from the magnetic domain to the electrical domain (and back) by use of a layered structure (e.g. a spin valve, tunnel junction, etc) commonly employed in MRAM arrays. With such an interface, signals could then be routed as in conventional, CMOS circuits. While the overhead of such a conversion would almost certainly be prohibitive for local interconnect, such an interface may be useful to move data signals

over longer distances. This would avoid an automata-like interconnect and the time overhead of  $N$  magnet switching events.

## 5. Defects, errors, and robust operation

In this section, we consider how NML ensembles might behave (i.e. whether they reliably switch to a new, logically correct state) given a new input, the effects of thermal noise, clock field misalignment, lithographic variations in individual islands, and combinations thereof. This section concludes with a brief discussion on scalability. Before beginning this discussion, we stress that whether or not a circuit ultimately exhibits reliable and deterministic switching is very much a function of how it is clocked—in addition to material parameters, device shape, etc. In all of the subsections below, the underlying clocking assumptions will be explicitly noted.

### 5.1. Thermal noise

**5.1.1. Soliton operation.** One of the first efforts to consider reliable switching of NML ensembles in the presence of thermal noise was presented in [43]. In [43], the following procedure is used to clock/re-evaluate an NML ensemble.

- The ensemble is ‘initialized’ such that all devices are biased along their hard axes (see figure 13(a)), e.g. by a global magnetic field. (Per the discussion in section 2, the ensemble is initialized to this state to ensure that a given device will see only one strong ‘driver.’)
- The external field is removed, and each device in the ensemble is expected to remain in a hard-axis-biased state until set by an appropriate neighbor. (While this is an unstable state of the system, dipole fields from neighboring magnets do help to preserve it [43].)
- All inputs to the ensemble are set, and easy-axis-directed fringing fields from said inputs cause neighboring devices to switch. Fringing fields from these devices then set the state of the next neighbor, etc as shown in figures 13(b) and (c).

As suggested by [43], the switching process is analogous to toppling a line of dominoes.
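The three steps above can be caricatured with a toy domino model. The sketch below is bookkeeping only, not a micromagnetic simulation: it assumes ideal antiferromagnetic coupling, no thermal noise, and exactly one switching event per step. The function name `propagate_af_line` and the state encoding (0 for hard-axis-biased, +1 and -1 for the two easy-axis states) are introduced here purely for illustration.

```python
def propagate_af_line(input_state, n_magnets):
    """Toy domino model of soliton-style switching in an AF-ordered line.

    input_state: +1 or -1, the easy-axis state of the input magnet.
    Returns a list of snapshots of the line, one per switching event.
    """
    # Step 1: every device starts hard-axis-biased (encoded as 0).
    line = [0] * n_magnets
    snapshots = [list(line)]

    # Steps 2 and 3: the field is removed, the input is set, and each
    # device is flipped by its already-switched left neighbor.
    driver = input_state
    for i in range(n_magnets):
        line[i] = -driver  # AF ordering: opposite to the driving neighbor
        driver = line[i]
        snapshots.append(list(line))
    return snapshots


if __name__ == "__main__":
    for step, state in enumerate(propagate_af_line(+1, 4)):
        print(step, state)
```

In this idealized picture a device never leaves state 0 before its left neighbor has switched; the premature switching discussed next corresponds to a violation of exactly that ordering.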

In the simulation efforts presented in [43],<sup>13</sup> the authors noted that dipole-to-dipole coupling could be insufficient

<sup>13</sup> As in the work discussed above, the OOMMF simulator was also used in the work discussed in [25]; similarly, the ThetaEvolve plugin was used to study the effects of thermal noise on ensemble switching.



**Figure 14.** (a) How biaxial anisotropy affects the energy landscape of a  $60 \times 90 \times 5 \text{ nm}^3$  cobalt device (per [43],  $M_s = 1000\,000 \text{ A m}^{-1}$ ,  $\alpha = 0.1$ , exchange energy  $= 1.3 \times 10^{-11} \text{ J m}^{-1}$ ). As the value of  $K_1$  increases, there is a local energy minimum at  $0^\circ$  (which represents the point at which a magnet is biased along its hard axis). In a soliton mode of operation, this minimum could make hard-axis-biased devices in the ensemble more resilient to premature switching that is induced by thermal noise. Increasing the value of  $K_1$  increases the depth of the ‘well’ at the  $0^\circ$  point. This plot was generated with micromagnetic OOMMF simulations; (b) schematic of majority gate design assumed in [51]. Structures defined by dotted lines were not considered in the designs simulated by [51], but may lead to more reliable operation.

to keep an antiferromagnetically ordered line (for example) hard-axis-biased, and that thermal noise could induce premature, random, and unwanted switching. As shown in figure 13(d), the net effect would be similar to having another domino in the line fall before its neighbor tips it. Similar to the situation shown in figure 4(b), while the final magnetization state of the line may ultimately appear correct, the time evolution of this system is non-deterministic.

To combat this problem, [43] suggested fabricating magnetic islands with a magnetocrystalline biaxial anisotropy, i.e. such that  $U(\theta)$  becomes  $K_u \cos^2(\theta) + \tfrac{1}{4} K_1 \sin^2(2\theta)$ , where  $K_1$  is the biaxial anisotropy constant. The biaxial anisotropy term introduces a local minimum in the energy landscape of an individual magnetic island at  $0^\circ$  (see figure 14(a)), which should further promote the hard axis stability of the ensemble<sup>14</sup>. Given the effects of thermal noise at  $T = 300 \text{ K}$ , [43] reports that a line of hard-axis-biased devices with a biaxial term remained in the intermediate state shown in figure 13(a) (until a device was set by an appropriate neighbor), whereas devices without the biaxial term did not.
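As a quick numerical check of this expression, the sketch below evaluates $U(\theta) = K_u \cos^2(\theta) + \tfrac{1}{4} K_1 \sin^2(2\theta)$ and estimates the depth of the well at $0^\circ$. The constants are in arbitrary units and are not the values used in [43]; the function names are hypothetical. For this expression alone (no applied or dipole fields), the curvature at $0^\circ$ is $2(K_1 - K_u)$, so a local minimum appears only for $K_1 > K_u$, and the well depth works out to $(K_1 - K_u)^2 / (4K_1)$.

```python
import math


def anisotropy_energy(theta_rad, k_u, k_1):
    """U(theta) = K_u cos^2(theta) + (1/4) K_1 sin^2(2 theta), arbitrary units."""
    return k_u * math.cos(theta_rad) ** 2 + 0.25 * k_1 * math.sin(2.0 * theta_rad) ** 2


def hard_axis_well_depth(k_u, k_1, n_samples=20001):
    """Barrier height between the 0-degree point and the easy axis (90 degrees).

    Returns 0.0 when the 0-degree point is the top of the landscape,
    i.e. when there is no local minimum to trap the magnetization.
    """
    u_hard = anisotropy_energy(0.0, k_u, k_1)
    thetas = [0.5 * math.pi * i / (n_samples - 1) for i in range(n_samples)]
    u_peak = max(anisotropy_energy(t, k_u, k_1) for t in thetas)
    return u_peak - u_hard


if __name__ == "__main__":
    for k_1 in (0.5, 2.0, 4.0):  # illustrative values with K_u = 1
        print(k_1, hard_axis_well_depth(1.0, k_1))
```

With $K_u = 1$, the depth is zero for $K_1 = 0.5$ and grows with $K_1$ beyond $K_1 = K_u$, which matches the trend described for figure 14(a).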

In [51], Spedalieri, *et al* extended the work of [43] and considered the switching behavior of a suite of majority gates with biaxial anisotropy (see figure 14(b) for schematic). The clocking scheme assumed in this work is like the approach used in [43] and summarized by figure 13. Notably, assuming gates comprised of  $30 \text{ nm} \times 15 \text{ nm} \times 6 \text{ nm}$  magnets, the gate error rate can exceed 15%. The most common errors observed were (i) premature switching in randomized islands due to thermal noise and (ii) gates where devices essentially remained hard-axis-biased (due to a biaxial constant/biaxial anisotropy that was too large).

As suggested in [51], one proposed solution to these presumably high error rates was to increase the size of the

magnetic islands. This simultaneously leads to (a) an increase in the depth of the local minimum in the energy landscape associated with a device with biaxial anisotropy (figure 14(a)), and (b) devices that can generate a stronger local field on a neighbor. The deeper minimum improves hard axis stability in the presence of thermal noise, while stronger interactions between adjacent devices help to prevent an ensemble from remaining hard-axis-biased. Reference [51] suggests that, assuming magnetic islands with a  $2 \times 1$  aspect ratio (height-to-width), devices would need to be at least  $200 \text{ nm}$  long such that unwanted switching is reduced to an acceptable level.

One design technique that was not considered in [51]—and that has been advocated by [43, 48]—is the use of ‘stabilizer’/‘helper’ islands to control switching in (i) ferromagnetically ordered interconnect (which is present in the gate design considered in [51]) or (ii) the last magnet in the gate per the discussion in section 2 (see structures defined by dotted lines in schematic in figure 14(b)). Helper/stabilizer cells can be used to compensate for the lack of dipole coupling (i.e. that would be present in a hard-axis-biased AF-ordered line) in both ferromagnetically ordered interconnect and the last magnet in an inverter chain or gate. Because their easy axes would be parallel to the direction of the applied clocking field, their magnetization state should never change. Thus, while we do not consider the results presented in [51] to be unreasonable, this (and other design levers to be discussed) should also be considered in future study. Moreover, it is not unlikely that ‘unmanaged’ boundary conditions and ‘unmanaged’ ferromagnetically ordered interconnect are at least in part responsible for the error rates for the majority gate experiments reported in [1, 52].

To conclude this subsection, the mode of operation (i.e. how the devices are clocked, and how data propagates through an ensemble once the inputs are set) is often referred to as soliton mode. In other words, a magnetic soliton propagates through a chain of devices. An important characteristic of soliton mode is that a device in a clocked

<sup>14</sup>As noted in [35], ideally, the energy minimum must be tuned such that it is high enough to prevent premature relaxation, and low enough so that a neighboring device can still bias a device to a new, and logically correct state (i.e. magnet fringing fields must be capable of transitioning a magnet out of the local minimum).



**Figure 15.** (a) A hard-axis-biased magnet (with no mechanism to keep said device hard-axis-biased) relaxes randomly (here, devices are assumed to be permalloy:  $M_s = 800\,000 \text{ A m}^{-1}$ ,  $\alpha = 0.1$ , exchange energy =  $1.3 \times 10^{-11} \text{ J m}^{-1}$ ); (b) if a magnetic field is applied along the hard axis of the device, it remains magnetized at  $0^\circ$ ; when a nominal (easy-axis-directed) biasing field is applied to the device (in conjunction with the hard axis field), the magnetization state of the device consistently changes, but the device does not completely relax along its easy axis; when both fields are removed, the device completely relaxes along its easy axis such that the sign of  $M_y$  is the same as the sign of  $H_{y-\text{bias}}$ .

ensemble remains hard-axis-biased until it is ‘knocked out’ of the  $0^\circ$  point by fringing fields from an appropriate neighbor. Soliton propagation is actually illustrated graphically in figure 7(a). Devices are hard-axis-biased by an external field and remain so until a neighbor changes state. This is in contrast to the more adiabatic switching mode to be considered next.

**5.1.2. Adiabatic operation.** Both [43] and [51] suggest that thermal fluctuations can cause a hard-axis-biased magnet (i.e. at the top of the energy landscape) to randomly relax along its easy axis before fringing fields from a neighboring device induce deterministic switching into what is presumably a logically correct state.

This is readily evident from the OOMMF simulations captured by figure 15(a), where a  $60 \times 90 \times 30 \text{ nm}^3$  permalloy magnet is initialized such that  $M_x(\text{hard axis}) = M_s$  (saturation magnetization). The device was subjected to just the effects of a Langevin, stochastically fluctuating, temperature-dependent field (the ThetaEvolve plugin to OOMMF was used here as well) for 3 ns. After 3 ns, a nominal biasing field was applied to the device (10 mT in the positive  $y$ -direction) in an attempt to make the device relax such that  $M_y$  was positive. As seen from figure 15(a), relaxation is completely random. Summarizing the results of 50 simulations, 50% of the time, the device relaxes such that  $M_y$  is negative, 40% of the time a device relaxes such that  $M_y$  is positive, and 10% of the time it enters a vortex state (which would likely be eliminated if a higher aspect ratio device were considered). In nearly all cases, the device relaxes along its easy axis well before the biasing field is applied.

Alternatively, consider the simulation results captured by figure 15(b). Here, the same magnet is also initialized such that  $M_x = M_s$ . However, for the first 3 ns of the simulation, a hard-axis-directed field (the magnitude of which is representative of the field generated by two, hard-axis-biased devices on a device sandwiched between them plus a nominal clock field) was applied to preserve the energy landscape illustrated in figure 3(a)-(i). As in the

simulations described by figure 15(a), after 3 ns (where in all cases, the device has remained hard-axis-biased), a nominal  $H_y$  biasing field is applied in an attempt to deterministically influence the sign of the  $y$ -component of magnetization after the device relaxes and all fields are removed. Here, in all simulations,  $M_y$  is positive when the simulation finishes.

The switching behavior illustrated in figure 15(b) is fundamental to the line and gate simulation results to be discussed next. More specifically, after a  $y$ -bias was applied to the single device (but before the hard-axis-directed field was removed), the magnetization state of the device was such that it could produce a biasing field on a neighboring device (which could then presumably influence the state of another device, etc). Moreover, the hard-axis-directed field could suppress the undesirable, premature switching described in [43, 51].

There are two obvious consequences to this type of switching:

- First, each successive device in a clocked ensemble would presumably become a weaker driver of its neighbor (due to the presence of the hard axis field). However, this problem could be overcome by clocking groups of only  $N$  magnets at a time, where  $N$  is defined such that magnet  $N - 1$  in the critical path can generate a biasing field of sufficient magnitude to switch magnet  $N$ . If the magnetic field is then removed from this  $N$ -magnet group, devices should completely relax along their easy axes (see figure 15(b)) and magnet  $N$  could serve as a strong driver to another subgroup.
- Second, any additional overhead in terms of clock energy must be accounted for in the ‘cost’ of a computation. For example, if the hard axis field is generated by a current-carrying wire, calculations of Ohmic loss must be based on the entire time for which the wire is excited.
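To make the second point concrete, the Ohmic energy dissipated in a clock wire is $E = I^2 R t$, where $t$ is the full time the wire carries current. The sketch below simply evaluates this expression; the current, resistance, and on-times are placeholder values chosen for illustration, not figures from any clock-line design discussed in this review.

```python
def ohmic_clock_energy_j(current_a, resistance_ohm, on_time_s):
    """Joule heating in a clock wire: E = I^2 * R * t (joules)."""
    return current_a ** 2 * resistance_ohm * on_time_s


if __name__ == "__main__":
    # Hypothetical wire: 1 mA through 10 ohms.
    short_pulse = ohmic_clock_energy_j(1e-3, 10.0, 5e-9)   # field on for 5 ns
    long_pulse = ohmic_clock_energy_j(1e-3, 10.0, 10e-9)   # field on for 10 ns
    print(short_pulse, long_pulse)
```

Because the loss is linear in the on-time, holding the hard-axis field twice as long doubles the clock energy charged to the computation, all else being equal.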

In section 6, we will discuss CMOS compatible clock structures that could allow for the local control of  $N$ -magnet subgroups that can also be used to promote unidirectional dataflow. Energy overhead will be considered at the end of this subsection. However, we first discuss simulation results



**Figure 16.** (a) Snapshots of different stages of a micromagnetic simulation that illustrate the desired outcome of a simulation of an AF-ordered line. After the input to the line is changed (panel 1) a magnetic clocking field is applied along the hard axes of the devices in the ensemble, and devices are biased toward a new, logically correct state (panel 2). As the clocking field is removed, devices relax along their easy axes (panel 3); (b) how the average magnetization state of magnets A–G changes as a function of time. Note that it would appear that magnet G (the last device in the line) is biased into its final (and logically correct) state before the neighbor that is supposed to ‘drive’ it (magnet F). However, magnet G is still essentially hard-axis-biased. We believe that the minor fluctuations in $M_x$ for magnet G (i.e. the minor $\pm$ sign changes) stem from (i) thermal noise and (ii) coupling to the helper island that mimics a hard-axis-biased longer line (a domain wall can begin to form in the block); (c) when the same simulation that produced the graph shown in (b) is performed, but with the temperature lowered to 0 K and a local, easy-axis-directed field applied to the terminating helper island such that a domain wall cannot form, this noise is eliminated, and there is no ambiguity in the time evolution of the line; (d) if the same line considered in (b) is initialized to the opposite magnetization state (when compared to (a)) and re-evaluated, the time evolution and final state of the line are also correct; (e) adiabatic switching of an AF-ordered line can be accomplished on faster time scales.

of lines and gates that suggest that this approach to clocking an NML ensemble could lead to more reliable switching.

Figure 16(a) illustrates snapshots of different stages of an OOMMF simulation that are representative of our simulation efforts with AF-ordered lines. We begin with an AF-ordered line in a logically correct state (i.e. successive magnets have $y$-components of magnetization of opposite sign). The input to the line is changed (panel 1), and a magnetic clocking field is applied along the hard axes of the devices in the ensemble. Despite the presence of the applied field, all magnets are biased into a new, logically correct state before the field is removed (panel 2). As the clock field is removed, devices relax along their easy axes (panel 3).

Note that for this simulation (and others to be discussed below) devices were assumed to be permalloy ($M_s = 800\,000\, \text{A m}^{-1}$, $\alpha = 0.1$, exchange energy $= 1.3 \times 10^{-11}\, \text{J m}^{-1}$), had a footprint of $60 \times 90 \times 30\, \text{nm}^3$, and the line was terminated by a block of magnetic material to mimic an adjacent group of devices that was also hard-axis-biased

(to ensure unidirectional dataflow). The clock field was then ramped from 0 to 10 mT over the course of 5 ns, and was removed in a similar fashion<sup>15</sup>.
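The clock waveform just described can be written down as a simple piecewise function. The sketch below assumes a linear ramp (the text only says the field was ramped) and adds an optional hold time at peak field; `hold_ns` and the function name are hypothetical parameters introduced here, not values taken from the simulations.

```python
def clock_field_mt(t_ns, peak_mt=10.0, ramp_ns=5.0, hold_ns=0.0):
    """Hard-axis clock field (mT) at time t_ns for a ramp-up/hold/ramp-down pulse.

    Assumes linear ramps; hold_ns is an illustrative parameter.
    """
    if t_ns <= 0.0:
        return 0.0
    if t_ns < ramp_ns:
        # Rising edge: 0 -> peak over ramp_ns.
        return peak_mt * t_ns / ramp_ns
    if t_ns <= ramp_ns + hold_ns:
        # Plateau at peak field.
        return peak_mt
    t_fall = t_ns - ramp_ns - hold_ns
    if t_fall < ramp_ns:
        # Falling edge: peak -> 0 over ramp_ns ("removed in a similar fashion").
        return peak_mt * (1.0 - t_fall / ramp_ns)
    return 0.0


if __name__ == "__main__":
    for t in (0.0, 2.5, 5.0, 7.5, 10.0):
        print(t, clock_field_mt(t))
```

Shortening `ramp_ns` is the knob that corresponds to the faster field ramps considered later in this subsection.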

Of particular interest is how the magnetization state of each device in the line changes in time. We would like to see the sign of the $y$-component of magnetization for magnet A change first, followed by magnet B, etc. Representative results from one simulation are shown in figure 16(b). Notably, it would appear that magnet G (the last device in the line) is biased to its final (and logically correct) state before the neighbor device that is supposed to drive it (magnet F). That said, minor fluctuations in $M_x$ for magnet G (i.e. the minor $\pm$ sign changes) could be attributed to (i) thermal noise (as seen in figure 15(b), there is some minor oscillation

<sup>15</sup>The magnitude of the clock field is a function of many design parameters per the discussion in section 4; the size, shape, material, etc parameters used in the simulations discussed here were chosen as (a) they are generally representative of our current fabrication targets, and (b) provide a consistent basis for comparison.



**Figure 17.** (a) Snapshots of different stages of a micromagnetic simulation that illustrate the desired outcome of a simulation of a shape-based OR gate. After both inputs change state (panel 1) a magnetic clocking field is applied along the hard axes of the devices in the ensemble, and devices are biased toward a new, logically correct state (panel 2). As the clocking field is removed, devices relax along their easy axes (panel 3). Note that here, fringing fields from the two input magnets must bias the compute magnet such that its final  $y$ -component of magnetization is opposite of that suggested by the direction of the clock and position of the slant; (b) the time evolution of this system is correct. Note that the devices of one input arm change state slightly before the devices in the other input arm, most likely due to unwanted coupling between the top input and the left-most helper island (which presumably could be eliminated, although helper cell placement will have to be carefully considered in more complex circuit designs); (c) adiabatic switching of the gate can be accomplished on faster time scales; (d) snapshots of different stages of a micromagnetic simulation that illustrate the desired outcome of a shape-based OR gate given a different initial set of inputs; (e) the time evolution of the devices given the initial conditions of (d). While the devices in one input arm change state as needed, a state change at the output is not desired, and none is observed.

around the  $0^\circ$  point when the device is subjected to just a hard-axis-directed field and no  $y$ -bias) and (ii) coupling to the helper island that mimics a hard-axis-biased longer line (a domain wall can begin to form in the block). In fact, when the same simulation is performed that produced the graph shown

in figure 16(b), but with the temperature lowered to 0 K and a local, easy-axis-directed field applied to the terminating helper island such that a domain wall cannot form, this noise is eliminated, and there is no ambiguity in the time evolution of the line (see figure 16(c)). Additionally, if the magnets



**Figure 18.** Graphical illustration of three-phase clock: one group of magnets is unclocked and drives another group of magnets. The group of magnets adjacent to the group of magnets that is switching/being driven is hard-axis-biased to promote unidirectional dataflow. An inherent pipeline can be created as multiple bits of information could move through the ensemble simultaneously.

in the same line are initialized such that their state is the opposite of that shown in figure 16(a) (i.e. the other input combination), the time evolution and final state of the line are also correct post re-evaluation (see figure 16(d)). Finally, as will be seen later, in some instances, the  $y$ -component of magnetization can oscillate around the  $0^\circ$  point such that the device's  $y$ -component of magnetization trends toward the wrong sign, but the device is ultimately biased by a neighbor into a final, and logically correct, state.

As other simulations (with different random seeds) produced similar results, we believe that line switching given the above clocking process can be made to be reliable. Moreover, from the standpoint of performance, faster field ramp times appear to be quite tolerable. As seen from figure 16(e), all magnets in the same line considered above are biased into a new, and logically correct state, after just 2 ns.

Similar results are seen from simulations of shape-based gates. As before, figure 17(a) illustrates snapshots of different stages of an OOMMF simulation of a shape-based OR gate. Note that while all gate input combinations were considered via micromagnetic simulation, we discuss a representative example that corresponds to a case where both inputs transition to a logic ‘0’ and the devices that comprise the gate are initially in a state associated with the case where both inputs were logic ‘1’s. In this case, the inputs to the compute magnet must bias the compute magnet against the slant—which previous work [44] suggests can be the most difficult when considering ensemble re-evaluation.

As with AF-ordered lines, we show the time evolution of the magnets from a representative simulation given the input combination discussed above. As seen in figure 17(b), the compute magnet changes state only after both inputs change state; inputs do not have to arrive at the compute magnet simultaneously. Also, as with lines, the time scales associated with the application of a magnetic field and magnet switching can be accelerated. As seen in figure 17(c), all devices in the

gate considered have been biased into a logically correct state after just 1.5 ns. Finally, for completeness, we illustrate the results of a micromagnetic simulation where an input to a gate changes state—but where this change should not induce a change at the output (figure 17(d)). As expected, the time evolution of the input line is correct, and other devices do not change state (figure 17(e)).

**5.1.3. Discussion.** We now consider the above switching modes in more detail at the device level. More quantitatively, consider figure 18(a), which illustrates:

- The energy landscape of a  $60 \times 90 \times 5 \text{ nm}^3$  cobalt magnet with biaxial anisotropy when subjected to a 28 mT, hard axis/ $x$ -directed field. (28 mT is the average field a magnet would experience from two hard-axis-biased neighbors assuming the neighbors were 12 nm away. Per [43], assuming a global clock like that discussed in section 5.1.1, fringing fields from neighboring devices would be solely responsible for preserving hard axis stability until a device is ‘flipped’ by an appropriate neighbor.)
- The energy landscape of a  $60 \times 90 \times 5 \text{ nm}^3$  cobalt magnet without biaxial anisotropy (i.e.  $K_1 = 0$ ) when subjected to a 28 mT field (presumed to be from neighboring magnets as described above) as well as a clocking field (representative of the approach to clocking discussed in section 5.1.2). Note that magnet thickness and saturation magnetization were updated in order to make a more uniform comparison to the biaxial device discussed above in (i).

From the figure above, note that when the device with biaxial anisotropy is in a  $0^\circ$ /metastable state, only an  $\sim 8 \text{ kT}$  barrier separates this state from two other local energy minima—one of which represents a logically incorrect state. Alternatively, for the device without biaxial anisotropy (and that is also subjected to a nominal clock field), there is a



**Figure 19.** (a) Schematic illustrating the potential (negative) effects of field misalignment; (b) with  $\pm 2^\circ$  of field misalignment, the devices in the AF-ordered line ensemble illustrated in figure 16(a) still evolve to a final and logically correct state. This example also clearly illustrates how there can be minor variation in the  $y$ -component of magnetization for the last magnet in the line. Notably, as seen in the inset, for seed 1, magnet G is initially biased toward its final, logically correct state ( $+M_y$ ) before its neighbor (magnet F). However, the  $M_y$  of magnet G then becomes negative, magnet F then switches, and magnet G then begins to relax toward its final, logically correct state. A similar (but more subtle) effect is seen given seed 2; (c) simulations of the same AF-ordered line with  $\pm 1^\circ$  of misalignment always show a line switching to a logically correct state (the peak field magnitude was 10 mT); (d) gate simulations also appear to tolerate field misalignment. Here, the time evolution of the ensemble illustrated in figure 17(a) is considered when the applied field misalignment is  $\pm 0.25^\circ$  and  $\pm 0.5^\circ$ . While only one input combination is illustrated, per [44], this is perceived to be the case that is most susceptible to field misalignment.

single, absolute minimum. When subjected to a  $y$ -biasing field from a neighbor, an absolute minimum is retained.
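To give a feel for what an $\sim 8\,kT$ barrier implies, one can use a simple Arrhenius-type (Néel-Arrhenius) estimate, $\tau = \tau_0 \exp(\Delta E / kT)$. This estimate is not part of the analysis in [43] or [51]; the attempt time $\tau_0 = 1$ ns and the 10 ns hold time used below are assumptions chosen only to illustrate the scaling, and the function names are hypothetical.

```python
import math


def mean_escape_time_ns(barrier_kt, attempt_time_ns=1.0):
    """Arrhenius-type mean time to hop over a barrier of barrier_kt (units of kT)."""
    return attempt_time_ns * math.exp(barrier_kt)


def escape_probability(barrier_kt, hold_time_ns, attempt_time_ns=1.0):
    """Probability of at least one thermally driven escape during hold_time_ns."""
    tau = mean_escape_time_ns(barrier_kt, attempt_time_ns)
    return 1.0 - math.exp(-hold_time_ns / tau)


if __name__ == "__main__":
    print(mean_escape_time_ns(8.0))       # mean escape time in ns
    print(escape_probability(8.0, 10.0))  # per-device chance over a 10 ns hold
```

Under these assumed numbers the per-device escape probability over one hold period is a fraction of a percent, which becomes significant once it is multiplied over many devices and many clock cycles.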

While switching in the presence of an applied field could be a path to making more reliable NML circuits, it suggests an architectural advantage as well. Namely, by clocking subgroups of nanomagnets, we can create an inherent pipeline [53]. (See figure 18(b) for a schematic that illustrates how a three-phase clock might be used to control an AF-ordered line.) A recent review on spin-based logic devices [54] also considered pipelined NML circuits from the standpoint of how they were clocked. More specifically, Bandyopadhyay and Cahay consider tradeoffs between a pipelined ensemble, and an ensemble that was controlled with a global clock. When considering a pipelined ensemble, [54] assumes that every magnet would be clocked individually, two magnets would be simultaneously hard-axis-biased, an appropriate neighbor would drive one magnet, and this

would promote unidirectional dataflow. A similar scheme is considered in [55, 56]. Reference [54] notes that the obvious downside to this approach is that each device would need to be controlled individually—impractical if a  $1 \text{ cm}^2$  chip contains  $10^{10}$  devices as suggested in [57]. In contrast, [54] also considered the architectural impact of a global clock. Assuming the aforementioned packing density, the critical path through an ensemble could be on the order of  $\sqrt{2} \times 10^5$  devices. Assuming a 1 ns switching time per device, this would result in a clock frequency of just 7 kHz.
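The 7 kHz figure follows from simple arithmetic, reproduced in the sketch below. It assumes (as one plausible reading of the $\sqrt{2} \times 10^5$ estimate) that the $10^{10}$ devices sit on a square grid and that the critical path runs along its diagonal; the function name is hypothetical.

```python
import math


def global_clock_frequency_hz(n_devices, switch_time_s):
    """Clock rate if one global clock must wait for the whole critical path."""
    side = math.sqrt(n_devices)           # devices along one edge of a square array
    critical_path = math.sqrt(2.0) * side  # devices along the diagonal
    return 1.0 / (critical_path * switch_time_s)


if __name__ == "__main__":
    f = global_clock_frequency_hz(1e10, 1e-9)
    print(f"{f / 1e3:.2f} kHz")  # about 7.07 kHz
```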

The above simulation results (and the clock implementations to be considered in section 6) suggest that design points between these two extremes are clearly possible. Given the switching times for the line and gate structures illustrated in figures 16(e) and 17(c), clock rates on the order of 100s of MHz could be possible, as a new result could be generated



**Figure 20.** (a) There is an energy barrier shift in magnets with variation; (b) edge roughness results in more fluctuations during relaxation—but in all cases, switching in the presence of an applied field leads to the desired final state.

by the ensemble every time the last group of magnets cycles through all  $N$  clock phases<sup>16</sup>.
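A back-of-the-envelope version of this estimate is sketched below. It assumes a three-phase clock and that each phase lasts roughly as long as one group takes to switch (about 2 ns for the line and 1.5 ns for the gate in the faster simulations described above); both the equal-phase assumption and the function name are illustrative, not a clocking specification.

```python
def pipelined_result_rate_hz(n_phases, phase_time_s):
    """Rate at which the last clocked group completes a full cycle of all phases."""
    return 1.0 / (n_phases * phase_time_s)


if __name__ == "__main__":
    for label, t_phase in (("line, 2 ns", 2e-9), ("gate, 1.5 ns", 1.5e-9)):
        rate = pipelined_result_rate_hz(3, t_phase)
        print(f"{label}: {rate / 1e6:.0f} MHz")
```

Under these assumptions the rate comes out between roughly 150 and 250 MHz, consistent with the "100s of MHz" order of magnitude and several orders above the global-clock case.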

Finally, we conclude this subsection by revisiting the issue of clock energy overhead for soliton and adiabatic operation. As context, consider the simulation results presented in figure 7(a) (a line of cobalt devices with biaxial anisotropy, which have been suggested for more reliable soliton operation), and figure 7(b) (a line of permalloy devices with no biaxial anisotropy that are commonly used in the adiabatic simulations discussed within). While soliton-based operation could presumably lead to lower energy overhead from the clock (the clock field could be turned off immediately after all devices are hard-axis-biased) for the line of devices whose behavior is captured by figure 7(a), an external field must still be applied for  $\sim 7$  ns. For the line of permalloy devices (figure 7(b)), the field could be turned off after  $\sim 9$  ns and all devices are already biased into a new, logically correct state<sup>17</sup>. Thus, the perceived potential performance advantages of soliton operation—which may in fact exist—may not be as significant as one might expect, especially given the potential for premature switching and race conditions [43]. Additional study is needed in future work.

### 5.2. Field misalignment

In [44], initial experimental efforts with shape-based logic gates were discussed. Notably, micromagnetic simulations presented in [44] suggested that clock field misalignment could either (a) lead to gate designs where operation appeared

<sup>16</sup>An inherent pipeline could presumably be created given soliton-based switching as well, but the problems described above would still need to be accounted for.

<sup>17</sup>For this specific example, a higher magnitude field is needed for the line of permalloy devices. However, (i) different material systems are presumed, and (ii) additional simulations of lines with different device geometries suggest that permalloy devices can be controlled with lower fields. Moreover, simulations of permalloy and cobalt lines at 300 K suggest that similar field magnitudes are needed for re-evaluation.

to be correct, but where a misaligned clock field was instead responsible for determining the final state of the compute magnet<sup>18</sup>, or (b) produce an erroneous state when the misalignment reinforced the preferred y-component of magnetization suggested by the slanted edge of the compute magnet. (The latter case is illustrated graphically in figure 19(a).)

That said, micromagnetic simulations, where NML ensembles are clocked in an adiabatic fashion as described above, suggest that both lines and gates can tolerate field misalignment. Figure 19(b) illustrates the results of two simulations of an AF-ordered line (with different random seeds) where the clock field was misaligned by  $2^\circ$  with respect to the hard axis of the magnets. In both cases, the final state of the line is correct. (Additionally, as seen from the inset, the last magnet in the line can be biased into an incorrect state, and then be switched back.) Figure 19(c) illustrates the results of  $\sim 50$  micromagnetic simulations<sup>19</sup> where an AF-ordered line was subjected to a clock field that was misaligned by  $\pm 1^\circ$  with respect to the hard axes of the devices in the line. In all instances, the line evolves to a logically correct state dictated by the new input. (Additional misalignment was not tested.) The gate design from figures 17(a)–(c) was also re-simulated and subjected to  $\pm 0.25^\circ$  and  $\pm 0.5^\circ$  of misalignment. (Again, additional misalignment was not tested.) In all cases, the time evolution of the devices in the design illustrated in figure 17(a) was still correct (see figure 19(d)).
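For scale, a clock field of magnitude $H$ misaligned by an angle $\delta$ from the hard axis has an unintended easy-axis component $H \sin(\delta)$. The sketch below evaluates this for the misalignment angles quoted above, assuming the 10 mT peak field noted in the caption of figure 19 applies to each case (an assumption for the gate simulations); the function name is hypothetical.

```python
import math


def easy_axis_component_mt(field_mt, misalignment_deg):
    """Easy-axis projection (mT) of a clock field tilted away from the hard axis."""
    return field_mt * math.sin(math.radians(misalignment_deg))


if __name__ == "__main__":
    for delta in (0.25, 0.5, 1.0, 2.0):
        print(f"{delta:>4} deg -> {easy_axis_component_mt(10.0, delta):.3f} mT")
```

At $2^\circ$ the spurious easy-axis bias is about 0.35 mT, small compared with the tens of mT that neighboring magnets can exert on each other at close spacing, which is consistent with the lines tolerating this level of misalignment.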

While these results are obviously preliminary, and additional study is needed, the simulations discussed above do suggest that NML circuits could tolerate at least some field misalignment. Moreover, if subgroups of devices are controlled with clock structures like those to be discussed in section 6, chip-level field alignment issues—as suggested by [54]—should be much less significant.

<sup>18</sup>Clock field misalignment resulted in an additional bias along the compute magnet's easy axis; this bias, and not fringing fields from inputs, influenced the final state of the gate.

<sup>19</sup>A few simulations were inadvertently killed while running on our high-performance-computing cluster and were not restarted.



**Figure 21.** (a) Schematic of clad copper line with NML circuit components placed on top. By passing current through the line, a magnetic field will be generated that is parallel to the hard axes of the devices. Larger NML ensembles could be controlled by parallel, in-plane wires (b), or clock lines distributed in multiple planes (c).

### 5.3. Misshapenness

We conclude this section with a brief discussion of how lithographic variation (of the magnets themselves) could affect whether or not an NML ensemble can be correctly re-evaluated with a new input. As seen from figure 20(a), variation can cause the peak of a device's energy barrier to shift from  $0^\circ$ . If controllable, this shift can be exploited to build Boolean logic gates with a reduced footprint. However, if the variation is uncontrollable, or if it occurs in an undesirable location (e.g. a single device in the middle of an AF-ordered line), it could adversely affect the switching behavior of an NML ensemble.

More specifically, as suggested in [8], variation can lead to 'stuck at' faults in an AF-ordered line. For example, referring to figure 20(a), assume that one magnet in an AF-ordered line was not well defined (e.g. it had a slanted edge) and its  $y$ -component of magnetization was initially positive. Even if a clock field is of sufficient magnitude to hard-axis-bias this device, it may still be incapable of inducing a state transition ( $M_y$  sign change) unless fringing fields from a neighboring device are sufficiently strong. If they are not, the device will relax back to its initial state, and a stuck at fault (or indeterminate state) could ensue [8].

Additionally, edge roughness could induce premature neighbor-to-neighbor coupling in a hard-axis-biased line. (Compare the oscillation in a hard-axis-biased rounded-rectangle device with no variation in figure 15(b) to that in a hard-axis-biased rounded-rectangle device with variation in figure 20(b).) The effects of lithographic variation, especially in the face of scaling, will need careful study in future work.

### 5.4. Scalability

The energies associated with NML device operation are comparable to  $kT$  (i.e. 0.025 eV at room temperature). As magnetostatic energies are proportional to the volume of the nanomagnets, the energies associated with coupling and shape anisotropy diminish quickly as magnet size decreases. This will impact the ultimate scalability of NML.

A magnet's internal energy barrier ( $\Delta U$ ) (typically from shape anisotropy) will impact how long data is retained in an unclocked, NML ensemble. A 40 kT barrier should equate to non-volatile storage for at least one year, while a 60 kT barrier implies retention times that are essentially infinite.
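The retention figures quoted above can be checked against a simple Arrhenius-type estimate,  $\tau = \tau_0 \exp(\Delta U/kT)$ . The sketch below is illustrative only: it assumes an attempt time  $\tau_0$  of 1 ns, which is a commonly used order-of-magnitude value and not a number taken from this article, and the function name is hypothetical.

```python
import math

SECONDS_PER_YEAR = 365.25 * 24 * 3600

def retention_time_years(barrier_in_kt, attempt_time_s=1e-9):
    """Arrhenius estimate of the mean retention time, in years.

    barrier_in_kt  -- energy barrier Delta U expressed in units of kT
    attempt_time_s -- ASSUMED attempt time tau_0 (1 ns is an assumption,
                      not a value reported in the article)
    """
    return attempt_time_s * math.exp(barrier_in_kt) / SECONDS_PER_YEAR

# Compare the two barrier heights discussed in the text.
for barrier in (40, 60):
    print(f"{barrier} kT -> {retention_time_years(barrier):.2e} years")
```

Under this assumption, a 40 kT barrier corresponds to a retention time of several years, while a 60 kT barrier corresponds to billions of years. A different choice of  $\tau_0$  shifts both values by the same factor.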

Shape anisotropy alone should allow for a 40 kT barrier in 20–30 nm sized dots made from permalloy. 'Size' refers to the longest axis of a 10 nm thick, 1:2 aspect ratio single-domain magnet (i.e. 10 nm wide  $\times$  20 nm tall  $\times$  10 nm thick). (The precise value is difficult to estimate, as non-uniform reversal modes and coupling with neighboring magnets may modify the energy landscape. 20 nm represents an optimistic estimate, while 30 nm is a more pessimistic estimate.) However, this size is not a fundamental limit to NML operation. For example, crystalline anisotropies in magnetic multilayers can result in much higher energy barriers than from shape anisotropy. In fact, NML-like devices have already been made from exactly these material systems [25]. In special exchange coupled systems, thermally stable nanomagnets as small as 4 nm were demonstrated [58] and there is no fundamental reason why NML devices could not be scaled close to this size range.

The coupling energy between nanomagnets ( $\Delta E$ ) represents a more fundamental limitation to NML scaling. Field coupling strength depends on the total magnetic moment of the particle ( $M_s V$ ). Saturation magnetizations ( $M_s$ ) for metallic ferromagnets range from 0.5 to 2.5  $\text{MA m}^{-1}$ . As such, reducing magnet size could drastically reduce coupling energy, and coupling energy can impact how reliable computation may be [59]. For example, if  $\Delta U$  is high, but  $\Delta E$  is low, devices may be non-volatile, but will also be more susceptible to settling into an error/metastable state during ensemble re-evaluation. One possible approach to achieving logically correct operation in the face of scaling is to change the way an ensemble is clocked. For example, [59] has shown that given a slow clocking field with a high field gradient, error rates fall with  $\exp(-\Delta E/kT)$ . Thus, low error rates (e.g.  $10^{-6}$  per computation) could be achievable for 10 nm size scale dots.
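To make the  $\exp(-\Delta E/kT)$  scaling concrete, the following sketch inverts it to estimate the coupling energy needed for a target error rate. It assumes a prefactor of 1 in front of the exponential, which is a simplification introduced here; [59] should be consulted for the actual model. The function names are hypothetical.

```python
import math

def error_rate(coupling_in_kt, prefactor=1.0):
    """Error rate assuming it scales as prefactor * exp(-Delta E / kT).

    prefactor=1.0 is an ASSUMPTION made for illustration only.
    """
    return prefactor * math.exp(-coupling_in_kt)

def coupling_needed_in_kt(target_error_rate, prefactor=1.0):
    """Coupling energy (in units of kT) required to reach a target error rate."""
    return math.log(prefactor / target_error_rate)

needed = coupling_needed_in_kt(1e-6)
print(f"Delta E for a 1e-6 error rate: {needed:.1f} kT")
```

With a unit prefactor, an error rate of  $10^{-6}$  per computation requires a coupling energy of roughly 14 kT.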

Mitigating undesirable effects of thermal noise will likely be an important design challenge as feature sizes are reduced below 50 nm—and new approaches to clocking may be needed. For example, to maintain error-free switching in the case of low potential barriers, one needs to create strong gradients in the clocking field and/or clock the magnets individually (with spin transfer torque or multiferroic materials).

To summarize, physics does not prohibit the scaling of NML to devices with feature sizes on the order of 10 nm—but advances in energy/materials will likely be needed for practical implementations. Finally, when considering scaling,



**Figure 22.** (a) Cross section of clad line; (b) images of lines with different AR magnets on top; (c) expected field distributions for different current pulses (Maxwell simulation results); (d) coercivities from OOMMF simulation results; (e) percentage of devices that switch assuming different currents/fields—devices are measured in middle and on ends of the line.

we conclude by noting that one must also consider what logic functionality can be accomplished in a given footprint—especially when compared to a state-of-the-art CMOS equivalent. For example, [60] predicts that a NAND FO1 in 15 nm CMOS will have a footprint of  $\sim 70\,000\text{ nm}^2$ . From [61] a majority logic gate constructed from magnets with  $60 \times 90\text{ nm}^2$  footprints will itself have a net footprint of  $\sim 68\,000\text{ nm}^2$ . Moreover, not only can a majority gate be programmed to perform a NAND function, it is inherently a more computationally powerful gate. Additionally, a shape-based AND gate made from slightly larger devices has a footprint of just  $\sim 34\,000\text{ nm}^2$ . Thus, while device scaling is obviously desirable, it may be possible to scale the size of a functional unit even if larger devices are used to make it.
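The footprint comparison above can be summarized with a few lines of arithmetic. This sketch simply normalizes the three footprints quoted in the text to the CMOS NAND value; the dictionary keys are labels chosen here for illustration.

```python
# Footprints quoted in the text, in nm^2.
footprints_nm2 = {
    "CMOS NAND FO1 (15 nm)": 70_000,
    "NML majority gate (60 x 90 nm^2 magnets)": 68_000,
    "NML shape-based AND gate": 34_000,
}

reference = footprints_nm2["CMOS NAND FO1 (15 nm)"]

# Express each footprint as a fraction of the CMOS NAND footprint.
relative = {name: area / reference for name, area in footprints_nm2.items()}

for name, ratio in relative.items():
    print(f"{name}: {ratio:.2f} x CMOS NAND footprint")
```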

## 6. Clocking

To date, most of the NML line and gate ensembles that have been fabricated have been clocked with an external magnetic field (i.e. a sample was subjected to the magnetic field produced between two poles of an electromagnet). Not only does this introduce a natural source of error (it would be impossible to completely eliminate field misalignment, for example), but it also severely limits extensibility. For NML to be a viable candidate for digital systems, clocking must be done ‘on-chip’.

### 6.1. Line clocking: concept

Thus far, the most commonly employed clock is a magnetic field applied along the hard axes of an ensemble of magnets. This field places the devices into the metastable state (figure 3) required for re-evaluation. Copper wires clad with ferromagnetic material on the sides and bottom (like the word and bit lines used in field MRAM circuits [62, 63]) have been proposed in [34] as a way to generate a magnetic field for on-chip local control of NML circuits. A schematic of what such a system might look like appears in figure 21.





**Figure 23.** (a) Schematic of pillow shaped device considered in simulation; (b)  $BH$  curve generated with finite-element-based SpinFlow 3D simulator.

- Generally, a device near a clock wire boundary is less likely to switch (for a given current/field) than a device in the center of a clock line. This makes sense as the magnitude of the generated field is predicted to be higher near the center as suggested by figure 22(c) and [66]. (An exception is the middle group—466 mA, 97.2 mT—where in some instances, a higher percentage of devices near the boundary switch than in the middle. However, the number of observable devices in the middle was also fewer than the number of devices at the boundary. Percentages were plotted for brevity. For reference, absolute statistics are reported in tables 1 and 2).
- Additionally, given a lower current/field, devices with higher aspect ratios are less likely to switch than devices with lower aspect ratios—regardless of position on the wire.
- More quantitatively, if we examine the trends associated with a  $60 \times 130 \times 30 \text{ nm}^3$  device, one can see from figure 22(e) that given a 350 mA current/74.4 mT field (at center), only about 10% of all devices with this aspect ratio have switched. Alternatively, if a 455 mA pulse/97 mT field is generated, 100% of all examined devices exhibit a change in magnetization state. These results match well with our micromagnetic simulations, which predict a coercivity between  $\pm 94$  and 96 mT. Similarly, the switching statistics for a  $60 \times 120 \times 30 \text{ nm}^3$  device are also well correlated with simulation, i.e.  $\sim 90\%$  of all devices switch when subjected to a field of  $\sim 86$  mT. As seen in figure 22(d), simulations predict that a device with this geometry has a coercivity of  $\pm 83$ –86 mT.
- That said, there are also some minor anomalies when one tries to map the switching statistics captured by figure 22(e) to the simulations captured by figures 22(c) and (d). For example,  $\sim 90\%$  of higher aspect ratio devices switch given a current pulse/field of 498 mA/106 mT, where micromagnetic simulations suggest that a slightly higher field should be required. However, here, simulation results only encompass 10 data points, and a wider coercivity ‘spread’ is possible (i.e. due to the relatively slow field ramp times). Additionally, all OOMMF simulations assume that the top surface of a device is completely flat. In contrast, SEM images of fabricated devices suggest that the top may be more ‘pillow’ shaped as shown in figure 23(a). This device geometry is particularly difficult to simulate with the finite-difference-based OOMMF tool. (Simulations would be time consuming due to the extremely fine-grained mesh that would be needed.) However, in addition to OOMMF, we have recently begun using the finite-element-based SpinFlow 3D software [67]. A hysteresis loop (assuming field ramp up/down times that are identical to those used in OOMMF simulations) suggests a coercivity of  $\pm 80$  mT, which is lower than the coercivity predicted by OOMMF for a device of the same footprint. Device shape should be considered in more detail in future work.
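The current/field pairs quoted in the list above can be used to estimate an approximate field-per-current conversion for the clad line. The sketch below assumes that the field at the line center scales linearly with current and passes through the origin (i.e. no cladding saturation), which is a simplification introduced here. The fit is an ordinary least-squares slope through the origin, and the names are hypothetical.

```python
# (current in mA, field at line center in mT) pairs quoted in the text.
reported_pairs = [(350.0, 74.4), (455.0, 97.0), (466.0, 97.2), (498.0, 106.0)]

def fit_slope_through_origin(pairs):
    """Least-squares slope b for the model field = b * current."""
    numerator = sum(current * field for current, field in pairs)
    denominator = sum(current * current for current, _ in pairs)
    return numerator / denominator

slope_mt_per_ma = fit_slope_through_origin(reported_pairs)

def field_mt(current_ma):
    """Estimated field (mT) for a given line current (mA) under the linear model."""
    return slope_mt_per_ma * current_ma

print(f"slope: {slope_mt_per_ma:.4f} mT per mA")
for current, field in reported_pairs:
    print(f"{current:.0f} mA: reported {field} mT, linear model {field_mt(current):.1f} mT")
```

The fitted slope is only a convenience for interpolating between the reported operating points. It should not be extrapolated to other wire geometries or cladding configurations.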

Finally, regarding the line and gate experiments discussed in [16], there is also good correlation between experimental and simulation results. Notably, to re-evaluate the AF-ordered lines that were tested in [16], it was reported that currents of 600–760 mA were required. (These correspond to a field magnitude of  $\sim 125$  mT.) Simulations suggest that fields of  $\sim 113$ –130 mT would be required for re-evaluation. Similarly, for the AND/OR gate experiments discussed in [16], it was reported that line currents of  $\sim 850$  mA (which corresponded to magnetic field magnitudes of 168–200 mT) were needed to re-evaluate a given gate with new inputs. Micromagnetic simulations that were representative of fabricated structures predicted that a field of at least 160–165 mT would be needed for re-evaluation.

Again, additional correlation between experiments and simulations is needed, especially as simulations are currently used to quantify clock energy and the ultimate performance relative to equivalent CMOS circuits.

### 6.3. Line clocking: extensibility

While the experiments discussed above seemingly suggest that prohibitively large currents/fields are required in order



**Figure 24.** Planar clock wires: (a) copper clock wires labeled A are placed in the dielectric using a damascene process with a NiFe liner. Copper clock wires labeled B are placed using a self-alignment process using clock wires labeled with an A for masking. A dielectric could first be placed in the Damascene cut to isolate the B wires from the A wires. The dielectric liner is followed by the NiFe liner and finally the copper. (b) The thin dielectric walls between the clock lines in (a) could result in a large capacitance. If the walls of the clock wires can be tapered, the capacitance can be lowered.



**Figure 25.** (a) Fields from two-level, three-phase clocking: the numbers in a given clock wire indicate the clock phase. An X indicates a current into the figure, and the arrowhead indicates a current out of the figure. Note that for each clock phase, the direction of the current alternates between the bottom and top clock lines. The bottom and the top clock lines can be placed in series—weaving the current between the top and the bottom for as many times as there is voltage for the clock driver. (b) Fields from two-level, two-phase clocking: note that each clock phase is assigned to either the upper clock level or the lower clock level—with the current in each level in opposite directions. Additional levels of metal are required to connect clock lines in series.

to re-evaluate an NML ensemble with new inputs, the high currents are an artifact of the experimental technique. The switching experiments with individual islands (discussed in [65] and expanded here) are based solely on easy axis switching. As such, high fields are necessary. Additionally, the line and gate experiments presented in [16] are also dependent on easy axis switching. Because electrical input structures have not yet been experimentally demonstrated, inputs to an ensemble (i.e. to an AF-ordered line) are set with driver magnets like that illustrated in figure 1(a). In order to consider how an ensemble responds to a new input, the driver must undergo easy axis switching. This necessitates the 600–760 mA currents and fields of  $\sim 125$  mT as noted above. Alternatively, simulations suggest that devices in an AF-ordered line are hard-axis-biased with lower magnitude currents/fields. Notably, assuming that an input to an AF-ordered line can be set by another mechanism, [11] suggests that lines of magnets can be controlled with magnetic fields of just 5 mT.

Other design levers exist that could be used to further lower switching currents. Of interest is the material that surrounds an NML ensemble (see figure 6). Most of our micromagnetic simulations have assumed that nanomagnets are surrounded by air. We could increase the ratio of flux density to magnetic field strength ( $\mu = B/H$ ) by surrounding magnets with a different material to increase absolute permeability. While we will need to ensure that the binary state of a magnet is not adversely affected, candidate materials do exist. Reference [36] has considered the use of enhanced permeability dielectrics (EPDs) with embedded magnetic nanoparticles (e.g. CoFe particles 2–5 nm in diameter) to increase the field from a word or bit line in field MRAM without increasing current. Such materials could increase absolute permeability ( $\mu = \mu_0 \times \mu_r$ ), as the relative permeability  $\mu_r$  could range from 2 to 30 (values of 2–6 are most common). That particle sizes are below the superparamagnetic limit should help ensure that a magnet's state is not unduly influenced. Switching currents could be reduced by a factor of  $\mu_r$ , which would reduce Ohmic losses by a factor of  $\mu_r^2$ .
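The scaling argument above (current reduced by  $\mu_r$ , Ohmic loss reduced by  $\mu_r^2$ ) can be illustrated with a short calculation. The baseline current and wire resistance used below are hypothetical placeholder values, not measurements from this work, and the function name is likewise illustrative.

```python
def clock_with_epd(baseline_current_ma, wire_resistance_ohm, mu_r):
    """Return (new current in mA, baseline power in W, new power in W).

    Assumes the required clock current falls by a factor of mu_r when the
    magnets are embedded in a dielectric of relative permeability mu_r,
    and that Ohmic loss is I^2 * R.
    """
    new_current_ma = baseline_current_ma / mu_r
    baseline_power_w = (baseline_current_ma * 1e-3) ** 2 * wire_resistance_ohm
    new_power_w = (new_current_ma * 1e-3) ** 2 * wire_resistance_ohm
    return new_current_ma, baseline_power_w, new_power_w

# HYPOTHETICAL example: 100 mA baseline current, 1 ohm clock wire, mu_r = 4.
new_current, power_before, power_after = clock_with_epd(100.0, 1.0, 4.0)
print(f"current: 100.0 mA -> {new_current:.1f} mA")
print(f"Ohmic loss reduced by a factor of {power_before / power_after:.1f}")
```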

### 6.4. Line clocking: clock wire layouts

Clock wiring both below and above the nanomagnets is being investigated. Planar wiring below the magnets was the first approach investigated because of its simplicity with respect to the nanomagnets. However, physically building wires with close enough spacing to null magnets in adjoining clock zones will be challenging. As such, we have begun to investigate dual level clocking.

Figure 24(a) shows the structure of a simple planar clock. Clock wires labeled 'A' could be placed first in a dielectric in the metal layers of a silicon-based chip using a copper damascene process with a high-permeability lining material such as NiFe. The high-permeability lining is used to focus the magnetic field generated by a current in the clock wire on the nanomagnets, reducing the amount of current required in the clock [68, 69]. In order to get closely spaced intervening wires, a self-alignment process could be used to open cuts for the wires labeled 'B', using the wires labeled 'A' as masks for the second cut. A thin dielectric is first placed in the cut to isolate the second set of wires from the first set. This is followed by the high-permeability lining and the copper.



**Figure 26.** Multi-level clock driver with multi-level NML: (a) two-level NML clocking can be expanded to multiple levels of NML by removing the top and bottom NiFe cladding from middle clock lines so that the fields that are generated affect NML magnets both above and below the clock lines. Weaved clock lines require an even number of clock levels and thus an odd number of NML layers for three-phase clocking. (b) Multi-level NML with a two-phase clock: by returning a two-phase clock on a second pair of clock lines, the return current is not wasted but used to drive a second level of NML. Note that the middle level of NML has reversed clocking for each phase.



**Figure 27.** (a) Clock driver:  $R_{Clock}$  represents the resistance of the clock wire. The clock current is driven by transistor  $m0$  whose gate voltage is determined by  $V_{ClockRef}$ . In operation, during the non-pulse part of the clock cycle, capacitor  $cap$  is charged to  $V_{ClockRef}$  through transmission gate  $gate1$ . When the clock pulse is enabled by signal  $CkPulse$ , the voltage on the capacitor is transferred to the gate of  $m0$  via transmission gate  $gate0$ . The gate of  $m0$  is discharged through transistors  $m1$  and  $m2$  with voltage  $V_{OffRef}$  controlling the voltage on the gate of  $m2$  and thus the resistance of the discharge path. (b) Three-phase clock timing parameters. Key parameters are the rise time,  $t_{RISE}$ , and fall time,  $t_{FALL}$ , for each clock phase and the on hold time,  $t_{Hhold}$ , and off hold time,  $t_{Lhold}$ , between each of the clock phase transitions. (c) Two-phase clock timing parameters. Two-phase clocks have similar parameters except that at most one clock is active at one time.

The structure shown in figure 24(a) could have a large capacitance between the clock wires due to the thin dielectric that would separate them. Figure 24(b) shows an alternative structure with the wires tapered to reduce the inter-clock capacitance. Tapering the etches for the clock wires may be difficult.

As shown in figure 24(b), the current in the planar clock wire needs to run in one direction so that nanomagnets are nulled in the same direction. If they are not nulled in the same direction, problems in nulling may occur at the borders between the clock lines. Also, if a clock driver drives more than one clock line, an out of plane return line would need to be used to return the current to the beginning of the clock wire. Thus, planar clock wires would require at least two levels of metal for the clock wiring.

To summarize, planar clock lines pose two main challenges:

- First, the needed close spacing of the lines leads to a large capacitance between the clock lines, which can affect the control of the current in the lines.
- Second, because the clock currents in a planar structure need to run in the same direction (so that hard-axis-directed clocking fields are applied in the same direction when a single current drives the same phase at different positions of the clock), a return line to the first side of the NML plane would be required. The voltage drop in this return line would be wasted, which would lead to a net increase in benchmark energy.

That said, both challenges can be addressed by using a two-level clocking scheme with each phase alternating between the upper and lower clock levels. Figure 25(a) shows the relative position of clock lines above and below the NML magnets along with the resulting magnetic fields from the indicated current directions. Note that the currents in the upper clock wires are reversed from the direction of the current in the lower wires. The wires in figure 25(a) are marked with phases for a three-phase clock. Note that the phases alternate between the upper clock wire and the lower clock wires so that both directions of current flow can be utilized.

An alternative clocking scheme for NML involves only two clock phases. Figure 25(b) shows the structure and fields for such an approach. Note that other levels of metals are required for clock returns since phase 0 can only be on the bottom level and phase 1 only on the top level. Thus, two-level, two-phase clocking resolves only one of the two issues with planar clocking.



**Figure 28.** (a) Cross section of MEI-1. Left: driver nanomagnet. Right: MTJ stack. The distance between the driver and the MTJ is 10 nm. Electrodes and pinning layers are not shown in the figure; (b) cross section view of MEI-2: (i) driver nanomagnet. (ii), (iii), (iv) MTJ stack with reversed layer order, and top three layers (SAF) are on the left, middle and right side of the free layer respectively. The distance between driver nanomagnet (i), and MTJ stack (ii), (iii), or (iv) is 10 nm. Electrodes and pinning layers are not shown in the figure.

Figure 26(a) shows how four levels of clocking might be constructed to realize a three-phase clock. Note that it is possible to place ensembles of NML devices between each level of clocking if the cladding on the intermediary clock lines is removed from both the top and the bottom of the wire. The downside of side-only shielding is that the magnetic fields generated at the nanomagnets would be reduced, because the fields are less well focused. Consequently, higher currents would be required. An advantage of four levels of clocking is the ability to smooth the total current used for clocking. Figure 26(b) shows four levels of two-phase clocking. The return current from one level of clocking is used to clock another level of NML. In this case, the middle NML level experiences clocking with reversing fields. Since the clocks do not overlap in two-phase clocking, this is not a problem.

To summarize, distributing clock lines on multiple layers could eliminate irregularities in field distributions at the clock wire boundaries, and decrease the magnetic fields required to move a signal in an NML ensemble from one group to another. (As reported in [64], there can be a field minimum if two parallel lines are excited; with a multilayer approach, while there will always be some variation in field, this undesirable field drop could be significantly reduced.)

The multi-plane wire structures that would be required have already been fabricated for use in field MRAM [62]. Again, to date, the only thing restricting the distribution of clocking wires in multiple planes is the characterization of circuit output—in that a second plane would not allow devices that comprise an NML structure to be interrogated with MFM. However, this need can be eliminated as electrical output [22] is developed (see section 7).

### 6.5. Clock driver and timing

A clock driver based on the write driver discussed by [70, 71] is shown in figure 27(a). The driver basically acts as a switched current mirror with  $V_{ClockRef}$  being the current mirror reference voltage. The capacitor,  $cap$ , is charged to the reference voltage during the time between pulses. During a pulse, the capacitor is isolated from the reference voltage and connected to the gate of the driver transistor,  $m0$ , to control the gate voltage. The isolation prevents voltage spikes (caused by clock currents passing through parasitic resistance) from affecting the gate-to-source voltage of transistor  $m0$  and thus the transistor current. The resistance of  $gate0$  controls the current rise time,  $t_{RISE}$ , while the resistance of  $m2$ , as controlled by  $V_{OffRef}$ , controls the current fall time,  $t_{FALL}$ .

Figure 27(b) shows the details of the three-phase clock waveform. There are four key parameters in the waveform: the rise time,  $t_{RISE}$ ; the high hold time during which two clocks are on,  $t_{Hhold}$ ; the fall time,  $t_{FALL}$ ; and the low hold time during which a single clock is on,  $t_{Lhold}$ . If the rise time is too fast, the nanomagnets will oscillate in response to the fast changing field in the clock wire. The high hold time needs to be long enough for the null state to stabilize in the newly nulled clock zone. The fall time needs to be slow enough for the logic state to properly propagate in the clock zone being released from the null state. Too fast a fall may also allow nanomagnets to return to a rest state without being influenced by the propagating logic state. Finally, the low hold state needs to be held long enough for the newly set nanomagnets to stabilize in their state before it can propagate to the next zone. Note that the clock phases overlap to control the boundary condition at the ends of the clock zones. Because of the overlap, the clock fields need to be in the same direction.
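As an illustration of how the four timing parameters combine, the sketch below generates normalized trapezoidal waveforms for an overlapping three-phase clock. The sequencing is an assumed reading of the description above and is not taken from the driver design itself: the next phase ramps up, two phases are held on together, the older phase ramps down, and the remaining phase is held on alone. Linear ramps and the numerical timing values are also hypothetical.

```python
def clock_phase_level(t, phase, t_rise, t_hhold, t_fall, t_lhold, n_phases=3):
    """Normalized current (0..1) of one clock phase at time t.

    ASSUMED sequence within each step: the next phase ramps up (t_rise),
    two phases are on together (t_hhold), the older phase ramps down
    (t_fall), and the newer phase is on alone (t_lhold).
    """
    step = t_rise + t_hhold + t_fall + t_lhold
    period = n_phases * step
    # Time elapsed since this phase last started to ramp up.
    local = (t - phase * step) % period
    # The phase stays on until the following phase has ramped up and settled.
    fall_start = step + t_rise + t_hhold
    if local < t_rise:
        return local / t_rise
    if local < fall_start:
        return 1.0
    if local < fall_start + t_fall:
        return 1.0 - (local - fall_start) / t_fall
    return 0.0

# HYPOTHETICAL timing values in arbitrary time units.
timing = dict(t_rise=2.0, t_hhold=1.0, t_fall=2.0, t_lhold=1.0)

def levels_at(t):
    """Levels of phases 0, 1 and 2 at time t."""
    return [clock_phase_level(t, p, **timing) for p in range(3)]

for t in (2.5, 4.0, 5.5):
    print(t, levels_at(t))
```

In this sketch, the three sample times fall in the two-clocks-on hold, the fall of the older phase, and the single-clock hold, respectively. At no time are all three phases on together.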

Figure 27(c) shows a two-phase clock waveform. The rise, fall, and hold times have similar requirements to the three-phase clock above. Note, however, that the two clock phases do not overlap. Consequently, the two clocks need not have fields in the same direction, which enables the middle level of nanomagnets shown in figure 26(b).<sup>21</sup>

### 6.6. Alternative approaches

Other means for clocking NML circuits are also being explored, and nearly all of the work proposed in sections 2 and 4 will be compatible with these methods. For example, multiferroic materials may provide a pathway to control magnetism with electric fields. Multiferroics are defined as materials that exhibit more than one order parameter [72–76]—i.e. ferroelectricity (FE) coupled with some form of magnetism leads to a magnetoelectric susceptibility. Among the known multiferroics, bismuth ferrite,  $BiFeO_3$  (BFO) has ferroelectric and antiferromagnetic transition temperatures ( $820\text{ }^\circ C$  and  $370\text{ }^\circ C$  respectively) that are much higher than room temperature [77]. Reference [78] has investigated the manipulation of nanomagnetic islands with multiferroic materials—which have been leveraged to demonstrate the switching of an in-plane ferromagnetic

<sup>21</sup> As discussed in [11], magnet shape could define dataflow directionality with a two-phase clock.



**Figure 29.** (a) Cross section of MEI-LM. MTJs are fabricated close enough for NML operations. Electrodes and pinning layers are not shown in the figure; (b) Cross section of MEI-SS. Top: free layer nanomagnets with top electrodes. Bottom: shared SAF with barrier (MgO) layer, pinning layer, and bottom electrode.

component [78]. This suggests the potential for electric field control of magnetism in the ferromagnetic layer.

Additionally, [79] considers how the magnetization state of a multiferroic nanomagnet can be rotated via the coupling between a magnetostrictive layer and a piezoelectric layer. Reference [79] suggests that only tens of millivolts are needed to induce a rotation of  $90^\circ$ , which could allow for clocking as described in section 2. Notably, [79] suggests that for clock rates of 1 GHz,<sup>22</sup> stress-based clocking would lead to clocked energy dissipation of just  $\sim 200$  kT per device. By comparison, STT-based switching would result in an energy dissipation of  $\sim 10^8$  kT per device. Note that both strain-based clocking and STT-based clocking would require that every NML device be contacted individually, which would increase fabrication complexity. (This is avoided with a line clock approach.) Finally, domain walls [80] might represent another approach to clocking.
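For a sense of scale, the per-device energies quoted above can be converted to joules and to average power at a 1 GHz clock rate. This sketch assumes a temperature of 300 K and one clocking event per cycle; both are assumptions made here for illustration.

```python
BOLTZMANN_J_PER_K = 1.380649e-23  # exact SI value of the Boltzmann constant
TEMPERATURE_K = 300.0             # ASSUMED operating temperature
CLOCK_RATE_HZ = 1e9               # clock rate quoted in the text

def kt_to_joules(energy_in_kt, temperature_k=TEMPERATURE_K):
    """Convert an energy expressed in units of kT to joules."""
    return energy_in_kt * BOLTZMANN_J_PER_K * temperature_k

stress_energy_j = kt_to_joules(200)  # stress-based clocking, per device
stt_energy_j = kt_to_joules(1e8)     # STT-based switching, per device

for label, energy in (("stress-based", stress_energy_j), ("STT-based", stt_energy_j)):
    # Average power assumes one clocking event per device per cycle.
    print(f"{label}: {energy:.2e} J per event, {energy * CLOCK_RATE_HZ:.2e} W at 1 GHz")

print(f"energy ratio (STT / stress): {stt_energy_j / stress_energy_j:.1e}")
```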

## 7. Output

In order to sense the magnetization state of an NML device such that information encoded in it can be used by transistor-based circuitry, a magnetic–electrical interface (MEI) is needed. The MEI designs discussed here use fringing fields from a nanomagnet to influence the magnetization state of the free layer of a layered structure such as a magnetic tunnel junction (MTJ) or spin valve. The same magnetic ‘clocking field’ that is used to facilitate re-evaluation of NML ensembles can also be used to allow fringing fields from a nanomagnet to influence and set the state of the free layer of an MTJ. More specifically, the clocking field will help to magnetize the free layer of the output device along its hard axis—similar to the functionality provided by a digit line in field-based MRAM [81].

For all of the device simulations discussed in this section, the material parameters used for the free layers are  $M_s = 800\,000 \text{ A m}^{-1}$ ,  $\alpha = 0.1$ , and exchange energy =  $1.05 \times 10^{-11} \text{ J m}^{-1}$  [82]. Layers have a  $60 \times 90 \text{ nm}^2$  footprint.

### 7.1. MEI-1 and MEI-2

Work presented in [22] introduced two potential MEI designs (referred to as MEI-1 and MEI-2):

<sup>22</sup>This work assumes that magnets are clocked individually.

- MEI-1 assumes that the layers of the output structure are placed such that the pinned layers are on the bottom of the stack, and the free layer is on the top of the stack (see figure 28(a)). This is similar to an ordering used in traditional MRAM arrays [81]. Micromagnetic simulations suggest that fringing fields from a neighboring nanomagnet could in fact set the magnetization state of a clocked free layer.
- In another design (MEI-2, see figure 28(b)), the free layer of the MTJ resides on the bottom—with oxide tunneling layers, fixed layers, etc placed on top. As discussed below, this design should help to ease the more stringent fabrication requirements of MEI-1.

**7.1.1. MEI-1.** MEI-1 shares the same MTJ layer order employed in traditional MRAM [81]. The adjacent driver magnet is as thick as the entire stack. Figure 28(a) shows a driver magnet (at left)—which micromagnetic simulations suggest can produce fringing fields of sufficient magnitude to set the state of the layered structure (at right) given a 20 mT clocking field. Both possible input magnetization states of the driver magnet were considered, and both parallel and anti-parallel orientations (between the free layer and fixed layer of the MTJ) can be achieved. Note that in the aforementioned simulations, the driver’s state was actually set via data propagation through an AF-ordered line. A helper island like that shown in figure 4 was also placed next to the layered structure. This design would obviously require an alignment step, which would make it challenging to fabricate. Again, while simulations show that this design can function as intended, the more stringent fabrication requirements associated with it led to the evolution of MEI-2.

**7.1.2. MEI-2.** MEI-2 reverses the layer order in the MTJ stack (i.e. as in [83]). In MEI-2, the free layer resides on the bottom, and the oxide tunneling layer, fixed layer, etc are placed on top as shown in figure 28(b). Furthermore, with MEI-2, the driver nanomagnet has the same thickness as the free layer of the output MTJ. More importantly, the top three layers that form a synthetic antiferromagnet (SAF) can be shrunk laterally to about one half the size of the free layer. This should help to ease fabrication requirements. More specifically, the SAF is etched over some portion of the MTJ stack. Precisely which part is etched off (see figure 28(b)) can affect the external energy required assuming identical fixed and pinned layers as shown in figure 28(b).<sup>23</sup> That said, MEI-2 is more practical from the standpoint of the lithographic process required to manufacture the ensemble. While an alignment step is still required, a single mask can be used to define the NML devices, and the free layer of the output structure. Additionally, micromagnetic simulations suggest that some misalignment is tolerable when defining the remainder of the stack to form the output device (i.e. the pinned and free layers).

**Figure 30.** (a) Cross section of spin polarized current as input into NML circuitry. Pinning layer and electrodes are not shown in the figure; (b) (i) a schematic of a single clock line structure with nanomagnets on top and bias wire across the input nanomagnets and (ii) Maxwell simulation model for bias wire. In the figure, the metal line is shown raised to accurately reflect the bias metal deposition over the magnet; (c) effect of variation of bias wire thickness; (d) effect of variation of bias wire width; (e) fabricated biasing wire at one end of an L-shaped nanomagnet wire.

## 7.2. Alternative MEI designs

In addition to the two MEI designs from [22] that were discussed above, two other design alternatives are being considered. In one approach (MEI-LM), ensembles of MTJs are placed sufficiently close such that their free layers would interact via fringing field coupling (like the NML ensembles discussed in previous sections). With MEI-LM (see figure 29(a)), only stacks that are meant to serve as output devices would be contacted. To ensure proper circuit functionality, it will be very important to minimize flux leakage from a given stack, to avoid a static biasing field on each device that is not based on a neighboring device's fringing fields. However, increasing free layer thickness and/or the saturation magnetization of the bottom pinned layers represent design levers to minimize this unwanted effect.

<sup>23</sup>In other words, higher clock fields may be required so that a driver magnet can deterministically set the state of the free layer of the output device. However, additional micromagnetic simulation studies of the MEI-2 design suggest that the initial design for MEI-2 may have been overly pessimistic—i.e. the saturation field used was  $\sim 5\times$  lower than experimental data suggests. As such, for the design illustrated in figure 28(b), free layer thickness and device-to-device spacing could be increased to 6 nm and 14 nm respectively.

In another approach (MEI-SS), the magnetic material that would form the NML devices is deposited on top of a relatively large, shared SAF structure (shown in cross section in figure 29(b)). MTJs that leverage this shared SAF approach have been experimentally demonstrated [84] and could mitigate flux leakage. This design is also fabrication friendly and is currently being used to develop experimental prototypes. As larger circuits are considered, the size and placement of shared SAF tiles will need to be more carefully considered (i.e. we will need to ensure that the magnetic layers in the SAF do not shield the nanomagnets from the clock field).

In summary, MEI-LM is quite compatible with traditional MRAM fabrication processes. However, because of the potential increase in the free layer thickness, the etching aspect ratio of this design becomes challenging ( $>5:1$ ). By contrast, the etch depth associated with MEI-SS would be significantly reduced. That said, a challenge for MEI-SS is that the shared SAF structure leads to a shared bottom electrode. As such, if multiple magnetic islands on the SAF are to be contacted for use as output devices, reads would need to proceed sequentially.

## 8. Input

### 8.1. MTJ or spin valve structure

Spin polarized current could be used to set the state of an input to an NML ensemble. The spin can be generated either from (i) a pinned ferromagnetic layer in an MTJ structure, or (ii) a relatively large fixed magnet [85]. As a preliminary attempt, we consider the former via OOMMF simulation (see schematic in figure 30(a)). Current flows through the pinned layer, spin is carried by the current from the pinned layer to the free layer, and the spin is strong enough to set the magnetization of the free layer. Fringing fields from the MTJ free layer could then serve as an input to an adjacent NML device.

**Figure 31.** Example of a simple systolic processing element with corresponding NML schematic.

The simulation efforts discussed above represent very preliminary work, and additional study is needed. Namely, for the free layer to properly serve as an input, its magnetization state must remain stable while driving an adjacent magnet. For the layered structure studied here, there are only two ferromagnetic layers: the pinned layer and the free layer. The fringing field from the pinned layer can impact the stability of the free layer. This is true, especially when the magnetizations of the free layer and the pinned layer are in a parallel position.

## 8.2. Biasing line

While nulling nanomagnets is an essential step for NML operation, it could also help to facilitate low-power control of input nanomagnets. A nulled nanomagnet requires only a small bias to induce deterministic, easy axis switching. If that bias can be deterministically produced and controlled, we can directly control the magnetization of certain nanomagnets at the *inputs* to an ensemble. Here, we highlight experimental design efforts of a current-carrying biasing wire for use as an input.

Figure 30(b)-(i) shows a schematic describing the relative position of a clock line, nanomagnet and biasing wire. Nanomagnets are placed on top of a clad copper clocking line, which biases devices along their hard axes when a current pulse is passed through it. The bias wire is built across the input nanomagnet and produces a bias field when a current pulse is applied. The direction of bias field can be controlled by controlling the direction of current flowing through the bias wire. Current pulses through the clock line and bias wire are synchronized so that input nanomagnets are still under the bias field for a short duration after the nulling field is removed, as shown in figure 30(b)-(ii).
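The role of a small bias that outlasts the nulling field can be illustrated with a single-domain (macrospin) energy model in the spirit of Stoner and Wohlfarth [7]. The sketch below is not the micromagnetic or experimental procedure used in this work. It assumes a normalized uniaxial energy $E(\theta) = K\sin^2\theta - h\cos\theta$ (with $\theta$ measured from the easy axis), plain gradient descent in place of real magnetization dynamics, and no thermal noise; the names `relax` and `settle` and all parameter values are hypothetical.

```python
import math


def relax(theta, h_bias, k_aniso=1.0, step=0.01, iters=20000):
    """Gradient descent on E(theta) = K sin^2(theta) - h cos(theta).

    theta is the angle from the easy axis (radians); h_bias is the
    normalized easy-axis bias field. Illustrative model only.
    """
    for _ in range(iters):
        grad = (2.0 * k_aniso * math.sin(theta) * math.cos(theta)
                + h_bias * math.sin(theta))
        theta -= step * grad
    return theta


def settle(h_bias):
    """Return +1 or -1: the easy-axis state reached from a nulled start."""
    theta = math.pi / 2           # nulled: magnetization along the hard axis
    theta = relax(theta, h_bias)  # clock field removed, bias still applied
    theta = relax(theta, 0.0)     # bias removed; the state should persist
    return 1 if math.cos(theta) > 0 else -1


if __name__ == "__main__":
    for h in (0.05, -0.05):
        print(f"bias {h:+.2f} -> final state {settle(h):+d}")
```

In this toy model the sign of even a weak bias decides which easy-axis state the nulled magnet relaxes into, and the state is retained once the bias is switched off.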

This approach is advantageous in terms of the power required for switching. Since the input nanomagnets are hard axis aligned along with all other nanomagnets, the only additional energy required to set the input magnetization is that needed to produce the bias field. The magnetic field produced by the bias wire is also strongly coupled to the input nanomagnet because of their close proximity. The bias wire can also serve as an interface between an electronic system (e.g. CMOS system) and a magnetic system.

Initial prototyping efforts are underway. In an effort to achieve maximum bias field for a given current, the bias wire geometry was analyzed using Maxwell finite element simulations. Figure 30(b)-(ii) shows the structure used to describe a bias wire. The wire is defined as a copper line with a fixed length of $20 \mu\text{m}$. The input nanomagnet is replaced by a vacuum to calculate the magnetic field produced by the bias wire only. In figure 30(c), the current through the bias wire was kept constant at 1 mA. Then, for a fixed width of 180 nm, the thickness of the bias wire was varied from 50 to 100 nm. Here, the bias field increases with decreasing thickness as the current density increases. Figure 30(d) shows the simulation results describing a case where the bias current and bias wire thickness were kept at 1 mA and 100 nm respectively, while the bias wire width was varied from 100 to 180 nm. The bias field increases with decreasing width, but the uniformity of field along the length decreases.
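The geometric trends described above can be reproduced qualitatively with a much cruder model than a finite element simulation. The sketch below assumes an infinitely long straight wire with uniform current density, sums the fields of parallel line currents over the rectangular cross section, and evaluates the in-plane field at a point centered a hypothetical 10 nm below the wire. The function name, the 10 nm gap and the grid sizes are illustrative assumptions, not values taken from the Maxwell simulations, and the model says nothing about field uniformity along the wire.

```python
import math

MU0 = 4e-7 * math.pi  # vacuum permeability in T*m/A


def bias_field_mT(current_A, width_m, thickness_m, gap_m, nx=200, ny=50):
    """In-plane field (mT) at a point centered gap_m below the wire.

    The cross section is split into nx*ny filaments, each treated as an
    infinitely long line current carrying an equal share of the current.
    """
    d_i = current_A / (nx * ny)   # current carried by one filament
    dx = width_m / nx
    dy = thickness_m / ny
    total = 0.0
    for i in range(nx):
        x = -width_m / 2 + (i + 0.5) * dx    # lateral offset of filament
        for j in range(ny):
            y = gap_m + (j + 0.5) * dy       # vertical distance to filament
            # Horizontal component of mu0*I/(2*pi*r) for a line current.
            total += MU0 * d_i / (2 * math.pi) * y / (x * x + y * y)
    return total * 1e3


if __name__ == "__main__":
    gap = 10e-9  # hypothetical wire-to-observation-point distance
    for t_nm in (50, 75, 100):
        b = bias_field_mT(1e-3, 180e-9, t_nm * 1e-9, gap)
        print(f"width 180 nm, thickness {t_nm} nm: {b:.2f} mT")
    for w_nm in (100, 140, 180):
        b = bias_field_mT(1e-3, w_nm * 1e-9, 100e-9, gap)
        print(f"thickness 100 nm, width {w_nm} nm: {b:.2f} mT")
```

For a fixed current, shrinking either dimension moves the current closer on average to the observation point, which is why the computed field rises in both sweeps.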

The relative ease of integration of a biasing wire given the fabrication process envisioned for the clock lines discussed in section 6 could help to mitigate otherwise complex fabrication issues. The clad clock lines are fabricated using a Damascene process that ensures a flat surface for both the nanomagnets and a biasing line. Further planarization is carried out by spinning a thin ( $\sim 100$ nm) layer of HSQ. This layer further smooths a clock wire's surface and also creates an insulating layer between the clock line and bias wire. The smooth surface should help to ensure a well-formed biasing line with small (i.e. $< 100$ nm) dimensions, and the insulating HSQ layer ensures that there is no electrical contact between the clock wire and the bias wire. In the fabrication process, the clock line is fabricated first, followed by the placement of nanomagnets on top of the clock line. The bias wire is placed directly on top of the input nanomagnet, thus making fabrication of the bias wire the last step of this process. Hence this is a back end process that can be easily integrated with clock line and nanomagnet fabrication. Figure 30(e) shows a fabricated biasing wire on top of an input nanomagnet, though it is not fabricated on a clock wire.

Moving forward, challenges to this approach will involve ensuring that a biasing line can be placed over a nanomagnet with sufficient precision. Determining (i) whether or not the field from the biasing line is of sufficient magnitude to reliably set the state of a device at the input and (ii) how long the biasing line must be excited for—to ensure that the input is deterministically set—must also be studied.

## 9. Uses: architectures and intangibles

The structures discussed above represent all of the components required to ultimately form more complex combinatorial logic and information processing systems. In addition to general Boolean logic circuits, the nearest neighbor interactions and inherently pipelined logic associated with NML map extremely well to systolic architectures developed in the late 1970s and early 1980s. In a systolic architecture, data will flow from a computer's memory, through many (and often identical) processing elements, before returning to memory. Additional processing is done on some subset of data at each element.

As one example, consider the convolution problem where, given a sequence of weights $w_1, w_2, \dots, w_k$ and the input sequence $x_1, x_2, \dots, x_n$, the resulting sequence $y_i = w_1x_i + w_2x_{i+1} + \dots + w_kx_{i+k-1}$ (for $i = 1, \dots, n+1-k$) is calculated. (Also, if the multiplication and addition operations in the convolution problem are replaced by comparison and Boolean AND operations, the convolution problem becomes the pattern matching problem.) Figure 31 shows what an NML-based circuit that performs a 1-bit convolution might look like. Streams of data would flow from left to right while a cumulative output would flow from right to left. Bi-directional dataflow in lines of magnets is possible if the clock simply puts a group of magnets into a metastable state; the end of the line at which an input is set then determines the dataflow direction.
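The dataflow described here can be sketched as a cycle-level software model. The code below is an illustration under stated assumptions, not a description of the circuit in figure 31: it assumes a weight-stationary array in which each cell holds one weight, inputs shift one cell to the right per cycle, partial sums shift one cell to the left per cycle, and new inputs and new partial sums are injected on alternate cycles so that the two streams meet correctly. The function names (`direct`, `systolic`) and the pluggable `mul`/`add`/`zero` arguments are hypothetical; swapping them turns the same array into a pattern matcher.

```python
import operator


def direct(weights, xs, mul=operator.mul, add=operator.add, zero=0):
    """Reference result: y_i = w_1 x_i + ... + w_k x_{i+k-1}."""
    k = len(weights)
    out = []
    for i in range(len(xs) - k + 1):
        acc = zero
        for j in range(k):
            acc = add(acc, mul(weights[j], xs[i + j]))
        out.append(acc)
    return out


def systolic(weights, xs, mul=operator.mul, add=operator.add, zero=0):
    """Cycle-level model: x flows left to right, partial sums right to left."""
    k, n = len(weights), len(xs)
    n_out = n - k + 1
    x_reg = [None] * k   # input value currently sitting in each cell
    y_reg = [None] * k   # partial sum produced by each cell last cycle
    results = []
    for t in range(2 * n - 1):
        # A new input enters the leftmost cell on every second cycle.
        x_reg = [xs[t // 2] if t % 2 == 0 else None] + x_reg[:-1]
        # A fresh (empty) sum enters the rightmost cell on alternate cycles.
        phase = t - (k - 1)
        fresh = (zero if phase >= 0 and phase % 2 == 0
                 and phase // 2 < n_out else None)
        y_in = y_reg[1:] + [fresh]
        y_reg = [None] * k
        for c in range(k):
            if y_in[c] is not None:
                # Cell c stores weight w_{k-c} (weights are laid out reversed).
                y_reg[c] = add(y_in[c], mul(weights[k - 1 - c], x_reg[c]))
        if y_reg[0] is not None:
            results.append(y_reg[0])   # finished sum leaves on the left
    return results


if __name__ == "__main__":
    w, x = [1, 2, 3], [1, 0, 2, 1, 3]
    print(systolic(w, x), direct(w, x))
    # Pattern matching: comparison replaces multiply, AND replaces add.
    both = lambda a, b: a and b
    print(systolic([1, 0], [1, 0, 1, 1, 0], operator.eq, both, True))
```

Each cell only ever exchanges data with its immediate neighbors, which is the property that makes this style of computation a natural fit for nearest neighbor coupled devices.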

The design and simulation efforts for a systolic convolution processing element (PE) are extensible. Systolic solutions exist for other problems including filtering, polynomial evaluation, discrete Fourier transforms, matrix arithmetic and non-numeric applications involving graph algorithms and data structures. As larger designs are considered, special attention must be given to efficient routing per the discussion in section 4.3.

## 10. Wrap up

To conclude, we have presented the experimental state-of-the-art for NML, and highlighted both progress and important action items for structures needed for complete systems. Numerous design choices that could help deliver the promise of low energy systems have been highlighted as areas for future research. While any emerging technology faces significant challenges when being considered as a replacement to CMOS (even for limited application spaces), at present, a solid foundation exists from which to begin more detailed analyses and construction of NML circuits and systems.

## References

- [1] Imre A, Csaba G, Ji L, Orlov A, Bernstein G H and Porod W 2006 Majority logic gate for magnetic quantum-dot cellular automata *Science* **311** 205–8
- [2] Waser R 2003 Nanoelectronics and information technology *Advanced Electronic Materials and Novel Devices* (Weinheim: Wiley–VCH)
- [3] Cowburn R P and Welland M E 2000 Room temperature magnetic quantum cellular automata *Science* **287** 1466–8
- [4] Bernstein G H, Imre A, Metlushko V, Orlov A, Zhou L, Ji L, Csaba G and Porod W 2005 Magnetic QCA systems *Microelectron. J.* **36** 619–24
- [5] Imre A 2005 Experimental study of nanomagnets for magnetic quantum-dot cellular automata (MQCA) logic applications *PhD Electrical Engineering*, University of Notre Dame, Notre Dame
- [6] Varga E, Niemier M T, Bernstein G H, Porod W and Hu X S 2009 Non-volatile and reprogrammable MQCA-based majority gates *Device Research Conference (June)* pp 1–2
- [7] Stoner E C and Wohlfarth E P 1948 A mechanism of magnetic hysteresis in heterogeneous alloys *Phil. Trans. R. Soc. A* **240** 599–642
- [8] Niemier M, Crocker M and Hu X S 2008 Fabrication variations and defect tolerance for nanomagnet-based QCA *Proc. 23rd IEEE Int. Symp. on Defect and Fault-Tolerance in VLSI Systems* pp 534–42
- [9] Donahue M 1999 OOMMF User's Guide Version 1.0, *Interagency Report NISTIR 6367*
- [10] Lemcke O 2004 Implementation of temperature in micromagnetic simulations July 11, 2010, available at [www.nanoscience.de/group_r/stm-spstm/projects/temperature/download.shtml](http://www.nanoscience.de/group_r/stm-spstm/projects/temperature/download.shtml)
- [11] Dingler A, Niemier M, Hu X S, Garrison M and Alam M T 2009 System-level energy and performance projections for nanomagnet-based logic *IEEE/ACM Int. Symp. on Nanoscale Architectures* pp 21–6
- [12] Varga E, Orlov A, Niemier M T, Hu X S, Bernstein G H and Porod W 2010 Experimental demonstration of fanout for nanomagnetic logic *IEEE Trans. Nanotechnol.* **9** 668–70
- [13] Varga E, Liu S, Niemier M T, Porod W, Hu X S, Bernstein G H and Orlov A 2010 Experimental demonstration of fanout for nanomagnet logic *Device Research Conference (Notre Dame, IN)* pp 95–6
- [14] Bergman B, Moriya R, Hayashi M, Thomas L, Tyberg C, Lu Y, Joseph E, Rothwell M-B, Hummel J, Gallagher W J, Koopmans B and Parkin S S P 2009 Generation of local magnetic fields at megahertz rates for the study of domain wall propagation in magnetic nanowires *Appl. Phys. Lett.* **95** 262503
- [15] Cowburn R P and Allwood D 2006 Memory access *US Patent Specification* 7,554,835 B2
- [16] Alam M T, Kurtz S, Siddiq M J, Niemier M T, Bernstein G H, Hu X S and Porod W 2011 On-chip clocking of nanomagnet logic lines and gates *IEEE Trans. Nanotechnol. PP* 1–1
- [17] Matsunaga S, Hayakawa J, Ikeda S, Miura K, Hasegawa H, Endoh T, Ohno H and Hanyu T 2008 Fabrication of a nonvolatile full adder based on logic-in-memory architecture using magnetic tunnel junctions *Appl. Phys. Express* **1** 091301
- [18] Allam M W and Elmasry M I 2001 Dynamic current mode logic (DyCML): a new low-power high-performance logic style *IEEE J. Solid-State Circuits* **36** 550–8
- [19] Allwood D A, Xiong G, Faulkner C C, Atkinson D, Petit D and Cowburn R P 2005 Magnetic domain-wall logic *Science* **309** 1688–92
- [20] Lyle A, Klemm A, Harms J, Zhang Y, Zhao H and Wang J-P 2011 *Probing Dipole Coupled Nanomagnets Using Magnetoresistance Read* vol 98 (New York: AIP)
- [21] Becherer M, Kiermaier J, Breitkreutz S, Csaba G, Ju X, Rezgani J, Kieling T, Yilmaz C, Osswald P and Lugli P 2010 On-chip extraordinary Hall-effect sensors for characterization of nanomagnetic logic devices *Solid State Electron.* **54** 1027–32
- [22] Liu S, Hu X S, Nahas J J, Niemier M T, Porod W and Bernstein G H 2011 Magnetic–electrical interface for nanomagnet logic *IEEE Trans. Nanotechnol.* **10** 757–63
- [23] Csaba G, Lugli P, Becherer M, Schmitt-Landsiedel D and Porod W 2008 Field-coupled computing in magnetic multilayers *J. Comput. Electron.* **7** 454–7
- [24] Becherer M, Csaba G, Emling R, Porod W, Lugli P and Schmitt-Landsiedel D 2009 Field-coupled nanomagnets for interconnect-free nonvolatile computing *ISSCC: Int. Solid-State Circuits Conference* pp 474–5

- [25] Becherer M, Csaba G, Porod W, Emling R, Lugli P and Schmitt-Landsiedel D 2008 Magnetic ordering of focused-ion-beam structured cobalt–platinum dots for field-coupled computing *IEEE Trans. Nanotechnol.* **7** 316–20
- [26] Breitkreutz S, Kiermaier J, Ju X, Csaba G, Schmitt-Landsiedel D and Becherer M 2011 Nanomagnetic logic: demonstration of directed signal flow for field-coupled computing devices *ESSDERC Helsinki, Finland* at press
- [27] Wolf S A, Lu J, Stan M R, Chen E and Treger D M 2010 The promise of nanomagnetics and spintronics for future logic and universal memory *Proc. IEEE* **98** 2155–68
- [28] Lent C S and Tougaw P D 1997 A device architecture for computing with quantum dots *Proc. IEEE* **85** 541–57
- [29] Lent C S, Tougaw P D, Porod W and Bernstein G H 1993 Quantum cellular automata *Nanotechnology* **4** 49–57
- [30] Behin-Aein B, Datta D, Salahuddin S and Datta S 2010 Proposal for an all-spin logic device with built-in memory *Nature Nanotechnol.* **5** 266–70
- [31] Khitun A and Wang K L 2005 Nano scale computational architectures with spin wave bus *Superlatt. Microstruct.* **38** 184–200
- [32] Khitun A, Bao M and Wang K L 2008 Spin wave magnetic nanofabric: a new approach to spin-based logic circuitry *IEEE Trans. Magn.* **44** 2141–52
- [33] Wu T, Bur A, Wong K, Zhao P, Lynch C S, Amiri P K, Wang K L and Carman G P 2011 *Electrical Control of Reversible and Permanent Magnetization Reorientation for Magnetoelectric Memory Devices* vol 98 (New York: AIP)
- [34] Niemier M T, Hu X S, Alam M, Bernstein G, Porod W, Putney M and DeAngelis J 2007 Clocking structures and power analysis for nanomagnet-based logic devices *ISLPED'07: Proc. 2007 Int. Symp. on Low Power Electronics and Design* pp 26–31
- [35] Ramesh R and Spaldin N A 2007 Multiferroics: progress and prospects in thin films *Nature Mater.* **6** 21–9
- [36] Pietambaram S V, Rizzo N D, Dave R W, Goggin J, Smith K, Slaughter J M and Tehrani S 2007 Low-power switching in magnetoresistive random access memory bits using enhanced permeability dielectric films *Appl. Phys. Lett.* **90** 143510
- [37] Donahue M *Oxs Extension Modules*, available at <http://math.nist.gov/oommf/contrib/oxsext/>
- [38] Niemier M, Varga E, Bernstein G, Porod W, Alam M, Dingler A, Orlov A and Hu X 2010 Shape engineering for controlled switching with nanomagnet logic *IEEE Trans. Nanotechnol.* **PP** 1–1
- [39] Imre A, Csaba G, Bernstein G H, Porod W and Metlushko V 2003 Investigation of shape-dependent switching of coupled nanomagnets *Superlatt. Microstruct.* **34** 513–8
- [40] Koop H, Brückl H, Meyners D and Reiss G 2004 Shape dependence of the magnetization reversal in sub-[μm] magnetic tunnel junctions *J. Magn. Magn. Mater.* **272–276** E1475–6
- [41] Bode M, Pietzsch O, Kubetzka A and Wiesendanger R 2004 Shape-dependent thermal switching behavior of superparamagnetic nanoislands *Phys. Rev. Lett.* **92** 067201
- [42] Mani A S, Geerpuram D, Baskaran V S and Metlushko V 2006 Effect of controlled asymmetry on the switching characteristics of ring-based MRAM design *IEEE Trans. Nanotechnol.* **5** 249–54
- [43] Carlton D B, Emley N C, Tuchfeld E and Bokor J 2008 Simulation studies of nanomagnet-based logic architecture *Nano Lett.* **8** 4173–8
- [44] Kurtz S, Varga E, Niemier M, Porod W, Bernstein G H and Hu X S 2011 Two input, non-majority magnetic logic gates: experimental demonstration and future prospects *J. Phys.: Condens. Matter* **23** 053202
- [45] Niemier M, Crocker M, Hu X S and Lieberman M 2006 Using CAD to shape experiments in molecular QCA *IEEE/ACM Int. Conf. Computer Aided Digest of Technical Papers* vols 1 and 2, pp 161–8
- [46] Jiao J, Long G J, Grandjean F, Beatty A M and Fehlner T P 2003 Building blocks for the molecular expression of quantum cellular automata: isolation and characterization of a covalently bonded square array of two ferrocenium and two ferrocene complexes *J. Am. Chem. Soc.* **125** 7522–3
- [47] Qi H, Sharma S, Li Z, Snider G L, Orlov A O, Lent C S and Fehlner T P 2003 Molecular quantum cellular automata cells: electric field driven switching of a silicon surface bound array of vertically oriented two-dot molecular quantum cellular automata *J. Am. Chem. Soc.* **125** 15250–9
- [48] Niemier M, Hu X S, Dingler A, Alam M T, Bernstein G and Porod W 2008 Bridging the gap between nanomagnetic devices and circuits *IEEE Int. Conf. Comput. Des.* 506–13
- [49] Pulecio J F and Bhanja S 2010 Magnetic cellular automata coplanar cross wire systems *J. Appl. Phys.* **107** 034308
- [50] Opila R L, Ali I, Arimoto Y A, Homma Y, Reidsema-Simpson C and Sundaram K B 2000 Chemical-mechanical polishing for shallow trench isolation: a new interpretation *Chemical Mechanical Planarization in IC Device Manufacturing* vol III, pp 3–10
- [51] Spedalieri F M, Jacob A P, Nikonov D E and Roychowdhury V P 2010 Performance of magnetic quantum cellular automata and limitations due to thermal noise *IEEE Trans. Nanotechnol.* **10** 537–46
- [52] Gross L *et al* 2010 Magnetologic devices fabricated by nanostencil lithography *Nanotechnology* **21** 325301
- [53] Niemier M T and Kogge P M 2001 Exploring and exploiting wire-level pipelining in emerging technologies *Proc. 28th Annu. Int. Symp. on Computer Architecture* pp 166–77
- [54] Bandyopadhyay S and Cahay M 2009 Electron spin for classical information processing: a brief survey of spin-based logic devices, gates and circuits *Nanotechnology* **20** 412001
- [55] Augustine C, Behin-Aein B and Roy K 2009 Nano-magnet based ultra-low power logic design using non-majority gates *9th IEEE Conf. on Nanotechnology, 2009. IEEE-NANO 2009* pp 870–3
- [56] Behin-Aein B, Salahuddin S and Datta S 2009 Switching energy of ferromagnetic logic bits *IEEE Trans. Nanotechnol.* **8** 505–14
- [57] Csaba G, Lugli P and Porod W 2004 Power dissipation in nanomagnetic logic devices *4th IEEE Conf. Nanotechnology* pp 346–8
- [58] Skumryev V, Stoyanov S, Zhang Y, Hadjipanayis G, Givord D and Nogues J 2003 Beating the superparamagnetic limit with exchange bias *Nature* **423** 850–3
- [59] Csaba G and Porod W 2010 Behavior of nanomagnet logic in the presence of thermal noise *IWCE: 14th Int. Workshop on Computational Electronics, 2010* pp 1–4
- [60] Bernstein K, Cavin R K, Porod W, Seabaugh A and Welser J 2010 Device and architecture outlook for beyond CMOS switches *Proc. IEEE* **98** 2169–84
- [61] Alam M, Bernstein G H, Bokor J, Carlton D, Hu X S, Kurtz S, Lambson B, Niemier M T, Porod W, Siddiq M and Varga E 2010 Experimental progress of and prospects for nanomagnet logic (NML) *Silicon Nanoelectronics Workshop (SNW), 2010* pp 1–2
- [62] Ohshima N, Shimura K-I, Miura S, Suzuki T, Nebashi R and Hada H 2008 Magnetic properties and writing characteristics of magnetic clad lines in magnetoresistive random access memory devices *Japan. J. Appl. Phys.* **47** 3456

- [63] Freescale, available at [www.freescale.com/files/microcontrollers/doc/data_sheet/MR0A16A.pdf](http://www.freescale.com/files/microcontrollers/doc/data_sheet/MR0A16A.pdf)
- [64] Dingler A, Siddiq M J, Niemier M, Hu X S, Alam M T, Bernstein G and Porod W 2009 Controlling magnetic circuits: how clock structure implementation will impact logical correctness and power *IEEE Int. Symp. on Defect and Fault Tolerance VLSI Systems (Chicago, IL)* pp 94–102
- [65] Alam M T, Siddiq M J, Bernstein G H, Niemier M, Porod W and Hu X S 2010 On-chip clocking for nanomagnet logic devices *IEEE Trans. Nanotechnol.* **9** 348–51
- [66] Su Y-H, Yang J-S and Chang C-R 2009 Schwarz Christoffel transformation for cladding conducting lines *IEEE Trans. Magn.* **45** 3800–3
- [67] Insilicio, 2011 *SpinFlow 3D* available at [www.insilicio.fr/](http://www.insilicio.fr/)
- [68] Durlam M *et al* 2002 A low power 1 Mbit MRAM based on 1T1MTJ bit cell integrated with copper interconnects *Symp. on VLSI Circuits Digest of Technical Papers, 2002* pp 158–61
- [69] Durlam M *et al* 2003 A 1-Mbit MRAM based on 1T1MTJ bit cell integrated with copper interconnects *IEEE J. Solid-State Circuits* **38** 769–73
- [70] Nahas J J, Andre T W, Garni B, Subramanian C, Lin H, Alam S M, Papworth K and Martino W L 2008 A 180 Kbit embeddable MRAM memory module *IEEE J. Solid-State Circuits* **43** 1826–34
- [71] Nahas J J, Andre T, Subramanian C, Lin H, Alam S M, Papworth K and Martino W 2007 A 180 Kbit embeddable MRAM memory module *CICC'07: IEEE Custom Integrated Circuits Conference, 2007* pp 791–4
- [72] Cheong S W and Mostovoy M 2007 Multiferroics: a magnetic twist for ferroelectricity *Nature Mater.* **6** 20
- [73] Fiebig M 2005 Revival of the magnetoelectric effect *J. Phys. D: Appl. Phys.* **38** R152
- [74] Ramesh R and Spaldin N 2007 Multiferroics: progress and prospects in thin films *Nature Mater.* **6** 29
- [75] Schmid H 1994 Multi-ferroic magnetoelectrics *Ferroelectrics* **162** 388
- [76] Smolenski G A and Chupis I E 1982 Ferroelectromagnets *Sov. Phys.—Usp.* **25** 493
- [77] Wang J *et al* 2003 Epitaxial BiFeO<sub>3</sub> multiferroic thin film heterostructures *Science* **299** 1722
- [78] Chu Y H *et al* 2008 Electric-field control of local ferromagnetism using a magnetoelectric multiferroic *Nature Mater.* **7** 478–82
- [79] Salehi F M, Roy K, Atulasimha J and Bandyopadhyay S 2011 Magnetization dynamics, Bennett clocking and associated energy dissipation in multiferroic logic *Nanotechnology* **22** 155201
- [80] Fukami S *et al* 2009 Low-current perpendicular domain wall motion cell for scalable high-speed MRAM *Symp. on VLSI Technology, 2009* pp 230–1
- [81] Engel B N, Rizzo N D, Janesky J, Slaughter J M, Dave R, DeHerrera M, Durlam M and Tehrani S 2002 The science and technology of magnetoresistive tunneling memory *IEEE Trans. Nanotechnol.* **1** 32–8
- [82] Dao N, Whittenburg S L and Cowburn R P 2001 Micromagnetics simulation of deep-submicron supermalloy disks *J. Appl. Phys.* **90** 5235–7
- [83] Honjo H, Fukami S, Nebashi R, Suzuki T, Ishiwata N, Miura S and Sugibayashi T 2009 Performance of shape-varying magnetic tunneling junction for high-speed magnetic random access memory cells *J. Appl. Phys.* **105** 07C921
- [84] Sankey J C, Cui Y-T, Sun J Z, Slonczewski J C, Buhrman R A and Ralph D C 2008 Measurement of the spin-transfer-torque vector in magnetic tunnel junctions *Nature Phys.* **4** 67–71
- [85] Ralph D C and Stiles M D 2008 Spin transfer torques *J. Magn. Magn. Mater.* **320** 1190–216